Open amughal opened 9 months ago
Thanks for your request!
Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
Thanks for your request! Everything looks good. :ok_hand:
A Governance Team member will review the information provided and contact you back pretty soon.
This user’s identity has been verified through filplus.storage
⚠️ f01697248 has sealed 40.34% of total datacap.
⚠️ f02846602 has unknown IP location.
If you already have a list of storage providers to work with, fill out their names and provider IDs below: Bitsultans, f02853198; Simple IPFS Inc., f01904546, f01697248
This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.
-- Commented by Stale Bot.
Hello @Sunnyiscoming, please see the list of SPs below, distributed across three continents.
SP Miner IDs | Contact name | SP Business Email | SP Organization Name | Region | Using VPN? | Slack handle |
---|---|---|---|---|---|---|
f02846602 | Azher | contact@mongostorage.tech | Mongo2Stor | USA | No | mongo |
f01697248 | Henry Moon | contact@simpleipfs.com | Simple IPFS Inc. | South Korea | No | hyunmoon |
f01904546 | Henry Moon | contact@simpleipfs.com | Simple IPFS Inc. | South Korea | No | hyunmoon |
f02853198 | Bitsultans | Diego Siwer | Bitsultans | Argentina | No | Diego Siwer |
Regarding your questions:
"⚠️ f01697248 has sealed 40.34% of total datacap." This finding is correct, and you should see more balanced results in the next 2 weeks. I have been actively sealing on miner f01904546.
"⚠️ f02846602 has unknown IP location." This issue has been fixed. You can check the correct IP here: https://filfox.info/en/peer/12D3KooWPdhRZBjt6PoM9cjLgpxjUi4uaQXPKPE62zBNBe8CSydX
Please let me know if you have any further questions. Thank you
The community rules state that each SP cannot store more than 30%. Why did you store 40.34% of the previous application on the same SP? Please describe the DataCap storage allocation plan for this application in detail.
Hello @Sunnyiscoming, sorry that I have not replied on this thread for some time; it was mostly intentional. As background: I just ran the checker bot, and I have been actively sealing for better data distribution across two SPs. Previously the share was 40.34%, and as of now it is 35.33%. The bot is not fully accurate; in the next few days, you should see the share decrease to around 33%. I hope you can appreciate that distributing 10 PiB across SPs globally is a challenging task, and I am not far from achieving this.
Similarly, for the current application, my aim is to distribute data across the US west coast, South Korea, and South America. The SPs have been identified; three of them have the collateral ready, while one is working on it. As with LDN 2040, I will make sure that, per Fil+, deals are distributed across 5 SPs.
Let me know if there are additional questions.
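For readers following the percentages discussed above, here is a minimal sketch of how a per-SP share can be checked against the 30% limit quoted in this thread. The sealed amounts are hypothetical placeholders chosen so that one provider sits at 35.33%; they are not the real on-chain figures, and the function names are illustrative only, not part of the checker bot.

```python
# Sketch only: per-SP share check against the 30% limit mentioned in the thread.
MAX_SP_SHARE = 0.30


def sp_shares(sealed):
    """Return each provider's fraction of the total sealed amount."""
    total = sum(sealed.values())
    if total == 0:
        return {sp: 0.0 for sp in sealed}
    return {sp: amount / total for sp, amount in sealed.items()}


def over_limit(sealed, limit=MAX_SP_SHARE):
    """Return the provider IDs whose share exceeds the limit, sorted."""
    return sorted(sp for sp, share in sp_shares(sealed).items() if share > limit)


# Hypothetical sealed amounts (arbitrary units), NOT real on-chain data.
example = {
    "f01697248": 3533,
    "f01904546": 2500,
    "f02846602": 2000,
    "f02853198": 1967,
}

for sp, share in sp_shares(example).items():
    print(f"{sp}: {share:.2%}")
print("Over limit:", over_limit(example))
```

With these placeholder numbers only f01697248 exceeds the limit, which mirrors the situation described in the reply.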
f03016877
f1qwyhtmlfogwajktfabqvhqfxapiqozuxpwirmpa
500TiB
2f77fd67-58cf-4fe1-95cc-94f1058aec4d
Data Owner Name
Mongo2Stor
What is your role related to the dataset
Data Preparer
Data Owner Country/Region
United States
Data Owner Industry
Not-for-Profit
Website
https://data.commoncrawl.org
Social Media
https://twitter.com/commoncrawl (commoncrawl)
Total amount of DataCap being requested
5 PiB
Expected size of single dataset (one copy)
1 PiB
Number of replicas to store
5
Weekly allocation of DataCap requested
500 TiB
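The figures above are consistent with each other: 1 PiB per copy times 5 replicas gives the 5 PiB total. As a rough sketch, the arithmetic below also estimates how many weeks the request would take at the stated weekly rate, assuming a constant 500 TiB per week and 1 PiB = 1024 TiB. Actual Fil+ allocations are granted in tranches, so this is an illustration, not a schedule.

```python
# Sketch only: DataCap arithmetic from the application figures.
TIB_PER_PIB = 1024


def total_datacap_tib(single_copy_pib, replicas):
    """Total DataCap in TiB for a dataset stored as several replicas."""
    return single_copy_pib * replicas * TIB_PER_PIB


def weeks_to_allocate(total_tib, weekly_tib):
    """Weeks needed at a constant weekly rate (an assumption, see above)."""
    return total_tib / weekly_tib


total_tib = total_datacap_tib(single_copy_pib=1, replicas=5)
weeks = weeks_to_allocate(total_tib, weekly_tib=500)

print(f"Total requested: {total_tib} TiB ({total_tib / TIB_PER_PIB:.0f} PiB)")
print(f"Weeks at 500 TiB/week: {weeks:.2f}")
```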
On-chain address for first allocation
f1qwyhtmlfogwajktfabqvhqfxapiqozuxpwirmpa
Data Type of Application
Public, Open Dataset (Research/Non-Profit)
Custom multisig
Identifier
n/a
Share a brief history of your project and organization
Mongo2Stor (MongoStorage) works as a Storage Service Provider and offers data preparation and consulting services in the Filecoin ecosystem. Based in Southern California, USA, Mongo2Stor is FIL Green GOLD Certified and is currently working toward becoming a fully ESPA-certified provider. The founders have extensive experience in networks and systems, have taken part in multiple sessions and presentations at ESPA, and were featured in the Zero to One Service Provider Twitter session by Protocol Labs. This LDN request is a follow-up to #2040, which has been a great success. Data was stored with prominent Service Providers such as Seal Storage, Simple IPFS Inc. (#2 ranking), Aligned SaaS provider, PikNik (Medula) and many others. CommonCrawl has published new monthly archives since the launch of LDN #2040, so a year's worth of data needs to be archived and made available on the Filecoin network.
Is this project associated with other projects/ecosystem stakeholders?
No
If answered yes, what are the other projects/ecosystem stakeholders
n/a
Describe the data being stored onto Filecoin
https://data.commoncrawl.org/crawl-data/index.html CC-MAIN-2023-50 CC-MAIN-2023-40 CC-MAIN-2023-23 CC-MAIN-2023-14 CC-MAIN-2022-49 CC-MAIN-2022-40 CC-MAIN-2022-33 CC-MAIN-2022-27
Where was the data currently stored in this dataset sourced from
Other
If you answered "Other" in the previous question, enter the details here
Commoncrawl provided hosted services
If you are a data preparer, what is your location (City and Country)
Chino, USA
If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?
Singularity is an excellent tool for CAR generation. I have used it extensively for the other LDN application.
If you are not preparing the data, who will prepare the data? (Provide name and business)
n/a
Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.
n/a
Please share a sample of the data
Data Type | File List | #Files | Total Size Compressed (TiB) |
---|---|---|---|
Segments | segment.paths.gz | 100 | |
WARC | warc.paths.gz | 90000 | 99.25 |
WAT | wat.paths.gz | 90000 | 22.99 |
WET | wet.paths.gz | 90000 | 9.30 |
Robots.txt | robotstxt.paths.gz | 90000 | 0.18 |
Non-200 responses | non200responses.paths.gz | 90000 | 3.43 |
URL index | cc-index.paths.gz | 302 | 0.25 |
Columnar URL index | cc-index-table.paths.gz | 900 | 0.28 |
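As a sanity check on the "1 PiB per copy" estimate, the sample sizes above can be summed. The sketch below assumes the sample describes a single crawl and that each of the eight crawls listed earlier in the application is of similar size; both are assumptions, not statements from the applicant.

```python
# Sketch only: sum the compressed sizes from the sample (TiB per file type).
sizes_tib = {
    "WARC": 99.25,
    "WAT": 22.99,
    "WET": 9.30,
    "Robots.txt": 0.18,
    "Non-200 responses": 3.43,
    "URL index": 0.25,
    "Columnar URL index": 0.28,
}

CRAWLS_LISTED = 8   # crawl IDs named in the application
TIB_PER_PIB = 1024

per_crawl_tib = sum(sizes_tib.values())
# Assumption: every listed crawl is about the same size as the sample.
total_tib = per_crawl_tib * CRAWLS_LISTED
total_pib = total_tib / TIB_PER_PIB

print(f"One crawl: {per_crawl_tib:.2f} TiB")
print(f"{CRAWLS_LISTED} crawls: {total_tib:.2f} TiB (about {total_pib:.2f} PiB)")
```

Under these assumptions the eight crawls come to roughly 1 PiB, in line with the expected size of a single copy stated in the application.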
Confirm that this is a public dataset that can be retrieved by anyone on the Network
Yes
If you chose not to confirm, what was the reason
n/a
What is the expected retrieval frequency for this data
Sporadic
For how long do you plan to keep this dataset stored on Filecoin
More than 3 years
In which geographies do you plan on making storage deals
North America, South America, Europe, Australia (continent), Africa, Asia other than Greater China
How will you be distributing your data to storage providers
HTTP or FTP server
How do you plan to choose SP
Big Data Exchange
If you answered "Others" in the previous question, what is the tool or platform you plan to use
n/a
If you already have a list of storage providers to work with, fill out their names and provider IDs below
Bitsultans, f02853198; Simple IPFS Inc., f01904546, f01697248
How do you plan to make deals to your storage providers
Lotus client
If you answered "Others/custom tool" in the previous question, enter the details here
n/a
Can you confirm that you will follow the Fil+ guideline
Yes
Application created via filplus.storage