filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

Terrafusion data Sampler / (Speedium) #339

Closed: cryptowhizzard closed this issue 1 year ago

cryptowhizzard commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

Speedium / Dcent has taken part in Slingshot since 2.6. We have successfully stored more than 15 different datasets with 20+ different miners.

What is the primary source of funding for this project?

Company account

What other projects/ecosystem stakeholders is this project associated with?

Slingshot competition hosted by Protocol Labs

Use-case details

Describe the data being stored onto Filecoin

The Terra Basic Fusion dataset is a fused dataset of the original Level 1 radiances from the five Terra instruments. It has been fully validated to contain the original Terra instrument Level 1 data.

Where was the data in this dataset sourced from?

AWS

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://terrafusion.web.illinois.edu/

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes

What is the expected retrieval frequency for this data?

Multiple times per year

For how long do you plan to keep this dataset stored on Filecoin?

18 months or longer

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

EU / US / Australia

How will you be distributing your data to storage providers? Is there an offline data transfer process?

The data will be transferred both offline and online.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We have a few providers who worked with us during the Slingshot Restore program, and we would like to continue working with them for the ongoing Slingshot competition.

How will you be distributing deals across storage providers?

Max 2 copies per storage provider, if stored on different miners / locations.
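
As an illustration of the limit above, here is a minimal sketch of how a deal list could be checked against it. The `(provider, piece)` pair layout, the placeholder identifiers, and the function name are assumptions made for this example; they are not part of this application or of any Filecoin tooling. The sketch also treats each ID as one provider and ignores the miner / location condition.

```python
from collections import Counter

# Limit taken from the answer above: at most 2 copies per storage provider.
MAX_COPIES_PER_PROVIDER = 2


def providers_over_limit(deals, limit=MAX_COPIES_PER_PROVIDER):
    """Return {(provider, piece): copies} for every pair that exceeds the limit.

    `deals` is an iterable of (provider_id, piece_id) pairs; this layout is
    hypothetical and chosen only for the example.
    """
    counts = Counter(deals)
    return {pair: n for pair, n in counts.items() if n > limit}


# Placeholder identifiers, not real providers or pieces.
example_deals = [
    ("provider-a", "piece-1"),
    ("provider-a", "piece-1"),
    ("provider-a", "piece-1"),
    ("provider-b", "piece-1"),
]
print(providers_over_limit(example_deals))
```

With the placeholder data, only `provider-a` is flagged, since it holds three copies of the same piece.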

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes, we have the resources.

filplus-checker-app[bot] commented 8 months ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01901765: 31.23%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
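
For anyone scripting the trigger comment described in the footnotes, the following is a minimal sketch that builds the comment text only. The helper name and the example addresses are hypothetical, and posting the comment to the issue is left out.

```python
def build_trigger_comment(other_addresses=()):
    """Build the comment text that asks the checker for a report.

    With no addresses this is the plain trigger from footnote 1; with
    addresses it follows the form shown in footnote 3.
    """
    parts = ["checker:manualTrigger"]
    # Skip empty entries and strip stray whitespace around each address.
    parts.extend(addr.strip() for addr in other_addresses if addr.strip())
    return " ".join(parts)


print(build_trigger_comment())
# The addresses below are placeholders, not real client addresses.
print(build_trigger_comment(["f1exampleaddress1", "f1exampleaddress2"]))
```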

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.