filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Kernelogic - Post Slingshot 2.8 continuation on NEXRAD dataset #594

Closed kernelogic closed 2 years ago

kernelogic commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

Similar to this LDN comment from @dkkapur (https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/432#issuecomment-1204902669): now that Slingshot 2.8 has ended, I'd like to continue storing the whole dataset to completion under a new LDN, following the same rules as before.

I have participated in every Slingshot phase and am probably the best-performing "small individual client".

I have successfully completed several LDNs on other datasets, and my record shows that I have followed the decentralization rules and have zero self-dealing.

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/60
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/59
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/46
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/297
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/298
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/304

What is the primary source of funding for this project?

Self-funded, BigD exchange.

What other projects/ecosystem stakeholders is this project associated with?

enterprise-sp-wg, BigD exchange.

Use-case details

Describe the data being stored onto Filecoin

Real-time and archival data from the Next Generation Weather Radar (NEXRAD) network.

Where was the data in this dataset sourced from?

https://registry.opendata.aws/noaa-nexrad/

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

The data is primarily compressed binary data. The site below demonstrates how to consume and render it:
https://nbviewer.org/gist/dopplershift/356f2e14832e9b676207

s3://noaa-nexrad-level2/2021/01/01/TSDF/TSDF20210101_235417_V08
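The sample key above appears to encode the scan date, the radar station, and the scan time. As a minimal sketch, assuming every key follows the same `YYYY/MM/DD/STATION/STATIONYYYYMMDD_HHMMSS_Vxx` layout as this one sample (the `NexradKey` type and `parse_nexrad_key` helper are hypothetical names used only for illustration):

```python
from datetime import datetime
from typing import NamedTuple


class NexradKey(NamedTuple):
    station: str         # radar site identifier, e.g. "TSDF"
    scan_time: datetime  # scan timestamp taken from the file name
    version: str         # trailing version tag, e.g. "V08"


def parse_nexrad_key(key: str) -> NexradKey:
    """Split a key shaped like the sample above into its parts.

    Assumes the layout YYYY/MM/DD/STATION/STATIONYYYYMMDD_HHMMSS_Vxx.
    """
    _year, _month, _day, station, filename = key.split("/")
    stem, version = filename.rsplit("_", 1)
    # Drop the station prefix, leaving "YYYYMMDD_HHMMSS".
    scan_time = datetime.strptime(stem[len(station):], "%Y%m%d_%H%M%S")
    return NexradKey(station, scan_time, version)


sample = "2021/01/01/TSDF/TSDF20210101_235417_V08"
print(parse_nexrad_key(sample))
```

Keys that carry an extra suffix or extension would need additional handling; this sketch covers only the shape of the sample shown.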

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

AWS open dataset

What is the expected retrieval frequency for this data?

Infrequent. However, all details are available in my dataset browser: https://slingshot.kernelogic.ca/nexrad.html?v=2.8

For how long do you plan to keep this dataset stored on Filecoin?

Between 365 and 520 days.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

All regions.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

I will upload my prepared CAR files to a web server and coordinate with providers to download and propose offline deals.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Besides the SPs I have previously worked with, I also use BigD exchange to further decentralize the storage.

To name a few from the community that I deal with regularly: PIKNIK, Holon, CabrinaHuang, HarryM, BigBear, j1v, XinAn Xu, WillTechMusing.

From BigD exchange: Mog Li, Devin Chen, DSS Nathanial Marsh, Rabinovitch, Vin K, arockpool Tony

How will you be distributing deals across storage providers?

Evenly across all providers I propose deals to, as long as they can handle the volume. If a miner is itself a notary, it will receive no more than 20% of the total granted DataCap.
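The distribution rule above can be sketched as follows. This is only an illustration under stated assumptions: the provider names are hypothetical, the amounts are in arbitrary units, and handing any capped remainder to the non-notary providers is my own assumption, since the rule does not say what happens to it.

```python
from typing import Dict


def plan_allocation(
    total_datacap: float,
    providers: Dict[str, bool],  # provider name -> True if the provider is also a notary
    notary_cap: float = 0.20,    # notary-run providers get at most 20% of the total
) -> Dict[str, float]:
    """Split DataCap evenly, capping notary-run providers at notary_cap."""
    if not providers:
        raise ValueError("at least one provider is required")

    even_share = total_datacap / len(providers)
    cap = total_datacap * notary_cap

    # Start from an even split, limiting notary-run providers to the cap.
    plan = {
        name: min(even_share, cap) if is_notary else even_share
        for name, is_notary in providers.items()
    }

    # Assumption: whatever the cap frees up is shared by non-notary providers.
    leftover = total_datacap - sum(plan.values())
    regular = [name for name, is_notary in providers.items() if not is_notary]
    if leftover > 0 and regular:
        for name in regular:
            plan[name] += leftover / len(regular)
    return plan


# Hypothetical providers, for illustration only.
providers = {"sp-a": False, "sp-b": False, "notary-sp": True}
print(plan_allocation(15.0, providers))
```

With five or more providers an even share is already at or below 20%, so the cap only changes the outcome when deals go to fewer than five providers.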

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

I have all I need to start making deals.

large-datacap-requests[bot] commented 2 years ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

kernelogic commented 2 years ago

To emphasize my advantages:

  1. A very decentralized, transparent list of SPs.
  2. A usable dataset browser containing file details and how to retrieve them.

kernelogic commented 2 years ago

Updating this request to potentially utilize the new proposal #594

This dataset is 2PB+. I already have one 5PB allocation, so for this one I am requesting 15PB, giving me approximately 10 replicas in total.
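The replica estimate follows from simple division. A minimal sketch, assuming the dataset is exactly 2PB (the comment says "2PB+", so the real replica count would be somewhat lower):

```python
# Figures taken from the comment above; the dataset size is a lower bound ("2PB+").
DATASET_PB = 2
EXISTING_PB = 5
REQUESTED_PB = 15


def replicas(total_datacap_pb: float, dataset_pb: float) -> float:
    """Number of full copies of the dataset that the DataCap can cover."""
    return total_datacap_pb / dataset_pb


total_pb = EXISTING_PB + REQUESTED_PB
print(replicas(total_pb, DATASET_PB))
```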

Sunnyiscoming commented 2 years ago

There are several existing issues related to noaa-nexrad. As you said, this dataset is 2PB+, but more than 20PB of DataCap has been requested across the following large dataset applications. Slingshot v2 has ended, so there should be no more Slingshot-related LDNs. I think this issue should probably be closed.

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/483
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/432
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/398
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/340
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/312
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/80

kernelogic commented 2 years ago

@Sunnyiscoming I'm creating this to see if I can finish what I prepared. Up to the RKH to decide.

raghavrmadya commented 2 years ago

Hi @kernelogic, we expect the outcome of #594 to take some time, so it's best to open 3 separate applications for the 15 PiB request.

kernelogic commented 2 years ago

Sorry @raghavrmadya, I just saw your reply. Closing this to open new split ones:
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1004
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1005
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1006