filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Cabrina-HRRR Open Dataset <2/7> #1141

Closed NiwanDao closed 1 year ago

NiwanDao commented 1 year ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.


- I am an active participant in Slingshot and Slingshot Restore. Through that experience I have gained substantial knowledge as a data preparer, deal-making SP, and retrieval client.
- I have established relationships with other community members along the way and have successfully made deals with over 60 SPs worldwide.
- With the surge of deal-making requests from other SPs, and given the value of storing humanity's most important data permanently, I decided to bring the HRRR dataset to the network.
- I will track deals and provide retrieval access through https://dstorage.cabrina.xyz/. 

What is the primary source of funding for this project?

Mostly self-funded; additional funding may come from the BigD exchange.

What other projects/ecosystem stakeholders is this project associated with?

None

Use-case details

Describe the data being stored onto Filecoin


> The High-Resolution Rapid Refresh (HRRR) is produced by the Global Systems Laboratory at the National Oceanic and Atmospheric Administration (NOAA). It is a real-time, 3-km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model, initialized on 3-km grids with 3-km radar assimilation. Radar data is assimilated into the HRRR every 15 minutes over a 1-hour period, adding further detail to that provided by the hourly data assimilation from the 13-km radar-enhanced Rapid Refresh.
> The HRRR archive stored on AWS goes back to 2014, with a total size of 2 PiB across 38,526,457 objects.
> I plan to store 10 copies, each covering the full 2 PiB of raw data. Because of the conversion rate between raw data size and DataCap consumption, each copy would require around 3.5 PiB of DataCap, for a total of 35 PiB.
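The raw-size-to-DataCap conversion mentioned above can be sketched numerically. This is a minimal illustration, assuming the usual Filecoin piece accounting: Fr32 padding expands raw bytes by 128/127, and each piece is then rounded up to the next power-of-two padded piece size. The function names here are illustrative, not from any library.

```python
import math

FR32_EXPANSION = 128 / 127  # Fr32 padding: 2 extra bits per 254 bits of payload

def next_pow2(n: int) -> int:
    """Smallest power of two >= n."""
    return 1 << (n - 1).bit_length()

def padded_piece_size(raw_bytes: int) -> int:
    """Padded piece size a raw payload occupies on-chain (sketch)."""
    return next_pow2(math.ceil(raw_bytes * FR32_EXPANSION))

GIB = 1 << 30

# Example: an 18 GiB CAR file pads out to a 32 GiB piece,
# an overhead factor of ~1.78 -- close to the 3.5 PiB / 2 PiB
# ratio assumed in the application above.
print(padded_piece_size(18 * GIB) // GIB)  # 32
```

The exact overhead depends on how the preparer packs files into CAR pieces; packing closer to a power-of-two boundary lowers the ratio.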

Where was the data in this dataset sourced from?

AWS: https://registry.opendata.aws/noaa-hrrr-pds/
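For reference, objects in the HRRR archive follow a predictable naming convention, which is useful when planning per-day data preparation. The helper below is a sketch; the key pattern and the `noaa-hrrr-bdp-pds` bucket name are assumptions based on the public registry page, not part of this application.

```python
def hrrr_key(date: str, cycle_hour: int, forecast_hour: int,
             domain: str = "conus", product: str = "wrfsfc") -> str:
    """Build an object key following the HRRR archive's naming
    convention (pattern assumed from the public bucket layout)."""
    return (f"hrrr.{date}/{domain}/"
            f"hrrr.t{cycle_hour:02d}z.{product}f{forecast_hour:02d}.grib2")

# e.g. the surface-fields file for the 12Z cycle, 6-hour forecast:
key = hrrr_key("20230112", 12, 6)
print(key)  # hrrr.20230112/conus/hrrr.t12z.wrfsfcf06.grib2

# Anonymous-access URL (bucket name as listed on the registry page):
url = f"https://noaa-hrrr-bdp-pds.s3.amazonaws.com/{key}"
```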

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

Linked below is a video on how the HRRR is critical to weather forecasting:
https://www.youtube.com/watch?v=tIPHkPeW7CA

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Confirmed

What is the expected retrieval frequency for this data?

Not often 

For how long do you plan to keep this dataset stored on Filecoin?

> 360 days.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Storage providers from all countries are welcome.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

I am open to both options: offline and online. 
For storage providers in China, offline delivery might be a better choice from a speed perspective. 
For others, distributing data through HTTP might be more realistic. 

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

I will consider SPs I have worked with before, and am also willing to partner with other SPs that demonstrate the ability to run nodes safely, handle real deals at a consistent sealing rate, and support retrieval.

How will you be distributing deals across storage providers?

Each SP can store no more than 1 copy of the data, which means no single SP holds more than 10% of the 35 PiB I proposed.
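The one-copy-per-SP constraint above can be expressed as a simple allocation rule. This is a minimal sketch under the stated plan (10 copies, at most one per SP); the miner IDs are hypothetical.

```python
from collections import defaultdict

def assign_copies(sps: list, n_copies: int) -> dict:
    """Assign dataset copies to SPs, at most one copy per SP (sketch)."""
    if n_copies > len(sps):
        raise ValueError("need at least one distinct SP per copy")
    copies = defaultdict(int)
    for i in range(n_copies):
        copies[sps[i]] += 1
    return dict(copies)

sps = [f"f0{1000 + i}" for i in range(12)]  # hypothetical SP miner IDs
plan = assign_copies(sps, 10)

assert max(plan.values()) == 1   # no SP holds more than one copy
assert sum(plan.values()) == 10  # all 10 copies are placed
# => each SP stores at most 1/10 of the proposed 35 PiB
```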

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes, I am ready to go.

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

750TiB

Client address

f1mmtovvurlhcvfmqbww6nzwwrse3cljccjmdftki

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1mmtovvurlhcvfmqbww6nzwwrse3cljccjmdftki

DataCap allocation requested

256TiB

Id

2d27c118-8965-49c7-a3e9-96d6b3608483

NiwanDao commented 1 year ago

@simonkim0515 @Kevin-FF-USA An anomaly was detected in the audit trail for this LDN, and it does not appear under the Large Request tab. Please take a look. The same applies to #1142.

(Screenshots: 2023-01-12, 2:31:50 PM and 2:32:09 PM)

NiwanDao commented 1 year ago

Closing this one and opening a new one, since the bug has existed for over a month. Link to the new application: https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1592