filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Tech Greedy - NOAA Global Ensemble Forecast System (#5) #2285

Closed xinaxu closed 9 months ago

xinaxu commented 12 months ago

Data Owner Name

National Oceanic and Atmospheric Administration (NOAA)

What is your role related to the dataset

Data Preparer

Data Owner Country/Region

United States

Data Owner Industry

Environment

Website

https://registry.opendata.aws/noaa-gefs/

Social Media

https://registry.opendata.aws/noaa-gefs/

Total amount of DataCap being requested

15PiB

Expected size of single dataset (one copy)

3PiB

Number of replicas to store

10

Weekly allocation of DataCap requested

1PiB

On-chain address for first allocation

f1ec3tximpyywg7l7t7lco2ljaz7uvmsdlfg26gli

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

Identifier

No response

Share a brief history of your project and organization

Project Detail: This dataset is an AWS open dataset that has not yet been stored on Filecoin. We were planning to onboard this dataset with Slingshot V3; however, with the delay of the program, we want to start onboarding as soon as possible. Meanwhile, we want to reach out to storage providers outside the Slingshot silo to further diversify distribution.

Organization Detail: Tech Greedy has been engaged in building the Filecoin ecosystem, including building data preparation and deal-making tools, participating in multiple rounds of Slingshot, and running mining operations.

Is this project associated with other projects/ecosystem stakeholders?

Yes

If answered yes, what are the other projects/ecosystem stakeholders

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1483
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1682
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1955
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2087

Note: this LDN is prepared using a new preparation method and will not overlap with the CIDs from previous LDNs.

Describe the data being stored onto Filecoin

The Global Ensemble Forecast System (GEFS), previously known as the GFS Global ENSemble (GENS), is a weather forecast model made up of 21 separate forecasts, or ensemble members. The National Centers for Environmental Prediction (NCEP) started the GEFS to address the nature of uncertainty in weather observations, which is used to initialize weather forecast models. The GEFS attempts to quantify the amount of uncertainty in a forecast by generating an ensemble of multiple forecasts, each minutely different, or perturbed, from the original observations. With global coverage, GEFS is produced four times a day with weather forecasts going out to 16 days.

The total size of the dataset is ~2.0PiB, so we will likely need further applications in the future.
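The reasoning behind "further applications" can be checked with quick arithmetic. The sketch below is illustrative only: it takes the ~2.0PiB figure from the description, plus the replica count, total request, and weekly rate from the form fields above, and the function name `datacap_plan` is made up for this example. Note that the form also lists 3PiB as the expected size of a single copy; passing that value instead raises the shortfall accordingly.

```python
def datacap_plan(dataset_pib, replicas, requested_pib, weekly_pib):
    """Compare the DataCap needed for all replicas with the amount requested."""
    needed_pib = dataset_pib * replicas
    return {
        "needed_pib": needed_pib,
        # DataCap still missing after this application is fully used
        "shortfall_pib": max(0.0, needed_pib - requested_pib),
        # Whole copies of the dataset that the request can cover
        "full_replicas_covered": int(requested_pib // dataset_pib),
        # Weeks to consume the request at the stated weekly rate
        "weeks_at_weekly_rate": requested_pib / weekly_pib,
    }


# Figures taken from this application: ~2.0PiB dataset, 10 replicas,
# 15PiB requested, 1PiB per week.
plan = datacap_plan(dataset_pib=2.0, replicas=10, requested_pib=15, weekly_pib=1)
print(plan)
```

Under these figures the request covers 7 full copies and leaves about 5PiB to be requested later.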

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

If you are a data preparer. What is your location (Country/Region)

United States

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

This dataset is prepared using Singularity V2
https://github.com/data-preservation-programs/singularity
With inline preparation enabled. Each pack is 32GB.

If you are not preparing the data, who will prepare the data? (Provide name and business)

No response

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

We are THE FIRST data preparers to work on this dataset.

Please share a sample of the data

https://noaa-nbm-pds.s3.amazonaws.com/index.html

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

More than 3 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America

How will you be distributing your data to storage providers

HTTP or FTP server

How do you plan to choose storage providers

Slack, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

f01989866 - Weimin, Xi'an, CN
f09848 - bigbear, CA, US
f01390323 - Lianxing storage, Hangzhou, CN
f081990 - Weimin Pan, HK
f01967501, f01717477, f02214491 - Acrontech, PA, US
f02228866 - Feige IT, Tokyo, JP
f01315130 - Ouruan IT, Chengdu, CN

How do you plan to make deals to your storage providers

Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 12 months ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

herrehesse commented 12 months ago

@xinaxu Hello there! Lovely application, you are an extremely trustworthy member of this community. Question: Could you add multiple EU miners to your selection? This would make the distribution more even.

Goodluck!


kevzak commented 12 months ago

SP List provided: [{"providerID": "f01989866", "location": "Xi'an, CN", "SPOrg": "Weimin"}, {"providerID": "f09848", "location": "CA, US", "SPOrg": "bigbear"}, {"providerID": "f01390323", "location": "Hangzhou, CN", "SPOrg": "Lianxing storage"}, {"providerID": "f081990", "location": "HK", "SPOrg": "Weimin Pan"}, {"providerID": "f01967501", "location": "PA, US", "SPOrg": "Acrontech"}, {"providerID": "f01717477", "location": "PA, US", "SPOrg": "Acrontech"}, {"providerID": "f02214491", "location": "PA, US", "SPOrg": "Acrontech"}, {"providerID": "f02228866", "location": "Tokyo, JP", "SPOrg": "Feige IT"}, {"providerID": "f01315130", "location": "Chengdu, CN", "SPOrg": "Ouruan IT"}]
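For readers who want to check how the listed miner IDs spread across organizations and regions, the provider list can be grouped with a short script. This is a minimal sketch: the entries are copied from the list in this thread, while the `group_by` helper and the rule of treating the last comma-separated token of `location` as the region are assumptions made for illustration.

```python
from collections import defaultdict

# Entries copied from the SP list in this application.
SP_LIST = [
    {"providerID": "f01989866", "location": "Xi'an, CN", "SPOrg": "Weimin"},
    {"providerID": "f09848", "location": "CA, US", "SPOrg": "bigbear"},
    {"providerID": "f01390323", "location": "Hangzhou, CN", "SPOrg": "Lianxing storage"},
    {"providerID": "f081990", "location": "HK", "SPOrg": "Weimin Pan"},
    {"providerID": "f01967501", "location": "PA, US", "SPOrg": "Acrontech"},
    {"providerID": "f01717477", "location": "PA, US", "SPOrg": "Acrontech"},
    {"providerID": "f02214491", "location": "PA, US", "SPOrg": "Acrontech"},
    {"providerID": "f02228866", "location": "Tokyo, JP", "SPOrg": "Feige IT"},
    {"providerID": "f01315130", "location": "Chengdu, CN", "SPOrg": "Ouruan IT"},
]


def group_by(entries, key):
    """Map each key value to the list of provider IDs that share it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[key(entry)].append(entry["providerID"])
    return dict(groups)


by_org = group_by(SP_LIST, lambda e: e["SPOrg"])
# Assumption: the region code is the last comma-separated token of "location".
by_region = group_by(SP_LIST, lambda e: e["location"].split(", ")[-1])

for org, ids in by_org.items():
    print(f"{org}: {len(ids)} miner ID(s)")
for region, ids in by_region.items():
    print(f"{region}: {len(ids)} miner ID(s)")
```

Grouping by organization rather than by miner ID shows how many distinct entities are actually behind the list.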

xinaxu commented 11 months ago

I should add that we will likely add more SPs to the list. Their IDs are not yet known, but they will likely come from the same companies listed above.

Sunnyiscoming commented 11 months ago

Datacap Request Trigger

Total DataCap requested

15PiB

Expected weekly DataCap usage rate

1PiB

Client address

f1ec3tximpyywg7l7t7lco2ljaz7uvmsdlfg26gli

large-datacap-requests[bot] commented 11 months ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1ec3tximpyywg7l7t7lco2ljaz7uvmsdlfg26gli

DataCap allocation requested

512TiB

Id

6f32991b-e801-49d7-97b8-e58bd922bff6

lilisy90 commented 11 months ago

> I should say that we will likely add more SPs to the list but the IDs of them are unknown but they will likely come from the same companies in the list above

Oh gosh, is this allowed in Fil+? Most applications are closed by @Filplus-govteam because of their SP lists. @xinaxu should provide all the SPs he will work with, right?

{"providerID": "f01967501", "location": "PA, US", "SPOrg","Acrontech"}, {"providerID": "f01717477", "location": "PA, US", "SPOrg","Acrontech"}, {"providerID": "f02214491", "location": "PA, US", "SPOrg","Acrontech"},

@xinaxu's SP list has entries with the same location and the same company. Per @Filplus-govteam's words in other applications, I think @xinaxu should be treated the same as anyone else: close the application until it is updated.

@galen-mcandrew @raghavrmadya @Kevin-FF-USA @kevzak @Sunnyiscoming

Can you guys give a clear rule instead of treating everyone differently?

xinaxu commented 11 months ago

@lilisy90

Regarding my responsibility to provide all service providers (SPs) I will be working with, I concur with your observation. To add clarity, I would like to mention that should there be any changes to the list, I will promptly update it. This allows the Notary to review any new additions and make informed decisions on whether to continue support or to close the application.

In response to your observation about my SP list featuring similar locations and companies, I believe there might be a slight misunderstanding. For a more accurate perspective, I recommend referring to the original post, which details the distribution of deals. Here’s a quick summary for your convenience:

f01989866 - Weimin, Xi'an, CN
f09848 - Bigbear, CA, US
f01390323 - Lianxing Storage, Hangzhou, CN
f081990 - Weimin Pan, HK
f01967501, f01717477, f02214491 - Acrontech, PA, US
f02228866 - Feige IT, Tokyo, JP
f01315130 - Ouruan IT, Chengdu, CN

The presence of multiple miner IDs for Acrontech is due to their individual sealing capacities. Our approach is to ensure fair and equitable distribution of deals across various companies, not limiting them to Acrontech alone. We are committed to maintaining balance in deal distribution, allowing each company to receive a similar volume of deals. How each company manages its allocation, particularly in terms of data distribution across their service providers, is at their discretion.

Regarding the consistency in the application process, as highlighted by @Filplus-govteam in other discussions, I share your sentiment about the importance of clear guidelines. It's indeed challenging when applications are closed due to uncertainties in company identification. That’s why I strive for transparency in disclosing every company involved in our dealings, ensuring clarity and fairness in the process.

stcloudlisa commented 11 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebsupvc7vlhs4ianonup5uqj6qnbivtb5vdrqcb6ztqnankbv2ggu

Address

f1ec3tximpyywg7l7t7lco2ljaz7uvmsdlfg26gli

Datacap Allocated

512.00TiB

Signer Address

f1jvvltduw35u6inn5tr4nfualyd42bh3vjtylgci

Id

6f32991b-e801-49d7-97b8-e58bd922bff6

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebsupvc7vlhs4ianonup5uqj6qnbivtb5vdrqcb6ztqnankbv2ggu

lilisy90 commented 11 months ago

@xinaxu Thank you for your reply. What I said is all based on what's actually in your application and what other applications in Fil+ are actually facing.

> To add clarity, I would like to mention that should there be any changes to the list, I will promptly update it. This allows the Notary to review any new additions and make informed decisions on whether to continue support or to close the application.

I agree with you on this point. Other applicants should also have the right to update their lists instead of having their applications closed. @Filplus-govteam, look at @xinaxu's comments: changes do occur in the actual process, so you should give applicants time to check. Closing is not a good idea.

ghost commented 11 months ago

@lilisy90 - four points:
1) This is a known client and Fil+ community member with a good level of trust established. However, I did not realize this applicant has not completed the SP registration form (cc @Sunnyiscoming). Will ask and pause notary signing until completed.
2) As non-compliance continues to evolve, due diligence, especially related to the SPs being used, has to adapt. If an applicant cannot specify SPs beforehand, or if they are accepted and their SPs then do not match after some allocations, the application will be closed and questioned.
3) Applicants are allowed to use multiple miners per SP entity, and each entity is held accountable to store only a percentage of the total (30%). The point of due diligence here is to disallow applicants using many miner IDs disguised as separate entities.
4) All applicants are allowed to update SP lists and provide more information to prove they meet Fil+ guidelines. When an application is closed, it does not mean they lose their allocated DataCap. It just means they will not receive any more DataCap until they prove their SP entity and distribution.

if you have questions about a specific application, tag this handle and we will review and respond accordingly.
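The per-entity limit described in point 3 can be illustrated with a small calculation: sum deal volume per SP entity (not per miner ID) and flag any entity above 30% of the total. The sketch below is hypothetical. The miner-to-entity mapping uses IDs from this thread, but the deal sizes are invented, and `entity_shares` and `over_limit` are names chosen for this example rather than part of any Fil+ tool.

```python
from collections import defaultdict

MAX_ENTITY_SHARE = 0.30  # the "(30%)" limit mentioned in point 3


def entity_shares(deals, miner_to_entity):
    """Return each entity's fraction of total deal volume.

    deals: iterable of (miner_id, size_tib) pairs.
    miner_to_entity: maps every miner ID to the entity that operates it.
    """
    totals = defaultdict(float)
    for miner_id, size_tib in deals:
        totals[miner_to_entity[miner_id]] += size_tib
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {entity: size / grand_total for entity, size in totals.items()}


def over_limit(shares, limit=MAX_ENTITY_SHARE):
    """List entities whose share strictly exceeds the limit."""
    return sorted(entity for entity, share in shares.items() if share > limit)


# Miner IDs and entity names come from this thread; sizes are made up.
miner_to_entity = {
    "f01967501": "Acrontech",
    "f01717477": "Acrontech",
    "f02214491": "Acrontech",
    "f09848": "bigbear",
    "f02228866": "Feige IT",
}
hypothetical_deals = [
    ("f01967501", 60.0),
    ("f01717477", 60.0),
    ("f02214491", 60.0),
    ("f09848", 100.0),
    ("f02228866", 100.0),
]

shares = entity_shares(hypothetical_deals, miner_to_entity)
flagged = over_limit(shares)
print(flagged)
```

In this made-up example no single miner ID stands out, yet the three Acrontech miners together hold well over 30%, which is the situation the entity-level check is meant to catch.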

ghost commented 11 months ago

@xinaxu Per the https://github.com/filecoin-project/notary-governance/issues/922 for Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant and also please add the contact information of the SP entities you are working with to store copies of the data.

ghost commented 11 months ago

closing until SP registration form is provided. We missed this step before triggering

xinaxu commented 11 months ago

The SP registration form has been submitted. @Filplus-govteam

ghost commented 11 months ago

SP list provided:
f02822921 | Tech Greedy | USA
f02822921 | Feige IT | Tokyo
f02240216 | Lianxing Storage | Tokyo
f09848 | SEAMOUNT TECHNOLOGIES LLC | Canada
f02815438 | MaiJie | India

kernelogic commented 11 months ago

Seeing that SP list is provided and reopened, willing to support

kernelogic commented 11 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceak44mbzrr3rogbo7w3klkphvhxlsfwvaorwztiam5ipgjamlrwjo

Address

f1ec3tximpyywg7l7t7lco2ljaz7uvmsdlfg26gli

Datacap Allocated

512.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

6f32991b-e801-49d7-97b8-e58bd922bff6

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceak44mbzrr3rogbo7w3klkphvhxlsfwvaorwztiam5ipgjamlrwjo

xinaxu commented 11 months ago

@Filplus-govteam This line should be corrected (it was a copy-paste mistake). The first line below is the incorrect entry and the second is the correction:

f02822921 | Feige IT | Tokyo

f02228866 | Feige IT | Tokyo

And add one more SP for Tech Greedy: f02884973, f02822921 | Tech Greedy | USA

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

-- Commented by Stale Bot.

xinaxu commented 11 months ago

@Filplus-govteam could you help reopen this?

large-datacap-requests[bot] commented 10 months ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1ec3tximpyywg7l7t7lco2ljaz7uvmsdlfg26gli

DataCap allocation requested

512TiB

Id

6b3daa97-7bf3-4564-8604-e75bd6de8436

AlanGreaterheat commented 10 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 10 months ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

AlanGreaterheat commented 10 months ago

A good, reputable applicant. I have seen the track record and am willing to support it.

AlanGreaterheat commented 10 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecubuphqh2gv2whuav7tjas2mo23vpraeulk7gga6xyfaet4tqaui

Address

f1ec3tximpyywg7l7t7lco2ljaz7uvmsdlfg26gli

Datacap Allocated

512.00TiB

Signer Address

f1pnmzlxj7cfeo2v6oj5nco46hkg2l46wj7o4xxui

Id

6b3daa97-7bf3-4564-8604-e75bd6de8436

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecubuphqh2gv2whuav7tjas2mo23vpraeulk7gga6xyfaet4tqaui

NiwanDao commented 10 months ago

LGTM

NiwanDao commented 10 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebpuha25blkqrxi3ztged6dda4zjrpctkipwyfaxwqxwuurnpdrqm

Address

f1ec3tximpyywg7l7t7lco2ljaz7uvmsdlfg26gli

Datacap Allocated

512.00TiB

Signer Address

f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

Id

6b3daa97-7bf3-4564-8604-e75bd6de8436

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebpuha25blkqrxi3ztged6dda4zjrpctkipwyfaxwqxwuurnpdrqm

large-datacap-requests[bot] commented 10 months ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1ec3tximpyywg7l7t7lco2ljaz7uvmsdlfg26gli

DataCap allocation requested

512TiB

Id

02a6dbdc-23f7-4ec0-916a-4a9e9a0af7c9

xinaxu commented 10 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 10 months ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 88.95% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.
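To make the replication warning above concrete, here is one way such a percentage could be computed. This is only a sketch of the idea, not the checker's actual implementation: it assumes each deal is a (piece CID, provider ID) pair, counts deals without weighting by size, and uses a hypothetical function name and sample data.

```python
from collections import defaultdict


def low_replication_percent(deals, min_providers=4):
    """Percentage of deals whose piece is stored with fewer than
    `min_providers` distinct storage providers.

    deals: list of (piece_cid, provider_id) pairs (assumed format).
    """
    if not deals:
        return 0.0
    providers_by_piece = defaultdict(set)
    for piece_cid, provider_id in deals:
        providers_by_piece[piece_cid].add(provider_id)
    low = sum(
        1
        for piece_cid, _ in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return 100.0 * low / len(deals)


# Hypothetical sample: piece A sits with 4 providers, piece B with 2,
# and piece C was sealed twice by the same provider.
sample_deals = [
    ("pieceA", "sp1"), ("pieceA", "sp2"), ("pieceA", "sp3"), ("pieceA", "sp4"),
    ("pieceB", "sp1"), ("pieceB", "sp2"),
    ("pieceC", "sp1"), ("pieceC", "sp1"),
]
percent = low_replication_percent(sample_deals)
print(f"{percent:.2f}% of deals are below the replication threshold")
```

A high percentage early in an application does not necessarily mean a problem: if providers receive copies one after another, most pieces will sit with fewer than 4 providers until later replicas are sealed.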

kernelogic commented 10 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceclr3plozjbg4uwysd7why3zbdxl54mu6jcrd6vub7j44uwcisebq

Address

f1ec3tximpyywg7l7t7lco2ljaz7uvmsdlfg26gli

Datacap Allocated

512.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

02a6dbdc-23f7-4ec0-916a-4a9e9a0af7c9

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceclr3plozjbg4uwysd7why3zbdxl54mu6jcrd6vub7j44uwcisebq

a1991car commented 10 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 10 months ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 88.95% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

a1991car commented 10 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebpcubbw4zydff6mh4gqjwcte4qusgjwfoosls4dqijis74mj4lpg

Address

f1ec3tximpyywg7l7t7lco2ljaz7uvmsdlfg26gli

Datacap Allocated

512.00TiB

Signer Address

f1qnumecdypgrbaebtkdfjnwt5ndacadcuas3deiq

Id

02a6dbdc-23f7-4ec0-916a-4a9e9a0af7c9

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebpcubbw4zydff6mh4gqjwcte4qusgjwfoosls4dqijis74mj4lpg

github-actions[bot] commented 10 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

xinaxu commented 10 months ago

Keep active

github-actions[bot] commented 9 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

github-actions[bot] commented 9 months ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

-- Commented by Stale Bot.