filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] NOAA Water-Column Sonar Data Archive #2006

Closed sunLanden closed 1 year ago

sunLanden commented 1 year ago

Data Owner Name

NOAA

What is your role related to the dataset

Data Preparer

Data Owner Country/Region

United States

Data Owner Industry

Resources, Agriculture & Fisheries

Website

https://www.ncei.noaa.gov/products/water-column-sonar-data

Social Media

https://www.facebook.com/NOAANCEI/
https://www.instagram.com/noaadata/
https://twitter.com/NOAANCEI

Total amount of DataCap being requested

5PiB

Expected size of single dataset (one copy)

199.7TiB

Number of replicas to store

10

Weekly allocation of DataCap requested

800TiB

On-chain address for first allocation

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

Share a brief history of your project and organization

I'm an SP. NOAA is the Nation's leading authority for environmental data and manages one of the largest archives of atmospheric, coastal, geophysical, and oceanic research in the world. NCEI contributes to the NESDIS mission by developing new products and services that span the science disciplines and enable better data discovery.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

NCEI Water-Column Sonar Data Archive: water-column sonar data focus on the area from near the surface of the ocean to the seafloor. Primary uses of these specific sonar data include 3-D mapping of fish schools and other mid-water marine organisms; assessing biological abundance; species identification; and habitat characterization. Other uses include mapping underwater gas seeps and remotely monitoring undersea oil spills.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

lotus

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/ncei-wcsd-archive/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Asia other than Greater China, North America, Europe

How will you be distributing your data to storage providers

HTTP or FTP server, Shipping hard drives

How do you plan to choose storage providers

Slack, Filmine, Big Data Exchange

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

In progress

How do you plan to make deals to your storage providers

Lotus client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and get back to you soon.

sunLanden commented 1 year ago

@Sunnyiscoming Hello, my previous application is https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1960. Please approve this application, which replaces #1960.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

800TiB

Client address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

DataCap allocation requested

256TiB

Id

ccdab9ac-cb92-45af-9e66-a0543e485538

MEIYAN666 commented 1 year ago

The applicant reached out to me on Slack, and I am willing to support the first allocation.

MEIYAN666 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceccu2j7lj5nzpf736ofavrqimg3num6q2z63brzcs32olycxetkva

Address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

Datacap Allocated

256.00TiB

Signer Address

f1bwugfihrmn3iyunzyxst5nttql3dge4khwmurtq

Id

ccdab9ac-cb92-45af-9e66-a0543e485538

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceccu2j7lj5nzpf736ofavrqimg3num6q2z63brzcs32olycxetkva

Casey-PG commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedi3l4py47g2tsm5regoo7qfzm7xxx63u6wfklzf3kbnctb2mzzo2

Address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

Datacap Allocated

256.00TiB

Signer Address

f1d4yb3wags3mtddzesxoo63jv7dmlec3bq4yteni

Id

ccdab9ac-cb92-45af-9e66-a0543e485538

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedi3l4py47g2tsm5regoo7qfzm7xxx63u6wfklzf3kbnctb2mzzo2

Casey-PG commented 1 year ago

Public dataset with a reasonable distribution plan. I am willing to support the first round and will check the following updates.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

Sunnyiscoming commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

No active deals found for this client.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

DataCap allocation requested

512TiB

Id

c30034a9-186e-4cac-bd65-5f835df66b4b

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

Rule to calculate the allocation request amount

100% weekly > 0.5PiB, requesting 0.5PiB

DataCap allocation requested

512TiB

Total DataCap granted for client so far

256TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.75PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 4238 | 5 | 256TiB | 31.76 | 59.75TiB |

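The amounts requested in this thread (256TiB for the first round, 512TiB for the second) are consistent with a tranche rule in which each round asks for the lesser of a share of the total request and a multiple of the weekly rate. The following is a minimal Python sketch of that arithmetic. The schedule table and the name `next_allocation` are assumptions made for illustration: the bot output only shows the "100% weekly" and "200% weekly" comparisons, so the other pairs may not match the bot's actual logic.

```python
TIB = 2**40
PIB = 2**50

# Assumed schedule: (percent of total request, percent of weekly rate) per round.
# Only the 100% and 200% weekly figures appear in the bot output in this thread.
TRANCHES = [(5, 50), (10, 100), (20, 200), (40, 400), (80, 800)]


def next_allocation(total, weekly, round_number, granted=0):
    """Return the size in bytes of the next DataCap request.

    total, weekly and granted are byte counts; round_number starts at 1.
    """
    total_pct, weekly_pct = TRANCHES[min(round_number, len(TRANCHES)) - 1]
    amount = min(total * total_pct // 100, weekly * weekly_pct // 100)
    # Never ask for more than what is still outstanding.
    return min(amount, total - granted)


if __name__ == "__main__":
    total, weekly = 5 * PIB, 800 * TIB
    first = next_allocation(total, weekly, 1)
    second = next_allocation(total, weekly, 2, granted=first)
    print(first // TIB, "TiB, then", second // TIB, "TiB")
    print("still to grant after round 1:", (total - first) / PIB, "PiB")
```

With a 5PiB total and an 800TiB weekly rate, the share of the total is the smaller term in each of the first rounds, which is why the bot reports "100% weekly > 0.5PiB, requesting 0.5PiB".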
TakiChain commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebonqjxlric4zk4uwfwhonkgo6ubnyjfpb356uwqj4i6rv2ha24xw

Address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

Datacap Allocated

512.00TiB

Signer Address

f15impf3j2zcaex4lhyxndxswuuhv24vzstuqtxsi

Id

c30034a9-186e-4cac-bd65-5f835df66b4b

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebonqjxlric4zk4uwfwhonkgo6ubnyjfpb356uwqj4i6rv2ha24xw

Bennyyangpu commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceclkff3w6uonvy5fpklbq5z2reo2ovkdppclxheuyfary3qkj3nhw

Address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

Datacap Allocated

512.00TiB

Signer Address

f174fg3bqbln3zjnkxtyf6s54txqkr7yqkj6cig7y

Id

c30034a9-186e-4cac-bd65-5f835df66b4b

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceclkff3w6uonvy5fpklbq5z2reo2ovkdppclxheuyfary3qkj3nhw

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

DataCap allocation requested

1PiB

Id

be33b547-c429-46f0-9658-02a3af2cc4c2

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

Rule to calculate the allocation request amount

200% weekly > 1PiB, requesting 1PiB

DataCap allocation requested

1PiB

Total DataCap granted for client so far

768TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.25PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 18674 | 10 | 512TiB | 35.05 | 122.68TiB |

spaceT9 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

cryptowhizzard commented 1 year ago

@sunLanden

The Fil+ rules state that your data needs to be retrievable. Yours is not, so notaries can't sign the next allocation requests. Are you going to fix this, or will you close this application?

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

cryptowhizzard commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

cryptowhizzard commented 1 year ago

@sunLanden

None of the data is retrievable. If you don't intend to make it retrievable then please close this issue.

sunLanden commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

sunLanden commented 1 year ago

[image]

It is supporting retrieval.

cryptowhizzard commented 1 year ago

The HTTP retrieval statistics are unreliable at the moment. See Slack.

cryptowhizzard commented 1 year ago

Can you tell me which SPs I should use to test your retrieval? According to my system, the SPs used have been flagged.

HTTP retrieval statistics were skewed last week and cannot be used as a metric. This is a known issue that has been posted on Slack. It's a work in progress.

[Screenshot 2023-07-31 at 12:42:57]

sunLanden commented 1 year ago

@cryptowhizzard Is it possible that your system needs an update or a fix? Your data is not the latest data.

cryptowhizzard commented 1 year ago

@sunLanden

I am always reachable for feedback. Can you please explain why you think my data is not up to date? We fetch it from Protocol Labs (datacapstats.io).

Thanks.

Wengeding commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

Wengeding commented 1 year ago

The report looks healthy. The inability of some nodes to retrieve is permissible. We want to focus on the overall retrieval success rate, which must not be below 1%.

Wengeding commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecjn4puaba4ylwcdeehiwxihr45sapc5b7lh2yr643jyt4w7y6u6c

Address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

Datacap Allocated

1.00PiB

Signer Address

f1txfsjmix4vlzido4dkildrnbw26owtlbslexmpa

Id

be33b547-c429-46f0-9658-02a3af2cc4c2

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecjn4puaba4ylwcdeehiwxihr45sapc5b7lh2yr643jyt4w7y6u6c

cryptowhizzard commented 1 year ago

Signed without retrieval.

Dispute submitted.

[Screenshot 2023-07-31 at 18:10:37]

AthSmith commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 73.07% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.
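The replication warning in this report is a share of deals whose data is held by fewer than 4 storage providers. The thread does not show how the checker computes it, so the following Python sketch is only one plausible reading: it takes a hypothetical list of `(piece_cid, provider_id)` pairs, counts distinct providers per piece, and reports the percentage of deals below the threshold. The function name and the input shape are assumptions.

```python
from collections import defaultdict


def under_replicated_share(deals, min_providers=4):
    """Percentage of deals whose piece is stored by fewer than min_providers.

    deals is a list of (piece_cid, provider_id) tuples, one per deal.
    """
    if not deals:
        return 0.0
    providers_by_piece = defaultdict(set)
    for piece_cid, provider_id in deals:
        providers_by_piece[piece_cid].add(provider_id)
    low = sum(
        1 for piece_cid, _ in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return 100.0 * low / len(deals)


if __name__ == "__main__":
    # Hypothetical sample: piece "a" is on 4 providers, piece "b" on only 2.
    sample = [("a", "p1"), ("a", "p2"), ("a", "p3"), ("a", "p4"),
              ("b", "p1"), ("b", "p2")]
    print(f"{under_replicated_share(sample):.2f}% of deals are under-replicated")
```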

AthSmith commented 1 year ago

Judging from previous communication and the updated CID report, I am willing to support.

AthSmith commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceabwx6sudsynxfupwbuv74o6mlodgz4dmiyqiegumccpxqktlms6i

Address

f1y77vzqlmtv7hc6zcxicu2jqzh766lwgzlej6tti

Datacap Allocated

1.00PiB

Signer Address

f1vxbqrf7rfum3n6m5u6eb4re6xj7amvsaqnzu64y

Id

be33b547-c429-46f0-9658-02a3af2cc4c2

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceabwx6sudsynxfupwbuv74o6mlodgz4dmiyqiegumccpxqktlms6i

cryptowhizzard commented 1 year ago

This client is actively stalling HTTP retrievals and blocking HTTP range requests with a reverse proxy to prevent its data from being investigated.

It works as follows:

A bandwidth limit is set on the HTTP retrieval with NGINX. After a random amount of data has been transferred, the limit is set to zero, which makes the transfer time out. Because range retrieval is disabled in NGINX, the downloader cannot resume where the transfer left off and has to start all over again.

Log can be found at http://datasetcreators.com/downloadedcarfiles/logs/2006.log
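Whether a server allows a stalled download to be resumed can be checked with a single HTTP range request: a server that honours it answers `206 Partial Content` with a `Content-Range` header, while one that ignores it answers `200` and restarts from byte 0. The Python sketch below illustrates that check using only the standard library. The helper names (`range_header`, `can_resume`, `probe`) are made up for illustration and are not the tooling used to produce the log above.

```python
import urllib.request


def range_header(offset):
    """Build the request header asking for everything from byte `offset` on."""
    return {"Range": f"bytes={offset}-"}


def can_resume(status, headers):
    """True if the response shows the server honoured the range request."""
    names = {name.lower() for name in headers}
    return status == 206 and "content-range" in names


def probe(url, offset=1, timeout=30):
    """Send one range request to `url` and report whether resuming is possible."""
    request = urllib.request.Request(url, headers=range_header(offset))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return can_resume(response.status, dict(response.headers.items()))
```

A `False` result from `probe` only shows that range requests are not honoured at that endpoint; it does not by itself say why.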

cryptowhizzard commented 1 year ago

[Screenshot 2023-08-09 at 17:03:53]

raghavrmadya commented 1 year ago

Hi everyone, please note that https://github.com/filecoin-project/notary-governance/issues/895 was passed. @cryptowhizzard is not making these rules himself.

If you don't agree and did not get a chance to provide your input in 895, kindly open an issue to revert it but until then, applications under dispute must not be signed.

Wengeding commented 1 year ago

Hi @raghavrmadya As far as I know, bandwidth operators in Asia, especially China, often limit bandwidth for unusual access from abroad to prevent suspected DDoS attacks that would seriously clog the shared bandwidth. This security policy is triggered more easily if the node buys a smaller amount of bandwidth.

However, if a small amount of data is downloaded for a short period of time for testing, these reports will appear normal.

One possible improvement would be to suggest that SP nodes buy larger bandwidth services or double or triple backup bandwidth, but that might be more costly.

Not sure if this info would be helpful in adjudicating similar situations. Thank you!

sunLanden commented 1 year ago

@Wengeding Thanks again for your pertinent comments.

After checking with the SPs, there is no problem with this application. Retrieval continues to be supported. @raghavrmadya Can my application be removed from the dispute form?

cryptowhizzard commented 1 year ago

No, it cannot.

I screened the part of the data that I managed to download.

http://www.datasetcreators.com/downloadedcarfiles/httpretrievals/2006-f02204621-f02252118-48979839-baga6ea4seaqgexffnbq7komjthtuznl6at64ezmndar4kchc5ppbkxsvai2lkgy

Running hexdump -C 2006-f02204621-f02252118-48979839-baga6ea4seaqgexffnbq7komjthtuznl6at64ezmndar4kchc5ppbkxsvai2lkgy

results in:

[Screenshot 2023-08-28 at 12:55:02]

You are storing garbage, no more, no less.
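A screening of this kind can also be scripted. The Python sketch below prints a view similar to `hexdump -C` for the first bytes of a file and reports the fraction of zero bytes as one crude signal of filler content. The zero-byte measure and the helper names are assumptions chosen for illustration; the thread does not say what the downloaded file actually contained beyond the screenshot.

```python
def hexdump_lines(data, width=16):
    """Yield lines in a layout similar to `hexdump -C` for the given bytes."""
    pad = width * 3 - 1
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Printable ASCII is shown as-is, everything else as a dot.
        text = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        yield f"{offset:08x}  {hex_part:<{pad}}  |{text}|"


def zero_fraction(data):
    """Fraction of bytes equal to 0x00 (0.0 for empty input)."""
    return data.count(0) / len(data) if data else 0.0


def screen_file(path, sample_size=4096):
    """Print a hexdump of the first sample_size bytes and the zero-byte share."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    for line in hexdump_lines(sample):
        print(line)
    print(f"zero bytes in sample: {zero_fraction(sample):.1%}")
```

A high zero-byte share in a small sample is only a hint; judging whether a CAR file holds the promised dataset still requires unpacking it and inspecting the contents.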

cryptowhizzard commented 1 year ago

@raghavrmadya

Please update the dispute to note that, apart from the retrieval malfunction, this applicant is storing garbage and not the data they promised to store in this LDN.

Datacap should be removed and this LDN should be permanently closed.

raghavrmadya commented 1 year ago

The dispute on this application has been open for a long time and the client has not responded to community concerns. The T&T WG recommends pausing all signing until the client addresses the concerns directly. If there is no clear response addressing the dispute within the next 7 days, the application will be closed and the DC removed.

Dispute - https://www.notion.so/filecoin/Abuse-DC-no-retrieval-52d113d7dd3d403d944df7a99d0f064e?pvs=4

raghavrmadya commented 1 year ago

No response from the client. Closing the application and requesting DC removal.