filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Kernelogic - Sentinel3 (6/6) #2167

Closed kernelogic closed 4 months ago

kernelogic commented 1 year ago

Data Owner Name

Meteorological Environmental Earth Observation

What is your role related to the dataset

Dataset Owner

Data Owner Country/Region

Italy

Data Owner Industry

Environment

Website

https://github.com/Sentinel-5P/data-on-s3/blob/master/DocsForAws/Sentinel3Description.md

Social Media

https://twitter.com/meeosrl

Total amount of DataCap being requested

6PiB

Expected size of single dataset (one copy)

105167 * 32GB sectors = 3286TiB

Number of replicas to store

10

Weekly allocation of DataCap requested

1PiB

On-chain address for first allocation

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

Identifier

No response

Share a brief history of your project and organization

This is an extension of the following LDNs, whose DataCap has been fully used up:
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1508
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1507
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1106

The total number of CARs prepared from the 4 AWS buckets is 105167.

To store 10 copies, 1051670 * 32GB / 1024 / 1024 = 32PiB of DC is needed.
So far I have stored 849261 deals, which is 849261 * 32GB / 1024 / 1024 = 25.9PiB.

I need 6PiB more DC to fully onboard the whole dataset as planned.
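The arithmetic above can be checked with a short script. This is a minimal sketch, assuming each CAR (and each deal) fills exactly one 32 GiB sector, which is what the "32GB" in the text appears to mean given the division by 1024 twice. The constant and function names are illustrative and not part of any Filecoin tool.

```python
# Sketch of the DataCap arithmetic from the application text.
# Assumption: every CAR / deal occupies one 32 GiB sector.
SECTOR_GIB = 32
GIB_PER_PIB = 1024 * 1024


def gib_to_pib(gib: float) -> float:
    """Convert GiB to PiB using binary prefixes."""
    return gib / GIB_PER_PIB


total_cars = 105167     # CARs prepared (from the text)
replicas = 10           # copies to store (from the text)
deals_stored = 849261   # deals already onboarded (from the text)

needed_pib = gib_to_pib(total_cars * replicas * SECTOR_GIB)
stored_pib = gib_to_pib(deals_stored * SECTOR_GIB)
remaining_pib = needed_pib - stored_pib

print(f"needed:    {needed_pib:.2f} PiB")     # about 32.09
print(f"stored:    {stored_pib:.2f} PiB")     # about 25.92
print(f"remaining: {remaining_pib:.2f} PiB")  # about 6.18
```

The remainder of roughly 6.18 PiB is consistent with the 6PiB requested in this application.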

The summary of previous LDN CID report is here:
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1508#issuecomment-1689187027

Is this project associated with other projects/ecosystem stakeholders?

Yes

If answered yes, what are the other projects/ecosystem stakeholders

Storage working groups, BigD exchange, singularity deal making tool.

Describe the data being stored onto Filecoin

https://github.com/Sentinel-5P/data-on-s3/blob/master/DocsForAws/Sentinel3Description.md

Sentinel-3 is a Copernicus satellite whose main three sensors are:

Ocean and Land Colour Instrument (OLCI) for medium resolution marine and terrestrial optical measurements.
Sea and Land Surface Temperature Radiometer (SLSTR) for thermal measurements (both marine and terrestrial), land monitoring and fire detection.
SAR Radar Altimeter (SRAL) together with the MicroWave Radiometer (MWR) and Precise Orbit Determination (POD) for ocean topography measurements.

Total size: about 1.5PB. This dataset consists of several S3 buckets:
arn:aws:s3:::meeo-s3-cog/   eu-central-1    631.2 TiB
arn:aws:s3:::meeo-s3/NTC/   eu-central-1    409.6 TiB
arn:aws:s3:::meeo-s3/NRT/   eu-central-1    315.6 TiB
arn:aws:s3:::meeo-s3/STC/   eu-central-1    27.5 TiB
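As a quick cross-check of the "about 1.5PB" figure, the listed sizes can be summed and converted. This is a minimal sketch; the dictionary keys are shortened from the ARNs above, and the variable names are illustrative.

```python
# Sizes in TiB, copied from the listing above.
bucket_sizes_tib = {
    "meeo-s3-cog/": 631.2,
    "meeo-s3/NTC/": 409.6,
    "meeo-s3/NRT/": 315.6,
    "meeo-s3/STC/": 27.5,
}

total_tib = sum(bucket_sizes_tib.values())
total_pib = total_tib / 1024              # binary petabytes (PiB)
total_pb = total_tib * 2**40 / 10**15     # decimal petabytes (PB)

print(f"{total_tib:.1f} TiB = {total_pib:.2f} PiB = {total_pb:.2f} PB")
```

The sum is about 1383.9 TiB, which is roughly 1.35 PiB or 1.52 PB, so the stated total matches when read as decimal petabytes.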

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

If you are a data preparer. What is your location (Country/Region)

Canada

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

Singularity V1 (I am 1 of the 2 main developers)

If you are not preparing the data, who will prepare the data? (Provide name and business)

No response

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

Yes, by me. I just need a bit more DC to fully onboard as planned.

Please share a sample of the data

https://registry.opendata.aws/sentinel-3/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1 to 1.5 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America

How will you be distributing your data to storage providers

HTTP or FTP server

How do you plan to choose storage providers

Slack

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

See previous LDNs for SPs already used. The following SPs are candidates for this LDN extension only.

Original:
Chris f02131881,f02131801,f02131855 HongKong (committed)
Sageone f02315355,f02191897,f02028779 HongKong (committed)
PIKNIK f01851060,f01652333 LasVegas (potential)
TopBlocks f0240185 Santa Clara (potential)

Newly added on Nov 2nd 2023: 
HK_FIL f02830451,f02829744 Hong Kong (Slack contact DevOps)

How do you plan to make deals to your storage providers

Boost client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

kernelogic commented 1 year ago

Please see the singularity output below for the total CARs prepared for this dataset (105167) and the total deals onboarded (849261), which support my reason for requesting the extension in this issue.

[image: singularity output]

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

6PiB

Expected weekly DataCap usage rate

1PiB

Client address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

DataCap allocation requested

307.19TiB

Id

ecbc2fe1-1f8e-4ca3-8576-da70364b6989

Sunnyiscoming commented 1 year ago

Per the https://github.com/filecoin-project/notary-governance/issues/922 for Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant and also please add the contact information of the SP entities you are working with to store copies of the data.

kernelogic commented 1 year ago

@Sunnyiscoming I am not sure how to complete this FIL+ registration form given the requirement of "4 or more Storage Provider entities to work with in 3 different regions".

This is an extension of 3 other existing LDNs; the overall prior distribution far exceeds the minimum requirements.

Could this LDN be exempted from FIL+ registration, or be allowed to list only 2 regions?

Sunnyiscoming commented 1 year ago

You should complete the form so that we can record it. I think you can list 4 or more storage providers in at least 2 regions here at first.

kernelogic commented 1 year ago

@Sunnyiscoming Yes ma'am! I have submitted the form.

laurarenpanda commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacectew54qge74hvnbyjq73ugvdfxojul24yoclrhimrbtgf2cfa55q

Address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

Datacap Allocated

307.19TiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

ecbc2fe1-1f8e-4ca3-8576-da70364b6989

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacectew54qge74hvnbyjq73ugvdfxojul24yoclrhimrbtgf2cfa55q

NDLABS-Leo commented 1 year ago

Willing to support.

NDLABS-Leo commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecoxb4qfkqkluaxwlfmcguscp7jmqcivpj7k4pvikktceww5vvwyu

Address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

Datacap Allocated

307.19TiB

Signer Address

f1yayfsv6whu3rheviucvventj3y6t542xfpb47ei

Id

ecbc2fe1-1f8e-4ca3-8576-da70364b6989

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecoxb4qfkqkluaxwlfmcguscp7jmqcivpj7k4pvikktceww5vvwyu

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

DataCap allocation requested

512TiB

Id

1e3cd818-07e6-43e9-ad8e-e4ae108d0b4b

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

Rule to calculate the allocation request amount

100% weekly > 0.5PiB, requesting 0.5PiB

DataCap allocation requested

512TiB

Total DataCap granted for client so far

307.18TiB

Datacap to be granted to reach the total amount requested by the client (6PiB)

5.70PiB
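The figures in this bot comment can be reproduced with a few lines. This is a minimal sketch that assumes the rule works exactly as printed above (request 100% of the weekly rate, capped at 0.5PiB); how the bot actually computes the amount is not shown in this thread, and the function and variable names are hypothetical.

```python
TIB_PER_PIB = 1024


def capped_request_tib(weekly_tib: float, cap_tib: float) -> float:
    """Request 100% of the weekly rate, but never more than the cap.

    Assumption: this mirrors the printed rule
    "100% weekly > 0.5PiB, requesting 0.5PiB".
    """
    return min(weekly_tib, cap_tib)


total_requested_tib = 6 * TIB_PER_PIB   # 6PiB requested in the application
weekly_tib = 1 * TIB_PER_PIB            # 1PiB weekly allocation requested
granted_tib = 307.18                    # granted so far, per the bot

request_tib = capped_request_tib(weekly_tib, 0.5 * TIB_PER_PIB)
still_to_grant_pib = (total_requested_tib - granted_tib) / TIB_PER_PIB

print(f"next request:   {request_tib:.0f} TiB")         # 512 TiB
print(f"still to grant: {still_to_grant_pib:.2f} PiB")  # 5.70 PiB
```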

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 4470 | 3 | 307.18TiB | 46.71 | 142.37TiB |

kernelogic commented 1 year ago

checker:manualTrigger f1ivqeb3laht7eqlehxrtelfcedje7h6g57dwtliq f1itrbkc7i4u2p46pb6ugv3bmz6zmdjvelpmf6vqi f1ua4eolclcnc2pzp4tsqodey3rwbjz46slb6e3nq f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

a1991car commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

⚠️ All storage providers are located in the same region.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

a1991car commented 1 year ago

feiyan contacted me. This is the second round and the data has not been fully updated. As a well-known community member, I support this for the time being. I will check the subsequent data.

a1991car commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceb4ocq42kusay4ayedtxdkdejgazgnx6vkkno5fwepsvgbsqfxpjq

Address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

Datacap Allocated

512.00TiB

Signer Address

f1qnumecdypgrbaebtkdfjnwt5ndacadcuas3deiq

Id

1e3cd818-07e6-43e9-ad8e-e4ae108d0b4b

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb4ocq42kusay4ayedtxdkdejgazgnx6vkkno5fwepsvgbsqfxpjq

nj-steve commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

⚠️ All storage providers are located in the same region.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

nj-steve commented 1 year ago

Hello @kernelogic, I see the retrieval rate is too low and the SPs are all in the same region. Please tell us why, and how you will resolve it in the next round.

kernelogic commented 1 year ago

Hello @nj-steve, I am contacting the SPs in this round about the retrieval issues and expect them to improve in the next tranche. I think the tornado in Hong Kong in the past week caused the retrieval attempts to fail.

Regarding the other warnings (region, distribution, CID sharing, etc.), please note that this is the final extension of a series of LDNs, so only a few SPs are used on this one. Please consider the CID report in conjunction with the previous addresses:

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1508#issuecomment-1689187027

nj-steve commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebb723pffyzollwo7qiyma4rtc3m76hfx4g5voj7jque42ryleevo

Address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

Datacap Allocated

512.00TiB

Signer Address

f1xx6555qijma7igpnjspyvdunc4vfxkawnpqy5ii

Id

1e3cd818-07e6-43e9-ad8e-e4ae108d0b4b

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebb723pffyzollwo7qiyma4rtc3m76hfx4g5voj7jque42ryleevo

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 1 year ago

I am working through this.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

DataCap allocation requested

1PiB

Id

7490099d-f00a-4e7d-8562-b621cbe92caf

kernelogic commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

kernelogic commented 1 year ago

Again, please note that this is the final extension of a series of LDNs, so only a few SPs are used on this one. Please consider the CID report in conjunction with the previous addresses:

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1508#issuecomment-1689187027

newwebgroup commented 1 year ago

The applicant is a long-term, reputable Fil+ client and the retrieval rate is very high, so I am willing to provide support.

Retrieval Statistics
Overall Graphsync retrieval success rate: 0.00%
Overall HTTP retrieval success rate: 82.46%

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1508#issuecomment-1689187027

newwebgroup commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebp3vaxgkwkdauyby2f7b6p5h6cmdtvthubhx5bysju7igdtncnd6

Address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

Datacap Allocated

1.00PiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

7490099d-f00a-4e7d-8562-b621cbe92caf

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebp3vaxgkwkdauyby2f7b6p5h6cmdtvthubhx5bysju7igdtncnd6

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 1 year ago

Waiting for top up on multisig.

Normalnoise commented 1 year ago

checker:manualTrigger f1ivqeb3laht7eqlehxrtelfcedje7h6g57dwtliq f1itrbkc7i4u2p46pb6ugv3bmz6zmdjvelpmf6vqi f1ua4eolclcnc2pzp4tsqodey3rwbjz46slb6e3nq f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Other Addresses[^2]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

bq1024 commented 1 year ago

The report looks good, willing to support

bq1024 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecjenrnt3ivkge7s3kiyr3ta4bkpq3lcdq76i47cxvjlvtwpkfs4u

Address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

Datacap Allocated

1.00PiB

Signer Address

f1pkjtavqx4r2q2w3he3jknfc5mo2vgfimccmpnaa

Id

7490099d-f00a-4e7d-8562-b621cbe92caf

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecjenrnt3ivkge7s3kiyr3ta4bkpq3lcdq76i47cxvjlvtwpkfs4u

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 1 year ago

Keep open

kevzak commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

kernelogic commented 1 year ago

Same as #1683: please look at the combined CID report across #1507, #1508 and #1106 to determine replication factors.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f02049625

Client address

f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

DataCap allocation requested

2PiB

Id

13cd90e9-7121-4e4f-9035-1daec03e1d4d

kernelogic commented 1 year ago

checker:manualTrigger f1ivqeb3laht7eqlehxrtelfcedje7h6g57dwtliq f1itrbkc7i4u2p46pb6ugv3bmz6zmdjvelpmf6vqi f1ua4eolclcnc2pzp4tsqodey3rwbjz46slb6e3nq f1424hqqhyxv5syecwfrf3fyncxkkraly2msyyrvq

kernelogic commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 30% of total datacap - f02315355: 34.63%

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

kernelogic commented 1 year ago

The combined CID report didn't work; please see this previous one for reference: https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2167#issuecomment-1752002621

liyunzhi-666 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 30% of total datacap - f02315355: 34.63%

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

liyunzhi-666 commented 1 year ago

@kernelogic has contacted me and explained the CID report to me. I can support this round.