filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Kernelogic - Open datasets onboarding initiative phase 1 (3/4) #1639

Closed kernelogic closed 6 months ago

kernelogic commented 1 year ago

Data Owner Name

Kernelogic

Data Owner Country/Region

Canada

Data Owner Industry

Life Science / Healthcare

Website

https://singularity-browser.kernelogic.ca

Social Media

N/A

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

1PiB

On-chain address for first allocation

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Custom multisig

Identifier

No response

Share a brief history of your project and organization

I have participated in every Slingshot phase and am probably the best-performing "small individual client".

Even though Slingshot v2 has ended, there is still strong demand from SPs to onboard useful data. This application is to onboard open datasets from AWS.

I have a web UI (https://singularity-browser.kernelogic.ca/) that indexes all onboarded files and provides ways to retrieve them.

I have successfully completed a few LDNs on other datasets, and I have a record showing that I have followed the rules of decentralization and have zero self-dealing.

Some of the recent LDNs I completed:
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1108
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1107
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1106
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1104
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/983

Is this project associated with other projects/ecosystem stakeholders?

Yes

If answered yes, what are the other projects/ecosystem stakeholders

Storage working groups, BigD exchange, singularity deal making tool.

Describe the data being stored onto Filecoin

Each LDN requires a separate client address for the bot to work properly. To onboard more data smoothly, I am kicking off a series of open dataset onboarding LDNs for AWS open datasets that I have not onboarded before, including but not limited to:

Allen Mouse Brain Atlas
Community Earth System Model Large Ensemble (CESM LENS)
Community Earth System Model v2 Large Ensemble (CESM2 LENS)
Epoch of Reionization Dataset
HIRLAM Weather Model
NIH NCBI Sequence Read Archive (SRA) on AWS
NOAA Global Ensemble Forecast System (GEFS)
NOAA Fundamental Climate Data Records (FCDR)
NOAA Joint Polar Satellite System (JPSS)

All these datasets will be indexed for easy lookup through my website https://singularity-browser.kernelogic.ca

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/allen-mouse-brain-atlas/
https://registry.opendata.aws/ncar-cesm-lens/
https://registry.opendata.aws/epoch-of-reionization/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1 to 1.5 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe

How will you be distributing your data to storage providers

HTTP or FTP server

How do you plan to choose storage providers

Slack, Big data exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

No response

How do you plan to make deals to your storage providers

No response

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

herrehesse commented 1 year ago

Dear Filecoin+ Github applicant,

We have noticed that some of you are submitting merged datacap requests for datasets that are already (partly) on the chain. While we appreciate your enthusiasm to contribute to the Filecoin network, we want to remind you that this behaviour may not be beneficial to the network in the long run. In fact, this behaviour has been questioned and discussed in issue #832 on the Filecoin notary-governance Github repository.

We encourage you to review the discussions in issue #832. It's important to ensure that your datacap requests are valid, necessary, and add value to the network. By doing so, you can help to maintain the integrity and sustainability of the Filecoin network.

You can find the link to issue #832 here: filecoin-project/notary-governance#832

Thank you for your understanding and cooperation.

kernelogic commented 1 year ago

In my defence, I provide a better browser for per-dataset data indexing than the fil-plus bots. It can show in detail what is stored in each dataset.

With that being said, I am also willing to follow the decision on your proposal https://github.com/filecoin-project/notary-governance/issues/832 should it get accepted.

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

See questions in https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1638.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

1PiB

Client address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

DataCap allocation requested

256TiB

Id

0d2b1965-6d2b-4863-a704-2a177b618396

Sunnyiscoming commented 1 year ago

Related proposal: https://github.com/filecoin-project/notary-governance/issues/832. I hope more notaries will review this application and comment on the proposal.

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacec3zapdzq3yxktjfs6vtzonaf4refw24zifmhppikbigmylj4qt64

Address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Datacap Allocated

256.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

0d2b1965-6d2b-4863-a704-2a177b618396

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec3zapdzq3yxktjfs6vtzonaf4refw24zifmhppikbigmylj4qt64

laurarenpanda commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedwpujpun2dvl2cdys4luzua5bxm6xk3nototbuedrad5go3do6vc

Address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Datacap Allocated

256.00TiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

0d2b1965-6d2b-4863-a704-2a177b618396

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedwpujpun2dvl2cdys4luzua5bxm6xk3nototbuedrad5go3do6vc

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

DataCap allocation requested

512TiB

Id

fe032820-41fb-4e35-aa9b-760569e0dd5b

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Rule to calculate the allocation request amount

10% of total dc amount requested

DataCap allocation requested

512TiB

Total DataCap granted for client so far

256TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.75PiB
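The figures in this bot comment can be reproduced with simple arithmetic. The following is only a minimal sketch, assuming binary units (1 PiB = 1024 TiB); the helper name `tranche_tib` is hypothetical and this is not the bot's actual code.

```python
# Minimal sketch of the arithmetic behind the figures above.
# Assumption: binary units, 1 PiB = 1024 TiB. Helper names are hypothetical.
TIB_PER_PIB = 1024


def tranche_tib(total_requested_tib: int, percent: int) -> int:
    """Return `percent` of the total requested DataCap, in TiB."""
    return total_requested_tib * percent // 100


total_tib = 5 * TIB_PER_PIB   # 5PiB requested in total
granted_tib = 256             # first allocation, already granted

# "10% of total dc amount requested"
second_request_tib = tranche_tib(total_tib, 10)

# DataCap still to be granted to reach the 5PiB total
remaining_pib = (total_tib - granted_tib) / TIB_PER_PIB

print(second_request_tib, remaining_pib)  # 512 4.75
```

Both results match the values the bot reports: a 512TiB request and 4.75PiB left to grant.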

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 4800 | 3 | 256TiB | 33.33 | 64.28TiB |

kernelogic commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ All storage providers are located in the same region.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

a1991car commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ All storage providers are located in the same region.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

a1991car commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebms3a27tgpptnmdtoi4ssp45gxvmf7w5qde2kt3lq6qsyc74jpbk

Address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Datacap Allocated

512.00TiB

Signer Address

f1qnumecdypgrbaebtkdfjnwt5ndacadcuas3deiq

Id

not found

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebms3a27tgpptnmdtoi4ssp45gxvmf7w5qde2kt3lq6qsyc74jpbk

newwebgroup commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ All storage providers are located in the same region.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

newwebgroup commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaced6kmwtlwl3ng5xtmlmvkld3nd73vxnjtvvtrxzogj3wpjylgfhdu

Address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Datacap Allocated

512.00TiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

not found

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced6kmwtlwl3ng5xtmlmvkld3nd73vxnjtvvtrxzogj3wpjylgfhdu

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

DataCap allocation requested

1PiB

Id

14029771-ca0c-49b7-9733-56fc1261be1f

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Rule to calculate the allocation request amount

20% of total dc amount requested

DataCap allocation requested

1PiB

Total DataCap granted for client so far

465661.3YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-5.62B
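The two values above ("465661.3YiB" and "-5.62B") cannot be correct for a 5PiB application and look like a display glitch in the bot output. As a rough cross-check, and assuming the two allocations approved earlier in this thread (256TiB and 512TiB) are the only grants so far, the expected figures can be sketched as follows. Variable names are illustrative only.

```python
# Cross-check sketch. Assumptions: 1 PiB = 1024 TiB, and the only grants so far
# are the 256TiB and 512TiB allocations approved earlier in this thread.
TIB_PER_PIB = 1024

total_tib = 5 * TIB_PER_PIB
prior_grants_tib = [256, 512]

granted_tib = sum(prior_grants_tib)        # total granted so far
remaining_tib = total_tib - granted_tib    # still to be granted

# "20% of total dc amount requested"
third_request_tib = total_tib * 20 // 100

print(granted_tib / TIB_PER_PIB)    # 0.75
print(remaining_tib / TIB_PER_PIB)  # 4.25
print(third_request_tib)            # 1024
```

Under these assumptions the client would have 0.75PiB granted and 4.25PiB remaining, and the 20% rule yields the 1PiB (1024TiB) requested here.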

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 20299 | 4 | 512TiB | 39.41 | 127.15TiB |

xinaxu commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceafwq6zvxnptxcj3ra3gpqm7blkle7hnttisjn7i3bro6xdqlso7q

Address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Datacap Allocated

1.00PiB

Signer Address

f1k3ysofkrrmqcot6fkx4wnezpczlltpirmrpsgui

Id

14029771-ca0c-49b7-9733-56fc1261be1f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceafwq6zvxnptxcj3ra3gpqm7blkle7hnttisjn7i3bro6xdqlso7q

nj-steve commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

nj-steve commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceaxtgnhyghc6njv66pp4dfljdrv5cbpmvinhb27jdjbhqx3xidiv6

Address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Datacap Allocated

1.00PiB

Signer Address

f1xx6555qijma7igpnjspyvdunc4vfxkawnpqy5ii

Id

14029771-ca0c-49b7-9733-56fc1261be1f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceaxtgnhyghc6njv66pp4dfljdrv5cbpmvinhb27jdjbhqx3xidiv6

kernelogic commented 1 year ago

Please note that the CID sharing is caused by the 4 LDNs in the same series. It's completely normal.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f02049625

Client address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

DataCap allocation requested

2PiB

Id

7e0130ae-5cb8-4359-b986-340fcd4759b4

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Rule to calculate the allocation request amount

400% weekly > 2PiB, requesting 2PiB

DataCap allocation requested

2PiB

Total DataCap granted for client so far

931322574615478927360.0YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

931322574615478927360.0YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 48163 | 5 | 1PiB | 37.61 | 258.03TiB |

kernelogic commented 1 year ago

checker:manualTrigger f1qvbe2vppq7jqo3umkl3rnx4uggkxtxi6f7f2zgi f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka f1z6yigcbg6x7c2o4wasp5vya3jzr63jdjqnzvldi

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Other Addresses[^2]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

newwebgroup commented 1 year ago

The SP locations are well distributed, the replicas are healthy, and no CID sharing has been observed.

However, the retrieval success rate of some nodes is relatively low. Please notify the SPs so that they can improve it as soon as possible.

Retrieval Statistics

  • Overall Graphsync retrieval success rate: 2.82%
  • Overall HTTP retrieval success rate: 0.00%
  • Overall Bitswap retrieval success rate: 0.00%

newwebgroup commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacea35en2h4jtyt55h3y7zqnes33erh35fz6tkfascknx5r3bhfx6si

Address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Datacap Allocated

2.00PiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

7e0130ae-5cb8-4359-b986-340fcd4759b4

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea35en2h4jtyt55h3y7zqnes33erh35fz6tkfascknx5r3bhfx6si

mikezli commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea4qfpwyr2ujydo3nhzlix2lvuk7abssfdk6mvyjtb2yfddrnlk7m

Address

f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka

Datacap Allocated

2.00PiB

Signer Address

f1dnb3uz7sylxk6emti3ififcvu3nlufnnsjui6ea

Id

7e0130ae-5cb8-4359-b986-340fcd4759b4

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea4qfpwyr2ujydo3nhzlix2lvuk7abssfdk6mvyjtb2yfddrnlk7m

large-datacap-requests[bot] commented 1 year ago

Looks like the bot was not able to retrieve the transaction on the lotus node. Please contact governance team. The message cid: bafy2bzacea4qfpwyr2ujydo3nhzlix2lvuk7abssfdk6mvyjtb2yfddrnlk7m

Please, contact the governance team.

herrehesse commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 30% of total datacap - f02131855: 32.05%

Deal Data Replication

⚠️ 36.54% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.
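The warnings in this report imply three thresholds: a single provider sealing more than 30% of the DataCap, data replicated across fewer than 4 storage providers, and retrieval success ratios below 1%. The sketch below shows how such checks could be expressed. It is not the checker's actual implementation; the function names and all sample data are hypothetical, except for f02131855 and its 32.05% share.

```python
from typing import Dict, List


def over_limit_providers(shares: Dict[str, float], limit: float = 30.0) -> List[str]:
    """Provider IDs whose share of sealed DataCap exceeds `limit` percent."""
    return [sp for sp, pct in shares.items() if pct > limit]


def low_replication_pct(replica_counts: List[int], min_replicas: int = 4) -> float:
    """Percentage of deals whose data sits on fewer than `min_replicas` providers."""
    if not replica_counts:
        return 0.0
    low = sum(1 for n in replica_counts if n < min_replicas)
    return 100.0 * low / len(replica_counts)


def all_retrievals_below(rates: Dict[str, float], floor: float = 1.0) -> bool:
    """True when every retrieval success rate (in percent) is below `floor`."""
    return all(rate < floor for rate in rates.values())


# Hypothetical inputs: only f02131855 / 32.05 is taken from the report above.
shares = {"f02131855": 32.05, "sp_b": 25.0, "sp_c": 22.95, "sp_d": 20.0}
replica_counts = [2, 3, 4, 5]  # providers holding each deal's data
rates = {"graphsync": 0.5, "http": 0.0, "bitswap": 0.0}

print(over_limit_providers(shares))         # ['f02131855']
print(low_replication_pct(replica_counts))  # 50.0
print(all_retrievals_below(rates))          # True
```

Keeping each threshold as a default parameter makes it easy to see which rule produced a given warning.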

herrehesse commented 1 year ago

@kernelogic Your retrieval rate is alarmingly low but you still got 2PiB assigned. Can you explain? @newwebgroup @mikezli Why did you sign this application?

kernelogic commented 1 year ago

checker:manualTrigger f1qvbe2vppq7jqo3umkl3rnx4uggkxtxi6f7f2zgi f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka f1z6yigcbg6x7c2o4wasp5vya3jzr63jdjqnzvldi

@herrehesse try running the check this way (all 4 addresses together). This LDN is only 1 of the 4 in the series.
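As the checker footnote describes, the combined check is a single comment: the trigger text followed by the related addresses, separated by spaces. A small sketch of assembling that comment, where the helper name `manual_trigger_comment` is hypothetical:

```python
from typing import Iterable


def manual_trigger_comment(addresses: Iterable[str]) -> str:
    """Build the checker comment: trigger text followed by space-separated addresses."""
    return " ".join(["checker:manualTrigger", *addresses])


# The four client addresses of the LDN series, as listed in the comment above.
series_addresses = [
    "f1qvbe2vppq7jqo3umkl3rnx4uggkxtxi6f7f2zgi",
    "f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea",
    "f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka",
    "f1z6yigcbg6x7c2o4wasp5vya3jzr63jdjqnzvldi",
]

comment = manual_trigger_comment(series_addresses)
print(comment)
```

With an empty address list the function returns the plain `checker:manualTrigger` text, which triggers the single-address report.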

kernelogic commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Other Addresses[^2]

Retrieval Statistics

  • Overall Graphsync retrieval success rate: 2.82%
  • Overall HTTP retrieval success rate: 0.00%
  • Overall Bitswap retrieval success rate: 0.00%

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

My explanation is that the report is satisfactory when the series is examined as a whole.

herrehesse commented 1 year ago

@kernelogic I have run checker:manualTrigger on all of them and will check in 10 minutes.

kernelogic commented 1 year ago

checker:manualTrigger f1qvbe2vppq7jqo3umkl3rnx4uggkxtxi6f7f2zgi f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea f1yvbub3wqjcd2bkayk72ace3fopgxog6ix36l7ka f1z6yigcbg6x7c2o4wasp5vya3jzr63jdjqnzvldi

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Other Addresses[^2]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

kernelogic commented 1 year ago

@herrehesse see above for the aggregated result. This is how you check multiple LDNs in the same series. The success rate is still far from ideal; however, it seems the retrieval bot doesn't update the result with new attempts.

herrehesse commented 1 year ago

@kernelogic Have you contacted the various SPs and asked them to update to the latest Boost, enable HTTP retrieval, and allow unknown clients?

We should indeed wait for updates. I am a bit sad that the notaries did not start this discussion before granting 2PiB.

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

kernelogic commented 11 months ago

Need to keep this open. Still onboarding slowly.

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

kernelogic commented 11 months ago

Need to keep this open. Still onboarding slowly.

github-actions[bot] commented 10 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 10 months ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!