filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Cabrina - nanopore-reference-human-genome #1564

Closed: NiwanDao closed this issue 1 year ago

NiwanDao commented 1 year ago

Data Owner Name

Nanopore Whole Genome Sequencing Consortium

Data Owner Country/Region

United States

Data Owner Industry

Life Science / Healthcare

Website

https://github.com/nanopore-wgs-consortium/NA12878

Social Media

https://dstorage.cabrina.xyz/

Total amount of DataCap being requested

3PiB

Weekly allocation of DataCap requested

500TiB

On-chain address for first allocation

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

Custom multisig

Identifier

No response

Share a brief history of your project and organization

- I am an active participant in Slingshot and Slingshot Restore. Through this experience I have gained extensive knowledge as a data preparer, a storage provider (SP) taking deals, and a retrieval client.
- Along the way I have established relationships with other community members and have successfully made deals with over 60 SPs worldwide.
- Given the surge of deal-making requests from other SPs, and the value of storing humanity's most important data permanently, I decided to bring more valuable data to the network.
- I will track deals and provide retrieval access through https://dstorage.cabrina.xyz/.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

- This dataset includes the sequencing and assembly of a reference-standard human genome (GM12878), produced with the MinION nanopore sequencing instrument using R9.4 1D chemistry.
- I have finished generating the CAR files: 7,753 in total.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://nanopore-human-wgs.s3.amazonaws.com/index.html

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1 to 1.5 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America

How will you be distributing your data to storage providers

HTTP or FTP server, Shipping hard drives

How do you plan to choose storage providers

Slack, Big data exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

I work with SPs including, but not limited to, FL cloud, GreaterHeat, HarryM-Filet, Chenxi, and Andriy. I will continue to look for new SPs through the offline community and the online BDE platform.
The partners I have previously worked with are listed at https://dstorage.cabrina.xyz/sp/

How do you plan to make deals to your storage providers

Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Carohere commented 1 year ago

@xingjitansuo, can you share the data size and your allocation plan? Also, the 500TiB weekly allocation is well above the community-suggested value; is there a particular reason you are requesting that much?

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

3PiB

Expected weekly DataCap usage rate

500TiB

Client address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

DataCap allocation requested

153.59TiB

Id

a04daae9-242d-476c-8902-b14527efad8a

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

NiwanDao commented 1 year ago

As the application indicated, I have generated 6174 files in total, which is around 250T per copy, and I plan to send 10-12 replicas. Most of the SPs I tend to work with have a daily capacity of at least 100T, so a 500T weekly allocation is a conservative and fair estimate. @Carohere
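The replication plan above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that assumes "250T" means 250 TiB per copy (the comment does not state the unit) and that every replica is a full copy; `replication_plan` is a hypothetical helper, not part of any Fil+ tool.

```python
TIB_PER_PIB = 1024

def replication_plan(copy_size_tib, replicas, weekly_tib):
    """Return (total TiB, total PiB, weeks to onboard) for one replica count.

    Assumes each replica is a full copy of copy_size_tib (hypothetical model).
    """
    total_tib = copy_size_tib * replicas
    return total_tib, total_tib / TIB_PER_PIB, total_tib / weekly_tib

# 250 TiB per copy (assumed unit), 10-12 replicas, 500 TiB/week requested
for replicas in (10, 12):
    total_tib, total_pib, weeks = replication_plan(250, replicas, 500)
    print(f"{replicas} replicas: {total_tib} TiB ({total_pib:.2f} PiB), ~{weeks:.1f} weeks")
```

Under these assumptions, 10 to 12 replicas come to roughly 2.44 to 2.93 PiB, in line with the 3 PiB requested, and would take about 5 to 6 weeks at 500 TiB per week.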

cryptowhizzard commented 1 year ago

Hello @xingjitansuo

Given your track record, I will sign this application. Thank you for bringing your contribution to Filecoin.

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceavqojpxw54a44mhb7iyhh2xle27tq72plwk2atiuwi6otqduerus

Address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

Datacap Allocated

153.59TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

a04daae9-242d-476c-8902-b14527efad8a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceavqojpxw54a44mhb7iyhh2xle27tq72plwk2atiuwi6otqduerus

Carohere commented 1 year ago

@xingjitansuo ACK. How many SPs are confirmed now? To help me better follow up on this application, could you share their nodes, geographical locations, etc.?

NiwanDao commented 1 year ago

Currently, 4 different organizations operating in the U.S. and China are confirmed, as indicated in the application. The SP IDs have not yet been finalized on the SP side; I will share more details soon.

Carohere commented 1 year ago

Noted. Good luck finding 6+ SPs. Looking forward to seeing the checker report!

mjroddy commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea4jd2ye5e3wqkjk2svyha362jpkukg6mwlaun5fju4hmxr2bavfk

Address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

Datacap Allocated

153.59TiB

Signer Address

f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i

Id

a04daae9-242d-476c-8902-b14527efad8a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea4jd2ye5e3wqkjk2svyha362jpkukg6mwlaun5fju4hmxr2bavfk

dongpo313 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

⚠️ f01937454 has sealed 100.00% of total datacap.

⚠️ All storage providers are located in the same region.

Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals
f01937454 | Chengdu, Sichuan, CN (CHINANET-BACKBONE) | 20.56 TiB | 100.00% | 20.56 TiB | 0.00%

Provider Distribution
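The "sealed more than X% of total datacap" warnings in these reports compare each provider's share of the client's sealed data against a threshold (this thread shows 90%, 70%, and 30% in different rounds). The checker's actual implementation is not shown here; the following is a hypothetical sketch of that kind of check, with made-up function names and sizes in TiB.

```python
def provider_shares(sealed_tib_by_provider):
    """Map each provider ID to its percentage of the client's total sealed data."""
    total = sum(sealed_tib_by_provider.values())
    return {p: v / total * 100 for p, v in sealed_tib_by_provider.items()}

def over_threshold(sealed_tib_by_provider, threshold_pct):
    """Providers whose share strictly exceeds threshold_pct (hypothetical rule)."""
    shares = provider_shares(sealed_tib_by_provider)
    return {p: s for p, s in shares.items() if s > threshold_pct}

# Figures from the table above: a single provider holds all 20.56 TiB.
print(over_threshold({"f01937454": 20.56}, 90))
```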

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

⚠️ 100.00% of deals are for data replicated across fewer than 3 storage providers.

Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage
20.56 TiB | 20.56 TiB | 1 | 100.00%

Replication Distribution
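The replication warning ("X% of deals are for data replicated across fewer than N storage providers") can be approximated by grouping deals by piece CID and counting distinct providers per piece. This sketch assumes a hypothetical `(piece_cid, provider_id, size_tib)` deal schema and weights by deal size; the real checker may count deals differently.

```python
from collections import defaultdict

def low_replication_pct(deals, min_providers):
    """Percent of deal volume whose piece is stored by fewer than min_providers SPs.

    deals: list of (piece_cid, provider_id, size_tib) tuples (hypothetical schema).
    """
    providers_per_piece = defaultdict(set)
    for piece, provider, _ in deals:
        providers_per_piece[piece].add(provider)
    total = sum(size for _, _, size in deals)
    low = sum(size for piece, _, size in deals
              if len(providers_per_piece[piece]) < min_providers)
    return 100 * low / total

# Placeholder CIDs and provider IDs: piece A on 3 SPs, piece B on 1 SP.
deals = [("bafyA", "f01", 1.0), ("bafyA", "f02", 1.0),
         ("bafyA", "f03", 1.0), ("bafyB", "f01", 1.0)]
print(low_replication_pct(deals, 3))
```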

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data and should not resolve to the same CID.

However, this is possible if all clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger
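CID sharing between clients, as described in the report above, amounts to intersecting each client's set of piece CIDs. A hypothetical sketch, with placeholder client addresses and CIDs:

```python
def shared_cids(cids_by_client, client):
    """Return, for each other client, the piece CIDs it shares with `client`.

    Clients with no overlap are omitted, so an empty dict means no sharing.
    """
    mine = cids_by_client[client]
    return {other: mine & cids
            for other, cids in cids_by_client.items()
            if other != client and mine & cids}

# Placeholder data: no overlapping CIDs, matching "No CID sharing has been observed."
print(shared_cids({"f1client": {"bafyA", "bafyB"}, "f1other": {"bafyC"}}, "f1client"))
```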

NiwanDao commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider sealed more than 90% of total datacap - f01937454: 100.00%

⚠️ All storage providers are located in the same region.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across fewer than 2 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f01858410

Client address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

DataCap allocation requested

307.19TiB

Id

85cc4369-dd2a-4bc9-a4c2-3286807d5347

NiwanDao commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider sealed more than 70% of total datacap - f01937454: 87.08%

Deal Data Replication

⚠️ 98.34% of deals are for data replicated across fewer than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

NiwanDao commented 1 year ago

A single copy is around 250T. Since this is the first allocation and some SPs started earlier than others, we will likely see a more distributed allocation in the next round.

kernelogic commented 1 year ago

Based on the explanation and track record of @xingjitansuo, I am willing to support this application.

kernelogic commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebsz2vrupfmmlhtl4usrnuxef7ffxlmogudqd7dvoob5h5cqmaqdo

Address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

Datacap Allocated

307.19TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

85cc4369-dd2a-4bc9-a4c2-3286807d5347

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebsz2vrupfmmlhtl4usrnuxef7ffxlmogudqd7dvoob5h5cqmaqdo

Joss-Hua commented 1 year ago

(Image attachment not preserved.)

Joss-Hua commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea55ulq4d2taqunt37s7a7rgr7bgcyewrjjczi2xavg4ghab2mnww

Address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

Datacap Allocated

307.19TiB

Signer Address

f1tfg54zzscugttejv336vivknmsnzzmyudp3t7wi

Id

85cc4369-dd2a-4bc9-a4c2-3286807d5347

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea55ulq4d2taqunt37s7a7rgr7bgcyewrjjczi2xavg4ghab2mnww

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f01858410

Client address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

DataCap allocation requested

614.39TiB

Id

7ae4f71f-7975-4fed-8aa7-eb216f031d45

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

Last two approvers

Joss-Hua & kernelogic

Rule to calculate the allocation request amount

20% of total dc amount requested

DataCap allocation requested

614.39TiB

Total DataCap granted for client so far

460.77TiB

Datacap to be granted to reach the total amount requested by the client (3PiB)

2.55PiB

Stats

Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC
9960 | 7 | 307.19TiB | 39.84 | 71.40TiB
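The allocation amounts in this thread (153.59 TiB, 307.19 TiB, 614.39 TiB, 1.20 PiB) correspond to 5%, 10%, 20%, and 40% of the 3 PiB request, and the bot states a 20% rule for request 3. The sketch below reproduces the "total granted" and "to be granted" figures under that assumption; `tranche_schedule` is hypothetical, and the bot's figures differ in the last decimal place, presumably because of rounding.

```python
TIB_PER_PIB = 1024

def tranche_schedule(total_tib, fractions):
    """Return (tranche, cumulative granted, remaining) in TiB for each request."""
    schedule, granted = [], 0.0
    for frac in fractions:
        tranche = total_tib * frac
        granted += tranche
        schedule.append((tranche, granted, total_tib - granted))
    return schedule

# 3 PiB total; fractions inferred from the allocations in this thread
for n, (tranche, granted, remaining) in enumerate(
        tranche_schedule(3 * TIB_PER_PIB, [0.05, 0.10, 0.20, 0.40]), start=1):
    print(f"Request {n}: {tranche:.2f} TiB, granted so far {granted:.2f} TiB, "
          f"remaining {remaining / TIB_PER_PIB:.2f} PiB")
```

After the first two requests this gives about 460.8 TiB granted and 2.55 PiB remaining, matching the stats above.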
filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 90.74% of deals are for data replicated across fewer than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

NiwanDao commented 1 year ago

As the distribution process is still in its early stages, it is difficult for the majority of deals to be stored more than 4 times. However, data replication is expected to improve in the next tranche.

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedta54cf55wpp2praa7ncqic7tinyp5iayrcsrspzi2qtuv7rbk3m

Address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

Datacap Allocated

614.39TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

7ae4f71f-7975-4fed-8aa7-eb216f031d45

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedta54cf55wpp2praa7ncqic7tinyp5iayrcsrspzi2qtuv7rbk3m

stcloudlisa commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecvupguilrmbjm24apxcnyfyx4lvvgl4e7wgl6h7swxfmfm7rkuxy

Address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

Datacap Allocated

614.39TiB

Signer Address

f1jvvltduw35u6inn5tr4nfualyd42bh3vjtylgci

Id

7ae4f71f-7975-4fed-8aa7-eb216f031d45

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecvupguilrmbjm24apxcnyfyx4lvvgl4e7wgl6h7swxfmfm7rkuxy

stcloudlisa commented 1 year ago

(Image attachment not preserved.)

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f01858410

Client address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

DataCap allocation requested

1.20PiB

Id

720e2e06-0934-41da-8a9e-95930a062031

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider sealed more than 30% of total datacap - f01925248: 45.22%

Deal Data Replication

⚠️ 87.83% of deals are for data replicated across fewer than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

NiwanDao commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider sealed more than 30% of total datacap - f01925248: 45.22%

Deal Data Replication

⚠️ 87.83% of deals are for data replicated across fewer than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

newwebgroup commented 1 year ago
(Image attachment not preserved.)
newwebgroup commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceb4po7ry2docugb737sle6hjncbpzacxectuee2pukghxkx6npc3m

Address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

Datacap Allocated

1.20PiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

720e2e06-0934-41da-8a9e-95930a062031

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb4po7ry2docugb737sle6hjncbpzacxectuee2pukghxkx6npc3m

xiaoyuaiheshui commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecagjsbntl22h4owmbup5rqrx26qemiabrucoesep6squvwzj2r4c

Address

f1ndp7rsl4nvzgtxti4uuoyhvzms6bhulionxekpi

Datacap Allocated

1.20PiB

Signer Address

f122qmy25wdtt5mxd77kndiq7z5x2n3iwiuz2wdsa

Id

720e2e06-0934-41da-8a9e-95930a062031

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecagjsbntl22h4owmbup5rqrx26qemiabrucoesep6squvwzj2r4c

NiwanDao commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

NiwanDao commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

NiwanDao commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.