filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] NDLABS - Life Sciences Dataset <2/4> #1522

Closed NDLABS-Leo closed 1 year ago

NDLABS-Leo commented 1 year ago

Data Owner Name

NDLABS

Data Owner Country/Region

Singapore

Data Owner Industry

Life Science / Healthcare

Website

https://www.ndlabs.io/#/

Social Media

Twitter: @imNDLABS
Slack: @NDLABS-OFFICE

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

500TiB

On-chain address for first allocation

f1bliocymkeatgincf5galzxpfur7lvsmavevahay

Custom multisig

Identifier

No response

Share a brief history of your project and organization

ND LABS has technical operation centers and nodes in Singapore, Hong Kong, the United States, and Dubai. Since the Filecoin mainnet launched in 2020, ND has provided technical services to partners to help them build out their storage services. At present, ND's accumulated storage power exceeds 300P globally. The largest node has 100P of storage power, and that node owns more than 1.4 million FIL.
ND LABS is positioned as a decentralized storage service provider for Web3. ND has long focused not only on building nodes for partners but also on exploring how to provide better storage services for potential Web3 clients. Since October 2021, ND has been deeply involved in the FilPlus project, actively promoting it to partners who have genuine data storage needs. We also provide them with a complete set of solutions and technical services for storing data on the FIL network. The Singapore and US nodes are the main storage nodes, which provided real data storage for early customers.

Is this project associated with other projects/ecosystem stakeholders?

Yes

If answered yes, what are the other projects/ecosystem stakeholders

We are going to cooperate with more SPs from other regions.

Describe the data being stored onto Filecoin

For the first phase, we are going to store open Life Sciences datasets from AWS with a total size of 2 PiB.
The Life Sciences Open Datasets collection has a total of 102 items, including but not limited to the following:
-Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
-Gabriella Miller Kids First Pediatric Research Program (Kids First)
-Genome Aggregation Database (gnomAD)
-Allen Cell Imaging Collections
-International Neuroimaging Data-Sharing Initiative (INDI)
-Cell Organelle Segmentation in Electron Microscopy (COSEM) on AWS
-Distributed Archives for Neurophysiology Data Integration (DANDI)

Where was the data currently stored in this dataset sourced from

My Own Storage Infra

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://ocg.cancer.gov/programs/target/
https://kidsfirstdrc.org/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Monthly

For how long do you plan to keep this dataset stored on Filecoin

1 to 1.5 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America

How will you be distributing your data to storage providers

Cloud storage (i.e. S3), HTTP or FTP server, IPFS, Shipping hard drives, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Filmine, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

We are a member of the Singapore SPWG and have also joined other SPWGs. SPs from the SPWG are trustworthy and experienced.
We are also working to encourage more small SPs to join the Filecoin network. In addition, we will provide technical support to small SPs that newly join the network.

How do you plan to make deals to your storage providers

Lotus client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

herrehesse commented 1 year ago

@NDLABS-OFFICE Awesome application! Can you try to answer the questionnaire I posted in the 1/4 request, #1521?

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

500TiB

Client address

f1bliocymkeatgincf5galzxpfur7lvsmavevahay

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1bliocymkeatgincf5galzxpfur7lvsmavevahay

DataCap allocation requested

250TiB

Id

e0cf6ee6-3a56-4dca-8858-ee7432d862b5

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

ipfscn commented 1 year ago

As a listed company, NDLabs is in a strong position, and I am willing to express my support for the time being.

kernelogic commented 1 year ago

Will support this open dataset and verify by performing retrievals on future allocations.

kernelogic commented 1 year ago

Can't sign atm.

Tom-OriginStorage commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceadxhfmdqpxmsott5pxsh7ta4cnhi74b5ooj7borrjmdskukugz7c

Address

f1bliocymkeatgincf5galzxpfur7lvsmavevahay

Datacap Allocated

250.00TiB

Signer Address

f1q6bpjlqia6iemqbrdaxr2uehrhpvoju3qh4lpga

Id

e0cf6ee6-3a56-4dca-8858-ee7432d862b5

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceadxhfmdqpxmsott5pxsh7ta4cnhi74b5ooj7borrjmdskukugz7c

mikezli commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecbkgmpk5272jx6fvrxsvwxixryamhzdhoib6g7hbe2w7dqdst7xu

Address

f1bliocymkeatgincf5galzxpfur7lvsmavevahay

Datacap Allocated

250.00TiB

Signer Address

f1dnb3uz7sylxk6emti3ififcvu3nlufnnsjui6ea

Id

e0cf6ee6-3a56-4dca-8858-ee7432d862b5

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecbkgmpk5272jx6fvrxsvwxixryamhzdhoib6g7hbe2w7dqdst7xu

mikezli commented 1 year ago

It looks good. Large public datasets should be supported.

herrehesse commented 1 year ago

@NDLABS-OFFICE Not agreeing here with some of the datasets that you are storing (some have been stored 20-40 times already).

But I am looking forward to your distribution across multiple continents and will do retrievals to see if everything works as required by the FIL+ rules. Please show us how it's done! Good luck.

Waiting for the first CID report.

NDLABS-Leo commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

YuanHeHK commented 1 year ago

@NDLABS-OFFICE Can you provide some data CIDs so I can do some search verification?

NDLABS-Leo commented 1 year ago

@YuanHeHK I selected some CIDs; let me know if you need more.

f01740934: piece_cid:baga6ea4seaql3nwjb4kexvk53rup45b3c76yrbuxgrlltvlvdnrhewcmmps46ci payload_cid:bafykbzacedcx3eitxl2ydwvpsdabzcdwhp35gwxgiy7e76psynayv2xscdf5q

f01834253: piece_cid:baga6ea4seaqn26rt5zcpe7533do2va2ia46nhkjr37gkwxhwg5p656xr7nn7udi payload_cid:bafykbzacedqyizguncvlqny3a7fipvytjfry2yv3taj3ur322xalux4sastoq

f01853104: piece_cid:baga6ea4seaqfq2qqn3gocmuunvuhr7ubylg44kprntknb7fkab2j3nvscv4nkca payload_cid:bafykbzaceacykgjk3esbmitkwiw67glesn53zqbrlbmqmvili2kcp3z75wons

f01854080: piece_cid:baga6ea4seaqk4wrzjlbg6p3gibcaorxnzjbvfem32vhwlta6ethlt5c5pn3xuky payload_cid:bafykbzacecdykwdsz3xfszxnv2cfuoej236bsa277x66ntgggpoclwjldykaw

f01890456: piece_cid:baga6ea4seaqd4hkfd3p6n6iwfkglpxxoxxdidfszz3rywopzizbfupcysngn4iq payload_cid:bafykbzacedf33i2e5ntvv66ikvsdjgbycexrwogq2yc4j7iqtlynp34tgf6v2

f01985611: piece_cid:baga6ea4seaqikwl765shan4y4k7z3aaliilpodrcnuo3mlpwh3purppm7a46kia payload_cid:bafykbzaceb2iz3bnmk4m34kpravkemje2caewlgiigz4evqb6i2xuko4vqowy
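For anyone who wants to spot-check these entries, the sketch below parses lines in the `provider: piece_cid:… payload_cid:…` shape used above and turns them into `lotus client retrieval-ask <provider> <cid>` command strings, the command form that appears later in this thread. It only builds the strings and does not run them. The helper names are hypothetical, and passing the payload CID as the argument is an assumption based on the `bafykbzace` prefix of the CIDs used with that command in this thread.

```python
import re

# Two of the six lines from the comment above, copied verbatim.
SAMPLE = """\
f01740934: piece_cid:baga6ea4seaql3nwjb4kexvk53rup45b3c76yrbuxgrlltvlvdnrhewcmmps46ci payload_cid:bafykbzacedcx3eitxl2ydwvpsdabzcdwhp35gwxgiy7e76psynayv2xscdf5q
f01834253: piece_cid:baga6ea4seaqn26rt5zcpe7533do2va2ia46nhkjr37gkwxhwg5p656xr7nn7udi payload_cid:bafykbzacedqyizguncvlqny3a7fipvytjfry2yv3taj3ur322xalux4sastoq
"""

# Matches "<provider id>: piece_cid:<cid> payload_cid:<cid>".
LINE_RE = re.compile(r"^(f0\d+):\s+piece_cid:(\S+)\s+payload_cid:(\S+)$")


def parse_cid_lines(text):
    """Return one dict per well-formed line; other lines are skipped."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            provider, piece_cid, payload_cid = match.groups()
            records.append(
                {"provider": provider, "piece_cid": piece_cid, "payload_cid": payload_cid}
            )
    return records


def retrieval_ask_commands(records):
    """Build command strings only (assumption: the payload CID is the argument)."""
    return [
        f"lotus client retrieval-ask {r['provider']} {r['payload_cid']}" for r in records
    ]


records = parse_cid_lines(SAMPLE)
commands = retrieval_ask_commands(records)
for command in commands:
    print(command)
```

Keeping the piece CID in each record makes it easy to cross-reference the same deal against a checker report later.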

NDLABS-Leo commented 1 year ago

image

NDLABS-Leo commented 1 year ago

@raghavrmadya The checker bot didn't work. I contacted Simon and Fabri but haven't received a reply yet.

NDLABS-Leo commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

kernelogic commented 1 year ago

Same LDN series as #1521. I performed DD there, with good retrieval. See https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1521#issuecomment-1418502515

YuanHeHK commented 1 year ago

Thank you for the data. This round had 10 SPs receiving orders with real data, and I randomly spot-checked the connection status of 5 test SPs and it looked very healthy.

YuanHeHK commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

YuanHeHK commented 1 year ago

It looks like the checker bot's data updates are delayed. I will support this application for now and continue to observe.

YuanHeHK commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacechonxmbxt4lbqucwxmhxfwvfo2ctrgfk47r5dqiy5n7t4xq7cshe

Address

f1bliocymkeatgincf5galzxpfur7lvsmavevahay

Datacap Allocated

250.00TiB

Signer Address

f1fg6jkxsr3twfnyhdlatmq36xca6sshptscds7xa

Id

e0cf6ee6-3a56-4dca-8858-ee7432d862b5

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacechonxmbxt4lbqucwxmhxfwvfo2ctrgfk47r5dqiy5n7t4xq7cshe

xinaxu commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

NDLABS-Leo commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

NDLABS-Leo commented 1 year ago

checker:manualTrigger

xinaxu commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes verified deal, it will be marked as new.

For most of the datacap application, below restrictions should apply.

Since this is the 2nd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f02000936 | Hong Kong, Central and Western, HK (7Road International HK Limited) | 23.78 TiB | 5.46% | 23.78 TiB | 0.00% |
| f02006374 | Hong Kong, Central and Western, HK (7Road International HK Limited) | 20.44 TiB | 4.69% | 20.44 TiB | 0.00% |
| f01854080 | Los Angeles, California, US (Zenlayer Inc) | 79.34 TiB | 18.21% | 79.34 TiB | 0.00% |
| f01853104 | Los Angeles, California, US (Zenlayer Inc) | 61.31 TiB | 14.07% | 61.31 TiB | 0.00% |
| f01834253 | Los Angeles, California, US (Zenlayer Inc) | 56.19 TiB | 12.89% | 56.19 TiB | 0.00% |
| f01740934 | Los Angeles, California, US (Zenlayer Inc) | 51.56 TiB | 11.83% | 51.56 TiB | 0.00% |
| f01890456 | Los Angeles, California, US (Zenlayer Inc) | 50.97 TiB | 11.70% | 50.97 TiB | 0.00% |
| f01985611 | Los Angeles, California, US (Zenlayer Inc) | 49.66 TiB | 11.39% | 49.66 TiB | 0.00% |
| f01853077 | Singapore, Singapore, SG (Zenlayer Inc) | 21.69 TiB | 4.98% | 21.69 TiB | 0.00% |
| f01852363 | Singapore, Singapore, SG (Zenlayer Inc) | 20.84 TiB | 4.78% | 20.84 TiB | 0.00% |
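The Percentage column can be re-derived from the sealed sizes, and the same figures can be grouped by city to see how concentrated the deals are geographically. The sketch below does both using the values from the table above. The per-city grouping is an illustration added here, not something the checker bot reports, and the function names are hypothetical.

```python
from collections import defaultdict

# (provider, city, TiB sealed), copied from the report table above.
DEALS = [
    ("f02000936", "Hong Kong", 23.78),
    ("f02006374", "Hong Kong", 20.44),
    ("f01854080", "Los Angeles", 79.34),
    ("f01853104", "Los Angeles", 61.31),
    ("f01834253", "Los Angeles", 56.19),
    ("f01740934", "Los Angeles", 51.56),
    ("f01890456", "Los Angeles", 50.97),
    ("f01985611", "Los Angeles", 49.66),
    ("f01853077", "Singapore", 21.69),
    ("f01852363", "Singapore", 20.84),
]


def provider_shares(deals):
    """Percentage of all sealed data held by each provider."""
    total = sum(tib for _, _, tib in deals)
    return {provider: 100 * tib / total for provider, _, tib in deals}


def city_shares(deals):
    """Percentage of all sealed data held in each city."""
    total = sum(tib for _, _, tib in deals)
    per_city = defaultdict(float)
    for _, city, tib in deals:
        per_city[city] += tib
    return {city: 100 * tib / total for city, tib in per_city.items()}


by_provider = provider_shares(DEALS)
by_city = city_shares(DEALS)
for city, share in sorted(by_city.items(), key=lambda item: -item[1]):
    print(f"{city}: {share:.2f}%")
```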

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

Since this is the 2nd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 101.53 TiB | 101.53 TiB | 1 | 23.30% |
| 37.78 TiB | 75.56 TiB | 2 | 17.34% |
| 53.06 TiB | 159.19 TiB | 3 | 36.53% |
| 24.88 TiB | 99.50 TiB | 4 | 22.83% |
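The columns in this table appear to be related by simple arithmetic: total deals made is roughly the unique data size multiplied by the number of providers holding it, and the deal percentage is each row's total divided by the sum of all totals. The sketch below checks both relations against the reported values. This reading of the columns is an inference from the numbers, not a documented definition, and the names are hypothetical.

```python
# (unique TiB, total deal TiB, number of providers), copied from the table above.
ROWS = [
    (101.53, 101.53, 1),
    (37.78, 75.56, 2),
    (53.06, 159.19, 3),
    (24.88, 99.50, 4),
]


def deal_percentages(rows):
    """Each row's share of all deal bytes, in percent, rounded to 2 places."""
    grand_total = sum(total for _, total, _ in rows)
    return [round(100 * total / grand_total, 2) for _, total, _ in rows]


def replication_consistent(rows, tolerance_tib=0.05):
    """True if total is about unique * providers (allowing for display rounding)."""
    return all(
        abs(unique * providers - total) <= tolerance_tib
        for unique, total, providers in rows
    )


percentages = deal_percentages(ROWS)
print(percentages)
print(replication_consistent(ROWS))
```

The small tolerance is needed because the report rounds sizes to two decimals before display.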

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Different applications usually own different data, which should not resolve to the same CID.

However, this could be possible if all below clients use same software to prepare for the exact same dataset or they belong to a series of LDN applications for the same dataset.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Approvers |
| --- | --- | --- | --- | --- |
| f17nerghj2kmg7b4e6asft3xexga5qbzbqe3hi4gy | Unknown | 736.00 GiB | 2 | Unknown |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

NDLABS-Leo commented 1 year ago

Hi community, FYI: the ND open data project <Life Sciences Dataset 1-4> is one project whose data is uniformly distributed by our maintenance colleagues. It is reasonable for there to be some CID sharing between these 4 LDNs. f17nerghj2kmg7b4e6asft3xexga5qbzbqe3hi4gy is the address of Life Sciences Dataset 3.

Joss-Hua commented 1 year ago

At present, everything appears normal. I support this round and will keep watching.

Joss-Hua commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceblvfs6uiawbhc6p6mkgkzcgzfoluwmbvwghk3a6avmfxwphftcc4

Address

f1bliocymkeatgincf5galzxpfur7lvsmavevahay

Datacap Allocated

250.00TiB

Signer Address

f1tfg54zzscugttejv336vivknmsnzzmyudp3t7wi

Id

e0cf6ee6-3a56-4dca-8858-ee7432d862b5

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceblvfs6uiawbhc6p6mkgkzcgzfoluwmbvwghk3a6avmfxwphftcc4

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f01858410

Client address

f1bliocymkeatgincf5galzxpfur7lvsmavevahay

DataCap allocation requested

500TiB

Id

cd07a1c4-ea87-4a91-86ec-50fd121c5c11

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1bliocymkeatgincf5galzxpfur7lvsmavevahay

Last two approvers

Joss-Hua & fireflyHZ

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

500TiB

Total DataCap granted for client so far

500TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.51PiB
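The remaining figure follows from binary units (1 PiB = 1024 TiB): 5 PiB requested minus the 500 TiB the bot lists as granted so far leaves 4620 TiB. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
TIB_PER_PIB = 1024


def remaining_pib(total_requested_pib, granted_tib):
    """DataCap still to be granted, in PiB, given the total request and grants so far."""
    remaining_tib = total_requested_pib * TIB_PER_PIB - granted_tib
    return remaining_tib / TIB_PER_PIB


remaining = remaining_pib(5, 500)
print(f"{remaining:.2f}PiB")  # prints 4.51PiB
```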

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 19653 | 12 | 250TiB | 16.56 | 55.12TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes verified deal, it will be marked as new.

For most of the datacap application, below restrictions should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f02000936 | Hong Kong, Central and Western, HK (7Road International HK Limited) | 25.88 TiB | 4.52% | 25.88 TiB | 0.00% |
| f02006374 | Hong Kong, Central and Western, HK (7Road International HK Limited) | 20.44 TiB | 3.57% | 20.44 TiB | 0.00% |
| f02012175 | Seoul, Seoul, KR (Korea Telecom) | 60.06 TiB | 10.48% | 60.06 TiB | 0.00% |
| f01854080 | Los Angeles, California, US (Zenlayer Inc) | 93.06 TiB | 16.24% | 93.06 TiB | 0.00% |
| f01834253 | Los Angeles, California, US (Zenlayer Inc) | 66.50 TiB | 11.61% | 66.50 TiB | 0.00% |
| f01985611 | Los Angeles, California, US (Zenlayer Inc) | 62.84 TiB | 10.97% | 62.84 TiB | 0.00% |
| f01853104 | Los Angeles, California, US (Zenlayer Inc) | 61.31 TiB | 10.70% | 61.31 TiB | 0.00% |
| f01890456 | Los Angeles, California, US (Zenlayer Inc) | 55.06 TiB | 9.61% | 55.06 TiB | 0.00% |
| f01740934 | Los Angeles, California, US (Zenlayer Inc) | 52.72 TiB | 9.20% | 52.72 TiB | 0.00% |
| f01853077 | Singapore, Singapore, SG (Zenlayer Inc) | 31.41 TiB | 5.48% | 31.41 TiB | 0.00% |
| f01852363 | Singapore, Singapore, SG (Zenlayer Inc) | 29.28 TiB | 5.11% | 29.28 TiB | 0.00% |
| f02031006 | Los Angeles, California, US (Zenlayer Inc) | 14.44 TiB | 2.52% | 14.44 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 192.97 TiB | 192.97 TiB | 1 | 33.68% |
| 48.47 TiB | 96.94 TiB | 2 | 16.92% |
| 58.16 TiB | 174.47 TiB | 3 | 30.45% |
| 27.16 TiB | 108.63 TiB | 4 | 18.96% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Different applications usually own different data, which should not resolve to the same CID.

However, this could be possible if all below clients use same software to prepare for the exact same dataset or they belong to a series of LDN applications for the same dataset.[^3]

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Approvers |
| --- | --- | --- | --- | --- |
| f17nerghj2kmg7b4e6asft3xexga5qbzbqe3hi4gy | Unknown | 736.00 GiB | 2 | Unknown |
| f1ojiggunuyo6vnqzdq4ameg2zxs3pvvd3z5bxqwy | Unknown | 192.00 GiB | 5 | Unknown |
| f1i2igpiu5nlmcjbo2tyrcd5w7kdgtlgif2bb6vka | Wel Vape | 96.00 GiB | 1 | 1newwebgroup, 1stcouldlisa |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

kernelogic commented 1 year ago

DD performed in https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1521#issuecomment-1418502515

kernelogic commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedpawmtvzjh7efdjmkf7iw35wa7e5wrhinw2csia6appeb5pwh2dy

Address

f1bliocymkeatgincf5galzxpfur7lvsmavevahay

Datacap Allocated

500.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

cd07a1c4-ea87-4a91-86ec-50fd121c5c11

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedpawmtvzjh7efdjmkf7iw35wa7e5wrhinw2csia6appeb5pwh2dy

Tom-OriginStorage commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

herrehesse commented 1 year ago

I am not supportive of this application until the business names, regions and cities of each SP are clearly stated and proven. Most of them (90%) are operating under VPN, so extra due diligence is needed.

@NDLABS-OFFICE can you help me here with full transparency?

NDLABS-Leo commented 1 year ago

@herrehesse https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1521 Please see the record here.

Zhangcffff commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

1ane-1 commented 1 year ago

lotus client retrieval-ask f01890456 bafykbzaceaaylyvu7edhede47exgdcejrb3ka5sl7lort3shz45vwcvczgyta
lotus client retrieval-ask f01890456 bafykbzaced2w4phmzozspxz3627whg6xp4cch56r62za6ewok7ccwzsgxhlmg

1ane-1 commented 1 year ago

f99834b02b7f8155aa65e3707f60dd5