Destore2023 / MetaPathways-Bookkeeping

For Filecoin Allocator

[DataCap Application] <Stonedata> - <1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5, 3.7, 4.0, and 4.2> #50

Open: Floridajing opened this issue 4 months ago

Floridajing commented 4 months ago

Version

1

DataCap Applicant

Floridajing

Project ID

1

Data Owner Name

Illumina, Inc.

Data Owner Country/Region

China

Data Owner Industry

Life Science / Healthcare

Website

www.illumina.com.cn

Social Media Handle

Florida

Social Media Type

Slack

What is your role related to the dataset

Data Preparer

Total amount of DataCap being requested

12PiB

Expected size of single dataset (one copy)

1223TiB

Number of replicas to store

10

Weekly allocation of DataCap requested

1024TiB
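As a quick consistency check of the request above, the sketch below multiplies the single-copy size by the replica count and compares the result with the total requested. It assumes binary units (1 PiB = 1024 TiB) and ignores piece-padding overhead; the helper names are illustrative only. With these figures, 1223 TiB × 10 replicas is about 11.94 PiB, just under the 12 PiB requested, and at 1024 TiB per week the full request would be used up in about 12 weeks.

```python
# Sketch: consistency check of the DataCap request in this application.
# Figures are copied from the form; binary units (1 PiB = 1024 TiB) are
# assumed and piece-padding overhead is ignored.
TIB_PER_PIB = 1024

SINGLE_COPY_TIB = 1223  # Expected size of single dataset (one copy)
REPLICAS = 10           # Number of replicas to store
REQUESTED_PIB = 12      # Total amount of DataCap being requested
WEEKLY_TIB = 1024       # Weekly allocation of DataCap requested


def total_needed_pib(copy_tib: float, n_replicas: int) -> float:
    """DataCap needed to store every replica, in PiB."""
    return copy_tib * n_replicas / TIB_PER_PIB


def weeks_to_use(total_pib: float, per_week_tib: float) -> float:
    """Weeks needed to consume total_pib at the given weekly rate."""
    return total_pib * TIB_PER_PIB / per_week_tib


needed = total_needed_pib(SINGLE_COPY_TIB, REPLICAS)
print(f"needed {needed:.2f} PiB of {REQUESTED_PIB} PiB requested")  # needed 11.94 PiB of 12 PiB requested
print(weeks_to_use(REQUESTED_PIB, WEEKLY_TIB))  # 12.0
```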

On-chain address for first allocation

f1mri5agk3blkfofobuuoitmww7vpqq7kvkn4jk7y

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

Identifier

No response

Share a brief history of your project and organization

At Illumina, our goal is to apply innovative technologies to the analysis of genetic variation and function, making studies possible that were not even imaginable just a few years ago. It is mission critical for us to deliver innovative, flexible, and scalable solutions to meet the needs of our customers. As a global company that places high value on collaborative interactions, rapid delivery of solutions, and providing the highest level of quality, we strive to meet this challenge. Illumina's innovative sequencing and array technologies are fueling groundbreaking advancements in life science research, translational and consumer genomics, and molecular diagnostics.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

Describe the data being stored onto Filecoin

This dataset contains alignment files and short nucleotide, copy number (CNV), repeat expansion (STR), structural variant (SV) and other variant call files from the 1000 Genomes Project Phase 3 dataset (n=3202) using Illumina DRAGEN v3.5.7b, v3.7.6, v4.0.3, and v4.2.7 software.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

If you are a data preparer. What is your location (Country/Region)

China

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

We use Boost to prepare the data. We have downloaded the data to our own storage servers for this project. For SPs with good internet bandwidth, we will provide a download link; for SPs with poor network bandwidth, we will ship the data on hard drives. We are also still keeping an optimized Lotus packaging solution as an option and are considering whether to transition fully to Boost.
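The answer above does not say how prepared data is sized into deals. As a general illustration only (not the applicant's actual pipeline), the sketch below maps a CAR file's size to a Filecoin piece size, assuming the standard rules that padded piece sizes are powers of two (minimum 128 bytes) and that Fr32 padding leaves 127 usable bytes in every 128. The function names are hypothetical.

```python
# Sketch: padded piece size for a prepared CAR file (illustrative only).
# Assumes standard Filecoin rules: padded piece sizes are powers of two
# (minimum 128 bytes) and Fr32 padding leaves 127 payload bytes per 128.
GIB = 2**30


def padded_piece_size(payload_bytes: int) -> int:
    """Smallest power-of-two padded piece size that can hold payload_bytes."""
    if payload_bytes <= 0:
        raise ValueError("payload must be positive")
    needed = -(-payload_bytes * 128 // 127)  # ceil(payload * 128 / 127)
    size = 128
    while size < needed:
        size *= 2
    return size


def unpadded_capacity(padded_bytes: int) -> int:
    """Payload bytes that fit in a padded piece of the given size."""
    return padded_bytes // 128 * 127


for car_gib in (31, 32):
    piece = padded_piece_size(car_gib * GIB)
    print(f"{car_gib} GiB CAR -> {piece // GIB} GiB piece")
# 31 GiB CAR -> 32 GiB piece
# 32 GiB CAR -> 64 GiB piece
```

Under these rules a 32 GiB padded piece holds at most 31.75 GiB of payload, so a CAR file must stay below that size to fit a single 32 GiB sector.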

If you are not preparing the data, who will prepare the data? (Provide name and business)

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

Yes. However, we found that the copies of this dataset already stored on the Filecoin network can no longer be retrieved, so we want to store the dataset on the network again.

Please share a sample of the data

s3://1000genomes-dragen/
s3://1000genomes-dragen-3.7.6/
s3://1000genomes-dragen-v3.7.6/
s3://1000genomes-dragen-v4.0.3/
s3://1000genomes-dragen-v4-2-7/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

What is the expected retrieval frequency for this data

Monthly

For how long do you plan to keep this dataset stored on Filecoin

More than 3 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, South America, Europe

How will you be distributing your data to storage providers

HTTP or FTP server, Shipping hard drives

How did you find your storage providers

Slack, Partners

If you answered "Others" in the previous question, what is the tool or platform you used

Please list the provider IDs and location of the storage providers you will be working with.

f03510628, Hongkong
f01730296, Hongkong
f03300709, Sichuan
f03233333, Jiangmen
f03321072, Chengdu
f01313, Hongkong
f03322378, Shanghai (not working)
f03408862, India
f03363420, Luzhou
f02029742, Chengdu
f03499693, Hongkong
f03499694, Hongkong
f03529375, Canada
f03260592, Ziyang
f03498626, Dongguan
f03526300, Sichuan

How do you plan to make deals to your storage providers

Lotus client

If you answered "Others/custom tool" in the previous question, enter the details here

Can you confirm that you will follow the Fil+ guideline

Yes

datacap-bot[bot] commented 4 months ago

Application is waiting for allocator review

Floridajing commented 4 months ago

@Destore2023 Dear allocator, please review our application and give us feedback. Thank you.

Destore2023 commented 4 months ago

@Floridajing As a first step, please fill in this form to complete your information: https://www.wenjuan.com/s/qAVFfuN/

Destore2023 commented 4 months ago

@Floridajing We have found that this dataset is already stored on Filecoin. Why do you want to store it again?

Floridajing commented 4 months ago

@Destore2023 Yes, we know that this dataset has already been stored on the network. However, we checked retrieval from those SPs and none of the data can be retrieved. We want to store the dataset again and make it retrievable. Is this allowed?

Destore2023 commented 4 months ago

@Floridajing I just checked their retrieval, and it is as you said: http://grafana.filstation.app:3000/d/fea90509-20e8-4d49-b4ad-f0436da9c75d/spark-public-dashboard?from=2025-02-20T11:14:23.661Z&to=2025-02-20T11:19:23.661Z&timezone=browser&orgId=1&viewPanel=panel-10 However, I need to check with the governance team whether this is allowed. Please wait for the result.

Floridajing commented 4 months ago

@Destore2023 Hi, I have completed your form. Can we receive DataCap to store the data?

Destore2023 commented 4 months ago

@Floridajing ok. Let us check your form first.

Destore2023 commented 4 months ago

We received your form and have finished checking it.

You are welcome to use DataCap for your storage! Please take the storage process seriously and use DataCap carefully. We will check the CID report from time to time.

Since the purpose of this storage is to keep the dataset retrievable, are you sure you know how to select SPs that will ensure good retrieval? @Floridajing

Floridajing commented 4 months ago

@Destore2023 Yes, we have experience preparing data and have partners who can help us contact SPs. We can make sure this dataset is retrievable.

Destore2023 commented 4 months ago

@Floridajing To check your retrieval initially, we have decided to allocate you 1.5 PiB (half of a regular allocation round) first. If you do well, we will consider restoring your regular allocation.

Floridajing commented 4 months ago

@Destore2023 Thank you. We accept this and will deliver a good storage report.

datacap-bot[bot] commented 4 months ago

Datacap Request Trigger

Total DataCap requested

12PiB

Expected weekly DataCap usage rate

1024TiB

DataCap Amount - First Tranche

1.5 PiB

Client address

f1mri5agk3blkfofobuuoitmww7vpqq7kvkn4jk7y

datacap-bot[bot] commented 4 months ago

DataCap Allocation requested

Multisig Notary address

Client address

f1mri5agk3blkfofobuuoitmww7vpqq7kvkn4jk7y

DataCap allocation requested

1.5 PiB

Id

ba594eba-675d-4128-ba6c-7d170e7fbc61

datacap-bot[bot] commented 4 months ago

Application is ready to sign

datacap-bot[bot] commented 4 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacec3wunfgspyf44kg6q5kmpbdml2yjphzqelg2an2uv5mukcxjxb5m

Address

f1mri5agk3blkfofobuuoitmww7vpqq7kvkn4jk7y

Datacap Allocated

1.5 PiB

Signer Address

f1rppfznglfb7uyn3k6sfzeh47yq54ptvia5ixwsq

Id

ba594eba-675d-4128-ba6c-7d170e7fbc61

You can check the status here https://filfox.info/en/message/bafy2bzacec3wunfgspyf44kg6q5kmpbdml2yjphzqelg2an2uv5mukcxjxb5m

datacap-bot[bot] commented 4 months ago

Application is Granted

datacap-bot[bot] commented 4 months ago

Issue has been modified. Changes below:

(NEW vs OLD)

Please list the provider IDs and location of the storage providers you will be working with:
NEW: f03099981, Beijing; f03233333, Jiangmen; f03321072, Chengdu; f01313, Hongkong; f03322378, Shanghai; f03408862, India
OLD: f03099981, Beijing; f03233333, Jiangmen; f03321072, Chengdu; f01313, Hongkong; f03322378, Shanghai
State: ChangesRequested (NEW) vs Granted (OLD)

Floridajing commented 4 months ago

@Destore2023 We added one SP to our application.

Destore2023 commented 4 months ago

@Floridajing OK. If you need to change your SP list, please update it in your application. Deactivated SPs should be clearly marked (for example, with a strikethrough).

datacap-bot[bot] commented 4 months ago

Issue has been modified. Changes below:

(NEW vs OLD)

Please list the provider IDs and location of the storage providers you will be working with:
NEW: f03099981, Beijing; f03233333, Jiangmen; f03321072, Chengdu; f01313, Hongkong; f03322378, Shanghai; f03408862, India; f03363420, Luzhou
OLD: f03099981, Beijing; f03233333, Jiangmen; f03321072, Chengdu; f01313, Hongkong; f03322378, Shanghai; f03408862, India

Floridajing commented 4 months ago

@Destore2023 We added another SP to our application.

Floridajing commented 3 months ago

@Destore2023 Hello, we would like more DataCap. Our SPs have used all the DataCap we had.

Destore2023 commented 3 months ago

checker:manualTrigger

datacap-bot[bot] commented 3 months ago

DataCap Client Report Summary [^1]

Client address: f1mri5agk3blkfofobuuoitmww7vpqq7kvkn4jk7y
Client ID: f03441573
Report ID: 26839
Generated at: Mon, 24 Mar 2025 03:14:49 GMT [^2]

Report checks

✔️ Storage providers are located in different regions

⚠️2 storage providers sealed more than 25% of total datacap

⚠️1 storage providers sealed too much duplicate data

✔️ Storage provider locations looks healthy

⚠️20.00% of storage providers have retrieval success rate equal to zero

⚠️60.00% of storage providers have retrieval success rate less than 75%

⚠️Low replica percentage is 70.37%

✔️ No CID sharing has been observed

⚠️80.00% of storage providers have misreported their data to IPNI

✔️ Storage providers IPNI reporting looks healthy (2/2)

Full report

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger
[^2]: New report will be generated only if the latest one is older than 30 hours
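The report's warnings are simple threshold checks over per-SP figures. The sketch below recomputes three of them, assuming each SP is summarized by its sealed DataCap and a retrieval success rate, and that SPs at 0% also count toward the "below 75%" figure. The SPInfo record and the sample values are hypothetical; they were chosen so the output reproduces this report's warning pattern (2 SPs over 25%, 20% at zero, 60% below 75%) and are not the real per-SP data.

```python
# Sketch: recompute the report's per-SP warnings from per-SP figures.
# Thresholds (25% share, 75% retrieval) come from the report text;
# SPInfo and the sample values are hypothetical, not the real report data.
from dataclasses import dataclass


@dataclass
class SPInfo:
    provider_id: str
    sealed_tib: float      # DataCap sealed by this SP, in TiB
    retrieval_rate: float  # retrieval success rate, 0.0 to 1.0


def report_checks(sps: list[SPInfo], max_share: float = 0.25,
                  min_retrieval: float = 0.75) -> dict[str, object]:
    """Return SPs over the share limit and the % of SPs failing retrieval checks."""
    total = sum(sp.sealed_tib for sp in sps)
    if not sps or total <= 0:
        raise ValueError("need at least one SP with sealed data")
    n = len(sps)
    over_share = [sp.provider_id for sp in sps if sp.sealed_tib / total > max_share]
    zero = sum(1 for sp in sps if sp.retrieval_rate == 0)
    below = sum(1 for sp in sps if sp.retrieval_rate < min_retrieval)  # includes zeros
    return {
        "over_share": over_share,
        "pct_zero_retrieval": 100 * zero / n,
        "pct_below_threshold": 100 * below / n,
    }


sample = [
    SPInfo("sp-a", 400, 0.90),
    SPInfo("sp-b", 300, 0.60),
    SPInfo("sp-c", 150, 0.00),
    SPInfo("sp-d", 100, 0.80),
    SPInfo("sp-e", 50, 0.40),
]
print(report_checks(sample))
# {'over_share': ['sp-a', 'sp-b'], 'pct_zero_retrieval': 20.0, 'pct_below_threshold': 60.0}
```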

Destore2023 commented 3 months ago

@Floridajing One SP's retrieval rate is 0. Please explain.

Floridajing commented 3 months ago

@Destore2023 This SP told us that they wanted to stop receiving data, so we stopped sending data to them. Since they dropped out of our plan, they did not keep retrieval running.

Destore2023 commented 3 months ago

@Floridajing In that case, you should mark it in your SP list. You should also pay attention to your data distribution.

Floridajing commented 3 months ago

@Destore2023 Thank you for your advice. We will do that next time. Some SPs can seal faster than others, so they receive more data more quickly. We will take our distribution into account.

Destore2023 commented 3 months ago

OK. Considering this round's report, we have still decided to allocate you only half of a regular round. Our pathway has 1.38 PiB available now, and we will allocate that amount to you. Please do better next time, and then we will consider restoring your regular allocation. @Floridajing

Floridajing commented 3 months ago

@Destore2023 Thank you again, allocator.

datacap-bot[bot] commented 3 months ago

Issue information change request has been approved.

datacap-bot[bot] commented 3 months ago

Application is in Refill

datacap-bot[bot] commented 3 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedz3wd3hjkhgalfgg3e4fgnu3jihpxxuvmrdbihtkosikqk5tz7xa

Address

f1mri5agk3blkfofobuuoitmww7vpqq7kvkn4jk7y

Datacap Allocated

1.38 PiB

Signer Address

f1rppfznglfb7uyn3k6sfzeh47yq54ptvia5ixwsq

Id

bf5fab9e-e684-4ea4-a059-a31354501cc6

You can check the status here https://filfox.info/en/message/bafy2bzacedz3wd3hjkhgalfgg3e4fgnu3jihpxxuvmrdbihtkosikqk5tz7xa

datacap-bot[bot] commented 3 months ago

Application is Granted

datacap-bot[bot] commented 3 months ago

Client used 75% of the allocated DataCap. Consider allocating next tranche.
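The bot posts this hint once a client has consumed 75% of its latest tranche. A minimal sketch of such a check, assuming the bot compares used against allocated bytes for the most recent allocation; the thread does not show the bot's actual implementation, and the function name is hypothetical.

```python
# Sketch of a "used 75% of the allocation" check like the bot message above.
# The function name and byte-count inputs are assumptions; the bot's own
# implementation is not shown in this thread.
PIB = 2**50


def should_suggest_refill(allocated: int, remaining: int,
                          threshold: float = 0.75) -> bool:
    """True once the used share of the latest allocation reaches threshold."""
    if allocated <= 0:
        return False
    used = allocated - remaining
    return used / allocated >= threshold


# Example with the 1.38 PiB tranche: 0.30 PiB left means about 78% used.
print(should_suggest_refill(int(1.38 * PIB), int(0.30 * PIB)))  # True
```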

datacap-bot[bot] commented 3 months ago

Issue has been modified. Changes below:

(NEW vs OLD)

Please list the provider IDs and location of the storage providers you will be working with:
NEW: f03510628, Hongkong; f01730296, Hongkong; f03300709, Sichuan; f03233333, Jiangmen; f03321072, Chengdu; f01313, Hongkong; f03322378, Shanghai; f03408862, India; f03363420, Luzhou
OLD: f03099981, Beijing; f03233333, Jiangmen; f03321072, Chengdu; f01313, Hongkong; f03322378, Shanghai; f03408862, India; f03363420, Luzhou
State: ChangesRequested (NEW) vs Granted (OLD)

datacap-bot[bot] commented 3 months ago

Issue has been modified. Changes below:

(NEW vs OLD)

Please list the provider IDs and location of the storage providers you will be working with:
NEW: f03510628, Hongkong; f01730296, Hongkong; f03300709, Sichuan; f03233333, Jiangmen; f03321072, Chengdu; f01313, Hongkong; f03322378, Shanghai (not work); f03408862, India; f03363420, Luzhou
OLD: f03510628, Hongkong; f01730296, Hongkong; f03300709, Sichuan; f03233333, Jiangmen; f03321072, Chengdu; f01313, Hongkong; f03322378, Shanghai; f03408862, India; f03363420, Luzhou
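The bot's change notices pack both provider lists into long runs of text, which makes additions and relabellings easy to miss. The sketch below parses each list into an ID-to-location map and reports what changed. It assumes entries of the form "f0<digits>, <location>" separated by spaces or semicolons; parse_sp_list and diff_sp_lists are hypothetical helpers, not part of the bot.

```python
# Sketch: diff the NEW and OLD provider lists from a bot change notice.
# Assumes entries look like "f0<digits>, <location>" separated by spaces or
# semicolons; parse_sp_list and diff_sp_lists are hypothetical helpers.
import re

ENTRY = re.compile(r"(f0\d+),\s*(.*?)(?=[\s;]+f0\d+,|[\s;]*$)")


def parse_sp_list(text: str) -> dict[str, str]:
    """Map provider ID -> location label."""
    return {m.group(1): m.group(2).strip() for m in ENTRY.finditer(text)}


def diff_sp_lists(new: str, old: str) -> dict[str, object]:
    """Report SPs added, removed, or relabelled (e.g. marked as not working)."""
    n, o = parse_sp_list(new), parse_sp_list(old)
    return {
        "added": sorted(n.keys() - o.keys()),
        "removed": sorted(o.keys() - n.keys()),
        "relabelled": {k: (o[k], n[k])
                       for k in sorted(n.keys() & o.keys()) if n[k] != o[k]},
    }
```

Applied to the NEW and OLD lists in this notice, it would report f03322378 as relabelled from "Shanghai" to "Shanghai (not work)", with no SPs added or removed.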

Floridajing commented 3 months ago

@Destore2023 We have updated the application. Please grant us the DataCap.

Destore2023 commented 3 months ago

checker:manualTrigger

datacap-bot[bot] commented 3 months ago

DataCap Client Report Summary [^1]

Client address: f1mri5agk3blkfofobuuoitmww7vpqq7kvkn4jk7y
Client ID: f03441573
Report ID: 32737
Generated at: Wed, 09 Apr 2025 03:17:12 GMT [^2]

Report checks

✔️ Storage providers are located in different regions

⚠️1 storage providers sealed more than 25% of total datacap

✔️ Storage provider duplication looks healthy

⚠️1 storage providers have unknown IP location

⚠️22.22% of storage providers have retrieval success rate equal to zero

⚠️77.78% of storage providers have retrieval success rate less than 75%

⚠️Low replica percentage is 33.94%

✔️ No CID sharing has been observed

⚠️77.78% of storage providers have misreported their data to IPNI

⚠️11.11% of storage providers have not reported their data to IPNI

✔️ Client receiving datacap from one allocator

Full report

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger
[^2]: New report will be generated only if the latest one is older than 30 hours

Destore2023 commented 3 months ago

In addition to the SP from last time, another SP has a retrieval rate of 0 this time. Can you explain? @Floridajing

Floridajing commented 3 months ago

@Destore2023 This is because SP f03510628 is a new node that has only just received data from us. It needs some time before its retrieval data appears on the Spark dashboard.

Destore2023 commented 3 months ago

Will the retrieval rate improve in the next round? Under our rules, the retrieval rate should be over 75%.

Floridajing commented 3 months ago

@Destore2023 Yes, the SPs are working on improving it. We will check their work and report back to you.

Destore2023 commented 3 months ago

@Floridajing Considering that Spark has had bugs before, we will continue to monitor your storage plan's progress. We have still decided to allocate you only half of a regular round. Please do better next time, and then we will consider restoring your regular allocation.

Floridajing commented 3 months ago

@Destore2023 Thank you and we won't let you down.

datacap-bot[bot] commented 3 months ago

Issue information change request has been approved.

datacap-bot[bot] commented 3 months ago

Application is in Refill

datacap-bot[bot] commented 3 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecldy5l426nkj2mf2cvtsyjgut7fhqfpf67tntfotxspwvvdpk7tg

Address

f1mri5agk3blkfofobuuoitmww7vpqq7kvkn4jk7y

Datacap Allocated

1688849860263936 B (1.5 PiB)

Signer Address

f1rppfznglfb7uyn3k6sfzeh47yq54ptvia5ixwsq

Id

0e105fd2-8b12-40ea-92df-2225ccadb849

You can check the status here https://filfox.info/en/message/bafy2bzacecldy5l426nkj2mf2cvtsyjgut7fhqfpf67tntfotxspwvvdpk7tg
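The allocated amount in this last message is given in raw bytes, unlike the earlier "1.5 PiB" and "1.38 PiB" entries. Assuming binary units (1 PiB = 2**50 bytes), the small sketch below converts between the two and shows that 1688849860263936 bytes is exactly 1.5 PiB, i.e. another half-round allocation as announced. The helper names are illustrative.

```python
# Sketch: convert the byte count in the bot message to binary units.
# Assumes DataCap amounts use binary prefixes (1 PiB = 2**50 bytes).
PIB = 2**50


def bytes_to_pib(n_bytes: int) -> float:
    """Convert bytes to pebibytes."""
    return n_bytes / PIB


def pib_to_bytes(pib: float) -> int:
    """Convert pebibytes to whole bytes (truncating any fraction)."""
    return int(pib * PIB)


print(bytes_to_pib(1688849860263936))  # 1.5
```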