filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] EMPIAR Public dataset(1/4) #1845

Closed nicelove666 closed 1 year ago

nicelove666 commented 1 year ago

Data Owner Name

EMPIAR Public dataset

Data Owner Country/Region

United Kingdom

Data Owner Industry

Life Science / Healthcare

Website

https://www.ebi.ac.uk/empiar

Social Media

https://www.ebi.ac.uk/empiar

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

500TiB

On-chain address for first allocation

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

Identifier

No response

Share a brief history of your project and organization

EMPIAR, the Electron Microscopy Public Image Archive, is a public resource for raw images underpinning 3D cryo-EM maps and tomograms (themselves archived in EMDB). EMPIAR also accommodates 3D datasets obtained with volume EM techniques and soft and hard X-ray tomography. More ...
As of 2023-03-27, EMPIAR contains 1254 entries, taking up 2.73 PB of storage.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

EMPIAR, the Electron Microscopy Public Image Archive, is a public resource for raw images underpinning 3D cryo-EM maps and tomograms. EMPIAR also accommodates 3D datasets obtained with volume EM techniques and soft and hard X-ray tomography. The purpose of EMPIAR is to provide easy access to state-of-the-art data to facilitate methods development, validation, and re-use, e.g., for machine learning applications. EMPIAR data is also used for training and teaching purposes and as part of community challenges.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, singularity, graphsplit

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

Yes
https://www.ebi.ac.uk/empiar

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe

How will you be distributing your data to storage providers

Cloud storage (i.e. S3), HTTP or FTP server, IPFS, Shipping hard drives, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Filmine

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

No response

How do you plan to make deals to your storage providers

Boost client, Lotus client, Bidbot, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

500TiB

Client address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

DataCap allocation requested

250TiB

Id

c7c5c96a-8550-4341-8ef9-76259d4abe02

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

DataCap allocation requested

250TiB

Id

68ffc858-304c-4cc5-a831-bb133431ac0b

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

No application info found for this issue on https://filplus.d.interplanetary.one/clients.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

TimGuo7 commented 1 year ago

@nicelove666, can you explain why you are requesting 5PiB of DataCap here? As of now, the dataset contains less than 3PiB of data.
Thanks.

nicelove666 commented 1 year ago

Dear community members, glad to see your question. https://www.ebi.ac.uk/empiar/ has 2.76PiB of data, and we plan to store 8-10 copies. Taking 10 copies as an example, it is reasonable to apply for 20P of DataCap. Thank you for your attention and support; we look forward to your help.
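
For reference, the arithmetic behind this answer can be sketched as follows. The dataset size and copy counts are the figures quoted in the comment; reading the "(1/4)" in the issue title as four equal 5PiB applications is an assumption, not something the thread states.

```python
# Figures quoted in this thread; the copy counts are the applicant's stated plan.
DATASET_PIB = 2.76                # dataset size claimed in the comment above
REQUEST_PER_APPLICATION_PIB = 5   # "Total amount of DataCap being requested"
APPLICATIONS = 4                  # assumption: "(1/4)" means four equal applications


def datacap_needed(dataset_pib: float, copies: int) -> float:
    """DataCap needed to store `copies` full replicas of the dataset."""
    return dataset_pib * copies


for copies in (8, 10):
    print(f"{copies} copies -> {datacap_needed(DATASET_PIB, copies):.2f} PiB")

total_requested = REQUEST_PER_APPLICATION_PIB * APPLICATIONS
print(f"{APPLICATIONS} applications x {REQUEST_PER_APPLICATION_PIB} PiB -> {total_requested} PiB")
```

Under these assumptions, 8 to 10 full copies would need roughly 22 to 28 PiB, so a 20 PiB total would cover about 7 full copies.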

[Attached screenshot: WX20230410-180308@2x]

Tom-OriginStorage commented 1 year ago

Can you briefly explain your packaging plan?

nicelove666 commented 1 year ago

Hey, dear notary, thank you for your question. We plan to work with 8-10 SPs and store 8-10 copies. The SPs are: f01971675 f01989015 f02001485 f02078669 f02079257 f02036170 f01932183 f02009671 f01160668 f02001485. We are ready. Thank you for your attention and support!
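
A quick way to sanity-check a provider list like the one above is to count unique IDs. This is only an illustrative sketch; the IDs are copied from the comment as written.

```python
from collections import Counter

# Provider IDs exactly as listed in the comment above.
listed = (
    "f01971675 f01989015 f02001485 f02078669 f02079257 "
    "f02036170 f01932183 f02009671 f01160668 f02001485"
).split()

counts = Counter(listed)
unique = list(counts)  # Counter keeps first-seen order
duplicates = [sp for sp, n in counts.items() if n > 1]

print(f"listed={len(listed)} unique={len(unique)} duplicates={duplicates}")
```

The list contains ten entries but nine distinct providers, because f02001485 appears twice.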

laurarenpanda commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedr6tujeakjwyuhe7vy75cszjo7u6ahz4tzm3ti2mke57eofkirlo

Address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Datacap Allocated

250.00TiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

68ffc858-304c-4cc5-a831-bb133431ac0b

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedr6tujeakjwyuhe7vy75cszjo7u6ahz4tzm3ti2mke57eofkirlo

SuperChaiChai commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecnq275qvinhenavwzifzdyeejaskzun3x5uuz4ovffvfyfu7elqy

Address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Datacap Allocated

250.00TiB

Signer Address

f12mckci3omexgzoeosjvstcfxfe4vqw7owdia3da

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecnq275qvinhenavwzifzdyeejaskzun3x5uuz4ovffvfyfu7elqy

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

No application info found for this issue on https://filplus.d.interplanetary.one/clients.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

DataCap allocation requested

500TiB

Id

486e8165-543a-41df-a424-00a432dfcf0d

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

500TiB

Total DataCap granted for client so far

250TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.75PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| null | null | 250TiB | null | 30.25TiB |

Normalnoise commented 1 year ago

How much data have you downloaded? Can you show us some proof of the size of the data you've downloaded?

nicelove666 commented 1 year ago

So far, we have downloaded 70T of data and are working with 9 SPs. On average, each SP has stored about 55T of data. This is the first round of allocation, and we are continuing to download more data. The screenshots below are proof of our partial data download.

[Attached screenshots: WechatIMG5227, WechatIMG5226, WechatIMG5225, WechatIMG5228]

Normalnoise commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebst4yobhdvcfsf72oyhv77ut4vrhgmss3br5lqoay53bbcthb5t2

Address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Datacap Allocated

500.00TiB

Signer Address

f1c5non5yf35avgcpsqvxu4yj54yyvxorwyjochqq

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebst4yobhdvcfsf72oyhv77ut4vrhgmss3br5lqoay53bbcthb5t2

Tom-OriginStorage commented 1 year ago

According to the checker bot's report, the packaging looks relatively healthy, so I am willing to support this request.

Tom-OriginStorage commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecs4h6vpincu63rv6pxdlixyevy6xswr47q73fno7a47xcxgim4da

Address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Datacap Allocated

500.00TiB

Signer Address

f1q6bpjlqia6iemqbrdaxr2uehrhpvoju3qh4lpga

Id

486e8165-543a-41df-a424-00a432dfcf0d

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecs4h6vpincu63rv6pxdlixyevy6xswr47q73fno7a47xcxgim4da

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

DataCap allocation requested

1000.0TiB

Id

828b0097-0b2d-4aa4-8d85-da55d22afb2f

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Rule to calculate the allocation request amount

200% of weekly dc amount requested

DataCap allocation requested

1000.0TiB

Total DataCap granted for client so far

454747.4YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-5.49B

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| null | null | 500TiB | null | 124.96TiB |

nicelove666 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 53.47% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

mikezli commented 1 year ago

[Attached image: bbea3ba75616ace0e950f63c13cf8fa]

mikezli commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebml32phgsskdqidm4lx47sqkazxk6y3gfdfxa3msv4gjzlzs4rco

Address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Datacap Allocated

1000.00TiB

Signer Address

f1dnb3uz7sylxk6emti3ififcvu3nlufnnsjui6ea

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebml32phgsskdqidm4lx47sqkazxk6y3gfdfxa3msv4gjzlzs4rco

METAVERSEDATAMINING commented 1 year ago

Share a brief history of your organization and your relationship with the organization.

nicelove666 commented 1 year ago

Hi, dear notary, thank you for your question. I am a Filecoin investor and enthusiast. I have worked at Amazon Cloud, Alibaba Cloud, and Google Cloud. In 2018, I quit my high-paying job to go all in on blockchain. In 2019, I started following Filecoin and spent thousands of dollars on it. We purchased FIL for US$10,000. Today, we run FIL mining operations in Singapore, Thailand, Malaysia, and Hong Kong, China. We firmly believe that Filecoin is the future of distributed storage.

nicelove666 commented 1 year ago

We love Filecoin and plan to participate in this year's notary election. Thank you for your attention and support.

METAVERSEDATAMINING commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecqjyhooyouebhlxk4dd3h2bfmstla7l26x2wwlzx7t3omllc2ef4

Address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Datacap Allocated

1000.00TiB

Signer Address

f17idrnfnxl2mbgcgr57a6z2c6lj2qx56gvm3336i

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecqjyhooyouebhlxk4dd3h2bfmstla7l26x2wwlzx7t3omllc2ef4

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f02049625

Client address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

DataCap allocation requested

1.95PiB

Id

718a3b4d-46a5-4894-8cb8-aa1a53f45532

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Rule to calculate the allocation request amount

400% of weekly dc amount requested

DataCap allocation requested

1.95PiB

Total DataCap granted for client so far

909494701772928712704.0YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-1.09B

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| null | null | 1000.0TiB | null | 251.71TiB |

sxxfuture-official commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 59.76% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

NiwanDao commented 1 year ago

Will you balance the data replication in the next tranche? @nicelove666

nicelove666 commented 1 year ago

Hi, dear notary, thank you for your reply. In fact, our data replication is fine, but only 3 SPs use 64G sectors while all the other SPs use 32G sectors, so the replication looks worse than it is. We are actively looking for more SPs that use 64G sectors. Thank you for your attention. I believe the replication problem will improve in the next round.

NiwanDao commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedyokqb2utjkjwmnxzgpp4w3xissjswddrctuwtzkilfaarkkagng

Address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Datacap Allocated

1.95PiB

Signer Address

f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedyokqb2utjkjwmnxzgpp4w3xissjswddrctuwtzkilfaarkkagng

sxxfuture-official commented 1 year ago

The only problem reported by the CID checker is insufficient data replicas. But as explained above, when data is stored in both 64G and 32G sectors, there will indeed be cases where the replicas cannot be matched by their data ID. I hope the CID checking tools can be improved in the future.
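
The effect described above can be illustrated with a small sketch. It assumes the replication metric is computed per piece CID and that packing the same source data for 32G and 64G sectors produces different piece CIDs. Both are assumptions consistent with the explanation in this thread, not the checker's actual implementation, and the deals and provider names below are hypothetical.

```python
from collections import defaultdict


def under_replicated_share(deals, min_providers=4):
    """Fraction of deals whose piece CID is held by fewer than `min_providers` providers."""
    providers_by_piece = defaultdict(set)
    for piece_cid, provider in deals:
        providers_by_piece[piece_cid].add(provider)
    low = sum(
        1 for piece_cid, _ in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return low / len(deals)


# Hypothetical deals: the same source data packed once for 32G sectors and
# once for 64G sectors, giving two different piece CIDs.
deals = (
    [("piece-32G", f"sp{i}") for i in range(1, 6)]    # 5 providers, 32G packing
    + [("piece-64G", f"sp{i}") for i in range(6, 9)]  # 3 providers, 64G packing
)

print(f"{under_replicated_share(deals):.2%} of deals look under-replicated")
```

In this toy case the data sits with eight providers in total, yet the three 64G deals are counted as under-replicated because only three providers share that piece CID.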

sxxfuture-official commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea5sdqlennigrkunqkafdki74sww3l32qgc6rowki5qu3pgtwczm2

Address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Datacap Allocated

1.95PiB

Signer Address

f1foiomqlmoshpuxm6aie4xysffqezkjnokgwcecq

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea5sdqlennigrkunqkafdki74sww3l32qgc6rowki5qu3pgtwczm2

sxxfuture-official commented 1 year ago

Retrieval test OK. [Attached screenshot] The deal_id is from https://datacapstats.io/clients/f02123086

cryptowhizzard commented 1 year ago

It seems a lot of SPs are not available for retrieval.

This SP, for example, is not available:

lotus client retrieve --provider f02078669 bafykbzacebk3dwbtc3gsyzastfpm3nqco5sbxmog3lal4apt6e6hil2nlqlky /root/downloadedcarfiles/f02123086-f02078669-34840110-baga6ea4seaqcro5am7m2gdkneexfzoarxfvzxpkdn3o4gpzw4osgwizwyjm7ini.car

Because I wondered why @sxxfuture-official could retrieve, I tried retrieval multiple times here from every SP. I managed to get only one working, with f02101475. @sxxfuture-official, did you choose your retrieval randomly, or were you advised here?

Anyway, @nicelove666, please work with your SPs to improve retrieval.
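
To repeat this kind of check across several SPs, the command can be assembled programmatically. The sketch below only builds the argument list, using the `lotus client retrieve --provider` form shown in this comment. The output path is a hypothetical placeholder, and each provider should only be tested with a CID it actually holds.

```python
import shlex


def retrieve_command(provider: str, data_cid: str, out_path: str) -> list[str]:
    """Build the same `lotus client retrieve` invocation used in the comment above."""
    return ["lotus", "client", "retrieve", "--provider", provider, data_cid, out_path]


# Provider and CID come from the comment; the output path is a hypothetical placeholder.
cmd = retrieve_command(
    "f02078669",
    "bafykbzacebk3dwbtc3gsyzastfpm3nqco5sbxmog3lal4apt6e6hil2nlqlky",
    "/tmp/retrieved.car",
)
print(shlex.join(cmd))
```

On a machine with a running Lotus node, the list can be passed to `subprocess.run(cmd, timeout=...)` and the exit code recorded per provider.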

[Attached screenshot: Schermafbeelding 2023-05-05 om 20 47 38]

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 5

Multisig Notary address

f02049625

Client address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

DataCap allocation requested

1.34PiB

Id

f48e34c3-ff4c-401a-b5de-a1475c7efccf

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1dvtqpza3rxtavcpw7muu62a62y3gn7zeam724hy

Rule to calculate the allocation request amount

800% of weekly dc amount requested

DataCap allocation requested

1.34PiB

Total DataCap granted for client so far

1.8160790205001833e+37YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-2.19B

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 69800 | 21 | 1.95PiB | 24.52 | 451.58TiB |

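
The five allocation amounts in this thread (250TiB, 500TiB, 1000TiB, 1.95PiB, 1.34PiB) are consistent with a doubling schedule capped at the total request. The sketch below reconstructs that pattern from the numbers and from the bot's "Rule to calculate the allocation request amount" lines (100%, 200%, 400%, 800%). The 50% factor for the first tranche is inferred from 250TiB being half of the 500TiB weekly rate, and none of this is the bot's actual code.

```python
TIB_PER_PIB = 1024


def tranche_schedule(total_tib: float, weekly_tib: float) -> list[float]:
    """Tranches of 50%, 100%, 200%, ... of the weekly rate, capped at the remaining total."""
    tranches = []
    remaining = total_tib
    factor = 0.5  # inferred: the first tranche in this thread is half the weekly rate
    while remaining > 0:
        tranche = min(weekly_tib * factor, remaining)
        tranches.append(tranche)
        remaining -= tranche
        factor *= 2
    return tranches


schedule = tranche_schedule(total_tib=5 * TIB_PER_PIB, weekly_tib=500)
for n, tib in enumerate(schedule, start=1):
    print(f"request {n}: {tib:.0f} TiB = {tib / TIB_PER_PIB:.2f} PiB")
```

With a 5PiB total and a 500TiB weekly rate this gives 250, 500, 1000, 2000 and 1370 TiB; the last two are 1.95PiB and 1.34PiB, matching requests 4 and 5.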
nicelove666 commented 1 year ago

Dear community folks, sorry for the late reply.

As you have probably seen, many SPs have had retrieval problems since the Boost upgrade, and Boost needs to be restarted frequently. Many SPs are currently updating to the latest fix release: https://github.com/filecoin-project/boost/releases. Before this upgrade, the data could be retrieved. Because of the upgrade, some SPs changed their retrieval code, but it has not yet been updated for the new version, which causes the situation you see.

[Attached screenshot: WechatIMG6193]

We take your feedback seriously and have contacted the SP you mentioned many times. Their retrieval has now been repaired. However, to avoid malicious attacks, some SPs have set up paid retrieval, so you need to pay a small fee to retrieve the data:

lassie fetch -o output1.car -p bafykbzacecmmbhohkhcpam6w4dyrh7z7zzvlzg6exkoikmdik6vjzkaqysmpq

[Attached screenshot: WechatIMG6131]

Finally, please treat SPs and everyone who contributes with kindness. Thank you for your help; if possible, we would also welcome your assistance.

Last but not least, if any SPs continue to fail to solve the retrieval problem, we will no longer work with them!

Thank you again for your attention. If you can, please help us!

nicelove666 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 62.31% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.