filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] <Organization> - <Project Name> #2032

Closed chenkun1223 closed 11 months ago

chenkun1223 commented 1 year ago

Data Owner Name

Protein Data Bank 3D Structural Biology Data

What is your role related to the dataset

Storage provider filling out application on behalf of the data owner

Data Owner Country/Region

United States

Data Owner Industry

Information, Media & Telecommunications

Website

https://www.rcsb.org/pages/snapshots

Social Media

none

Total amount of DataCap being requested

300TB

Expected size of single dataset (one copy)

30.75TB

Number of replicas to store

10

Weekly allocation of DataCap requested

300TiB

On-chain address for first allocation

f1lmra3skm3wtbtkw7hbk2tvvkjbeza5tj35rwopa

Data Type of Application

Slingshot

Custom multisig

Identifier

No response

Share a brief history of your project and organization

none

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

The "Protein Data Bank (PDB) archive" was established in 1971 as the first open-access digital data archive in biology. It is a collection of three-dimensional (3D) atomic-level structures of biological macromolecules (i.e., proteins, DNA, and RNA) and their complexes with one another and various small-molecule ligands (e.g., US FDA-approved drugs, enzyme co-factors). For each PDB entry (unique identifier: 1abc or PDB_0000001abc), multiple data files contain information about the 3D atomic coordinates, sequences of biological macromolecules, information about any small molecules/ligands present in the entry, details about the structure-determination experiment, authors and publication information, experimental data, and the wwPDB validation report. Additional content stored in the archive includes documentation, summary reports, and software (among others). The PDB is a jointly-managed core archive of the Worldwide Protein Data Bank partnership [RCSB Protein Data Bank (RCSB PDB, rcsb.org); Protein Data Bank in Europe (PDBe, pdbe.org); Protein Data Bank Japan (PDBj, pdbj.org); Electron Microscopy Data Bank (EMDB, emdb-empiar.org); and Biological Magnetic Resonance Bank (BMRB, bmrb.io)].

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

aws s3 ls --no-sign-request s3://pdbsnapshots/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America

How will you be distributing your data to storage providers

HTTP or FTP server

How do you plan to choose storage providers

Slack, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

f01172521  (owner)
f01843178
f01877259
f02129771

How do you plan to make deals to your storage providers

Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! ❗ We have found some problems in the information provided.

chenkun1223 commented 1 year ago

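Thanks for your request! ❗ We have found some problems in the information provided.

  • The address should start with f1, f2, f3 or f4. Please review the request and edit the body of the issue, providing all the required information.

Modified: f1nn3reuxn3pbwjynoainndamja2o46nan5ch7hlq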

Sunnyiscoming commented 1 year ago

Expected size of single dataset (one copy) 32GB

How much data is in this dataset? Please modify this value.

f01172521 f01843178 f01877259 f02129771

Which are your nodes?

chenkun1223 commented 1 year ago

Expected size of single dataset (one copy) 32TB

How much data is in this dataset? Please modify this value.

f01172521 (owner) f01843178 f01877259 f02129771

chenkun1223 commented 1 year ago

modified

Sunnyiscoming commented 1 year ago

Expected size of single dataset (one copy) × Number of replicas to store ≠ Total amount of DataCap being requested: 30.75TB × 10 ≠ 3PiB

Can you explain about that?

chenkun1223 commented 1 year ago

Expected size of single dataset (one copy) × Number of replicas to store ≠ Total amount of DataCap being requested: 30.75TB × 10 ≠ 3PiB

Can you explain about that?

Our calculation error has been corrected

Total amount of DataCap being requested 300TB
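The sizing question above is plain arithmetic. The sketch below multiplies the single-copy size by the replica count and compares the result with both the original and the corrected request. It assumes "TB" means decimal terabytes (10^12 bytes) and "TiB"/"PiB" mean binary units; the helper names `replicated_size_bytes` and `fits` are hypothetical.

```python
# Sizing check for this application: single-copy size x replicas vs. DataCap requested.
# Assumption: "TB" is decimal (10**12 bytes); "TiB"/"PiB" are binary (2**40 / 2**50 bytes).
TB = 10**12
TiB = 2**40
PiB = 2**50


def replicated_size_bytes(single_copy_tb: float, replicas: int) -> float:
    """Bytes needed to store `replicas` full copies of a `single_copy_tb` TB dataset."""
    return single_copy_tb * TB * replicas


def fits(requested_bytes: float, needed_bytes: float) -> bool:
    """True when the requested DataCap is at least the replicated dataset size."""
    return requested_bytes >= needed_bytes


needed = replicated_size_bytes(30.75, 10)
print(f"needed: {needed / TB:.1f} TB ({needed / TiB:.2f} TiB)")
print("original 3 PiB request equals needed size:", abs(3 * PiB - needed) < TB)
print("corrected 300 TB request covers it:", fits(300 * TB, needed))
```

Under these assumptions, 30.75TB × 10 = 307.5TB (about 279.67TiB), so neither the original 3PiB nor the corrected 300TB matches the stated size and replica count exactly; the corrected figure is about 2.4% short.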

Sunnyiscoming commented 1 year ago

Can you introduce your organization? Could you send an email to filplus-app-review@fil.org from your official domain to confirm your identity? The email subject should include the issue ID #2032.

chenkun1223 commented 1 year ago

filplus-app-review@fil.org

Sent an email to filplus-app-review@fil.org

Sunnyiscoming commented 1 year ago

Please disclose the name of your organization and send the message from your official domain.

chenkun1223 commented 1 year ago

Please disclose your organization's name and send the message from your official domain.

Hello, we are a small team focused on Filecoin technology research, development, and services. We have not established a company or organization. We have managed some nodes and have collaborated with individuals, teams, and small companies. We urgently need this 30TB batch of data for sealing on our nodes. We also hope to store more professional datasets like this one in the future. We can provide external services such as data download and preview, and I look forward to Filecoin's data ecosystem continuing to improve.

Sunnyiscoming commented 1 year ago

What percentage of datacap will your nodes store?

chenkun1223 commented 1 year ago

What percentage of the DataCap will your nodes store?

My node will store 10% of the DataCap, and we will exchange and cooperate with other Filecoin storage providers to store all of the data.

large-datacap-requests[bot] commented 1 year ago

Deleting comment

@Sunnyiscoming does not have permission to post this comment.

Please, contact the assignee of this issue.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

300TiB

Expected weekly DataCap usage rate

300TiB

Client address

f1lmra3skm3wtbtkw7hbk2tvvkjbeza5tj35rwopa

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1lmra3skm3wtbtkw7hbk2tvvkjbeza5tj35rwopa

DataCap allocation requested

13.64TiB

Id

3844cc95-d826-44eb-bc19-41df43ab97e0

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

chenkun1223 commented 1 year ago

We are still preparing

AlanGreaterheat commented 1 year ago

Would love to see small and medium-sized SPs storing public datasets, so I am willing to support this first round. I will continue to focus on data dispersion across SPs and fast retrieval support.

sxxfuture-official commented 1 year ago

@chenkun1223 Please tell me the name of the organization applying for the current LDN and its website. Also, what is your relationship with the organization?

AlanGreaterheat commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceceqmqgsrpr6othsjhioszp7krmvymsevpzwifkrxxlrtauiuv57s

Address

f1lmra3skm3wtbtkw7hbk2tvvkjbeza5tj35rwopa

Datacap Allocated

13.64TiB

Signer Address

f1pnmzlxj7cfeo2v6oj5nco46hkg2l46wj7o4xxui

Id

3844cc95-d826-44eb-bc19-41df43ab97e0

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceceqmqgsrpr6othsjhioszp7krmvymsevpzwifkrxxlrtauiuv57s

chenkun1223 commented 1 year ago

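Would love to see small and medium-sized SPs storing public datasets as the first round of support; I will continue to focus on data dispersion and fast retrieval support.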

Thank you very much. We will continue to work hard and hope that the Filecoin ecosystem keeps getting better.

chenkun1223 commented 1 year ago

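Please tell me the name of the organization applying for the current LDN and its website. Also, what is your relationship with the organization?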

Hello, our organization is Handian Supercomputing Data, and our official website is http://www.inscdt.com/. I am an operations engineer at the company.

sxxfuture-official commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecdjvszgho42bwvwmlcv744djaaxychs5hxkar72yeu5afaio7kc2

Address

f1lmra3skm3wtbtkw7hbk2tvvkjbeza5tj35rwopa

Datacap Allocated

13.64TiB

Signer Address

f1foiomqlmoshpuxm6aie4xysffqezkjnokgwcecq

Id

3844cc95-d826-44eb-bc19-41df43ab97e0

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecdjvszgho42bwvwmlcv744djaaxychs5hxkar72yeu5afaio7kc2

cryptowhizzard commented 1 year ago

checker:manualTrigger

chenkun1223 commented 11 months ago

checker:manualTrigger

large-datacap-requests[bot] commented 11 months ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1lmra3skm3wtbtkw7hbk2tvvkjbeza5tj35rwopa

DataCap allocation requested

259.20TiB

Id

75eb706c-3ded-4f11-ae69-e1844ce3fe07

large-datacap-requests[bot] commented 11 months ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1lmra3skm3wtbtkw7hbk2tvvkjbeza5tj35rwopa

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

259.20TiB

Total DataCap granted for client so far

13.64TiB

Datacap to be granted to reach the total amount requested by the client (300TB)

259.20TiB
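The bot's figures appear consistent with reading the requested 300TB as decimal terabytes and reporting allocations in TiB. A minimal sketch of that arithmetic follows; the unit assumption and the hypothetical `remaining_tib` helper are ours, not the bot's documented logic.

```python
# Reconstruct the bot's "DataCap to be granted" figure under a stated assumption:
# the requested "300TB" is decimal terabytes, while allocations are reported in TiB.
TB = 10**12   # bytes in a decimal terabyte
TiB = 2**40   # bytes in a binary tebibyte


def remaining_tib(total_requested_tb: float, granted_tib: float) -> float:
    """DataCap still to grant, in TiB, given a total in decimal TB and grants so far in TiB."""
    total_tib = total_requested_tb * TB / TiB
    return total_tib - granted_tib


print(f"300 TB = {300 * TB / TiB:.2f} TiB")                 # 272.85 TiB
print(f"remaining = {remaining_tib(300, 13.64):.2f} TiB")   # 259.21 TiB
```

The result, about 259.21TiB, differs from the bot's 259.20TiB only in the last digit, which is plausibly rounding of the 13.64TiB figure.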

Stats

Number of deals: 0
Number of storage providers: 0
Previous DC allocated: 13.64TiB
Top provider: NaN
Remaining DC: 13.64TiB

w1259980480 commented 11 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 11 months ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

⚠️ 1 storage provider sealed more than 70% of total datacap - f02129771: 100.00%

⚠️ 1 storage provider has an unknown IP location - f02129771

⚠️ All storage providers are located in the same region.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across fewer than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.
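The replication warning can be approximated by counting, for each piece, how many distinct storage providers hold a deal for it. This is a sketch under assumptions: deals are represented as hypothetical `(piece_cid, provider_id)` pairs, the percentage is weighted by deal count, and the checker's actual implementation may differ.

```python
from collections import defaultdict


def under_replicated_share(deals: list[tuple[str, str]], min_providers: int = 3) -> float:
    """Percent of deals whose piece is stored with fewer than `min_providers` distinct SPs.

    `deals` is a list of (piece_cid, provider_id) pairs, one entry per deal.
    """
    if not deals:
        return 0.0
    # Collect the distinct providers holding each piece.
    providers_per_piece: dict[str, set[str]] = defaultdict(set)
    for piece, provider in deals:
        providers_per_piece[piece].add(provider)
    # Count deals whose piece has too few distinct providers.
    low = sum(1 for piece, _ in deals if len(providers_per_piece[piece]) < min_providers)
    return 100.0 * low / len(deals)


# Hypothetical input: every piece sealed only by f02129771, as in the report above.
example = [("pieceA", "f02129771"), ("pieceB", "f02129771")]
print(f"{under_replicated_share(example):.2f}% of deals replicated across fewer than 3 SPs")
```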

joshua-ne commented 11 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 11 months ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed more than 70% of total datacap - f02129771: 100.00%

⚠️ All storage providers are located in the same region.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across fewer than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

zcfil commented 11 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 11 months ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed more than 70% of total datacap - f02129771: 100.00%

⚠️ All storage providers are located in the same region.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across fewer than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

aggregation-and-compliance-bot[bot] commented 7 months ago

Client f02208521 does not follow the DataCap usage rules. This application has been failing the requirements for 7 days. Please take appropriate action to fix the following DataCap usage problems.

Criteria: Percent of used DataCap stored with top provider
Threshold: < 75
Reason: The percent of data from the client that is stored with their top provider is 100%. This should be less than 75%.
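The top-provider criterion compares the largest single provider's share of used DataCap against the 75% threshold. A minimal sketch, assuming usage is supplied as a hypothetical mapping from provider ID to DataCap used (any consistent unit); the bot's real data source and rounding are not shown here.

```python
def top_provider_percent(datacap_by_provider: dict[str, float]) -> float:
    """Percent of used DataCap held by the single largest storage provider."""
    total = sum(datacap_by_provider.values())
    if total == 0:
        return 0.0
    return 100.0 * (max(datacap_by_provider.values()) / total)


def passes_top_provider_rule(datacap_by_provider: dict[str, float],
                             threshold: float = 75.0) -> bool:
    """True when the top provider's share is strictly below `threshold` percent."""
    return top_provider_percent(datacap_by_provider) < threshold


# The single-provider case mirrors the report above; the three-provider split is hypothetical.
print(passes_top_provider_rule({"f02129771": 13.64}))   # False (100%)
print(passes_top_provider_rule({"f02129771": 5.0, "f01843178": 4.0, "f01877259": 4.64}))  # True (~36.7%)
```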