filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Allocation] - Filecoin Plus Registry #1172

Closed lbj2004032 closed 1 year ago

lbj2004032 commented 1 year ago

name: Large Dataset Notary application
about: Clients should use this application form to request a DataCap allocation via a LDN for a dataset
title: "Filecoin Plus Registry"
labels: 'application, Phase: Diligence'
assignees: ''


Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

Obtain equipment use rights from sponsoring companies and research institutes, and conduct secondary analysis and exploration of past accelerator data based on data mining and AI technology

What is the primary source of funding for this project?

Public welfare projects without funds

What other projects/ecosystem stakeholders is this project associated with?

no

Use-case details

Describe the data being stored onto Filecoin

Gene collision related data

Where was the data in this dataset sourced from?

Data generated by the Institute's research

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

http://124.219.161.88:22180/#s/8mvyG_uQ


Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

yes

What is the expected retrieval frequency for this data?

100/d

For how long do you plan to keep this dataset stored on Filecoin?

3 years

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Japan

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Transmission through filecoin node, no offline transmission

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Find a storage provider on the filecoin website, or negotiate with a storage provider

How will you be distributing deals across storage providers?

Please answer here.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Have sufficient funds and do not need additional help for the time being
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! :exclamation: We have found some problems in the information provided.


large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

https://nhp-gp.com/ Is this the website of your organization?

Describe the data being stored onto Filecoin

Gene collision related data

Where was the data in this dataset sourced from?

Data generated by the Institute's research

Can you specify where the data comes from? What is the relationship between you and the organization? How much original data is there? How many copies will you store? Can you provide more detailed information about the other storage providers participating in this program, for example a list of the SPs you have contacted so far?

lbj2004032 commented 1 year ago


https://nhp-gp.com/ Is this website of your organization?

Yes, this is our company website. We have a professional team working on HPC solutions, and we are a high-level channel partner (VAP) of the hardware vendors Huawei and XFusion. Since 2020 we have developed a Filecoin hardware solution using software-defined storage products from Huawei (OceanStor 9000 and Pacific 9540) and servers from Huawei (Taishan200) and XFusion (2288H V5/V6). We are the first local team to provide a Filecoin hardware solution and operation and maintenance service in Japan, and we supported our customer's first node (over 7PiB) in Japan at the beginning of 2021. We have already helped our customers build 8 nodes, with a total physical capacity of over 62PB and a total effective capacity of over 40PiB.

Describe the data being stored onto Filecoin: Gene collision related data. Where was the data in this dataset sourced from? Data generated by the Institute's research. Can you specify where the data comes from? What is the relationship between you and the organization? How much original data is there? How many copies will you store?

I will answer these questions together. Because we have a professional team working on HPC solutions, we have been in touch with KEK, the High Energy Accelerator Research Organization (https://www.kek.jp/en/about-en/what-en/), for a long time. They have about 100PB of storage (HDD + tape) for storing and analyzing data from high-energy particle beams and synchrotron light sources. This research helps advance our understanding of the universe that surrounds us, its mechanisms, and their control. Makoto Kobayashi, Professor Emeritus of KEK, was awarded the 2008 Nobel Prize in Physics for the theory explaining the origin of the broken symmetry which predicts the existence of at least three families of quarks in nature.

KEK has too much data from large machines such as SuperKEKB and ILC, and its budget is limited. That is why they use a mix of HDD storage and tape and do not back up the data. We plan to reduce actual costs by using Filecoin+ to store some of the data, not only as a backup but also so that spare sealing machine resources (multi-core P1 nodes and high-performance GPUs on P2/P3 nodes) can be used for new research, such as using AI to analyze the old data again and try to make new discoveries.

There are also many earthquakes around KEK's location. The 2011 earthquake was a hard lesson that nobody wants to experience again, so we also plan to store data on Filecoin nodes located in Tokyo, Osaka, and Okinawa to protect it from earthquakes.

Because the data is so important, we plan to run the project in 2 stages, and we need Filecoin+ support to complete stage 1 as soon as possible.

Stage 1: KEK will provide about 4PiB of data which can be published. This data comes from the high-energy particle beams and synchrotron light sources of CERN, and KEK keeps a copy offsite.
The same data can be downloaded from http://opendata.cern.ch/ (sample: http://opendata.cern.ch/record/363).

We will store this data on Filecoin nodes and give some researchers from KEK and some PhD and postgraduate students from universities access to the data, along with free use of the sealing servers' multi-core CPUs and high-performance GPUs. They can use these for research, such as using AI to analyze old data again to make new discoveries, or training AI algorithms to improve the efficiency of analyzing new data.

We plan 4-6 copies of the data, and the owners of these nodes have already agreed to join this project. We will use different nodes to support access by researchers and university students in different areas.

Kanto, Tohoku, Hokkaido area
f01623525 7.36PiB
f01357002 7.27PiB
f01126799 7.37PiB
f0155983 7.36PiB sealing again 2023 1H

f01184717 1.24PiB
f01662849 1.13PiB

f01715688 4.18PiB

Kyushu area
f01848169 4.51PiB

Osaka area
f01738789 709TiB

Stage 2: KEK will provide over 20PiB of data from high-energy particle beams and synchrotron light sources (such as SuperKEKB and ILC). The data will first be stored as a backup, then analyzed using the free multi-core and high-performance GPU resources of the sealing servers.

Sunnyiscoming commented 1 year ago

They have about 100PB storage(HDD+Tape) for storing and analysis data from high-energy particle beams and synchrotron light sources.

Can you provide some documentation to prove that you are authorized by the organization to store this data as public data on the Filecoin network? Could you send an email with these documents to filplus-app-review@fil.org from your official domain?

lbj2004032 commented 1 year ago


The data is very important, and many research institutions pay to obtain it. That is why we need to run the project in 2 stages. In stage 1 we plan to use data from CERN which is already published (http://opendata.cern.ch/) and of which KEK keeps a copy offsite. Researchers who are in touch with us will use this data, the Filecoin nodes, and the sealing machines to do some research, and will report to KEK to show that the Filecoin network can be used for research. The professors at KEK will then consider either publishing some old data and applying for Filecoin+, or uploading more data to the Filecoin network without publishing it. KEK will sign an authorization for us to upload data to the Filecoin network before stage 2 starts.

First of all, we need Filecoin+ support to complete stage 1 as soon as possible. Because the data is published by CERN, I think an authorization document is not necessary for stage 1. I will send you an email from the address tajiam-y@nhp-gp.com.


simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

4PiB

Expected weekly DataCap usage rate

50TiB

Client address

f1djeaxezazhlg5334gla2j5jojkna6qnzxg7ovba
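
As a rough sanity check on the figures in this request (4PiB total, 50TiB per week), the sketch below estimates how many weeks onboarding would take and what sustained throughput the weekly rate implies. It assumes binary units (1 PiB = 1024 TiB), a constant rate, and a single copy of the data; the function names are illustrative and not part of any Fil+ tooling.

```python
TIB = 2**40  # bytes in one tebibyte
PIB = 2**50  # bytes in one pebibyte
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def weeks_to_onboard(total_bytes: int, weekly_bytes: int) -> float:
    """Weeks needed to use the full allocation at a constant weekly rate."""
    return total_bytes / weekly_bytes


def sustained_mbps(weekly_bytes: int) -> float:
    """Average throughput in megabits per second implied by a weekly volume."""
    return weekly_bytes * 8 / SECONDS_PER_WEEK / 1e6


total = 4 * PIB
weekly = 50 * TIB
print(f"weeks to onboard: {weeks_to_onboard(total, weekly):.2f}")
print(f"sustained rate: {sustained_mbps(weekly):.0f} Mbit/s")
```

At the requested rate the full 4PiB would take roughly 82 weeks, and sending 50TiB per week needs an average of about 727 Mbit/s for each copy that is transferred separately.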

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1djeaxezazhlg5334gla2j5jojkna6qnzxg7ovba

DataCap allocation requested

25TiB

Id

77afd132-3526-44ea-9d39-53fc256e9b15

GaryGJG commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedsucap7kkc66t3ivnvb4pi6eea44dbcym5emns7dwcmgzmvascd6

Address

f1djeaxezazhlg5334gla2j5jojkna6qnzxg7ovba

Datacap Allocated

25.00TiB

Signer Address

f1zffqhxwq2rrg7rtot6lmkl6hb2xyrrseawprzsq

Id

77afd132-3526-44ea-9d39-53fc256e9b15

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedsucap7kkc66t3ivnvb4pi6eea44dbcym5emns7dwcmgzmvascd6

lbj2004032 commented 1 year ago


Is this a successful application? When I transfer data, I get the error ".. not enough DataCap available for a verified deal". Can you help me confirm?

lbj2004032 commented 1 year ago

If there is a problem, can I reapply?

cryptowhizzard commented 1 year ago

Dear applicant,

Thank you for applying for datacap. As Filecoin[ FIL+ notary](https://github.com/filecoin-project/notary-

Can you show us visible proof of the size of your data and the storage systems you have there?

As a last question, I would like you to fill out this form to provide us with the necessary information to make an educated decision on whether to support your LDN request.

Thanks!

lbj2004032 commented 1 year ago


Question: Can you show us visible proof of the size of your data and the storage systems you have there?

Answer: KEK will provide about 4PiB of data which can be published. This data comes from the high-energy particle beams and synchrotron light sources of CERN, and KEK keeps a copy offsite.
The same data can be downloaded from http://opendata.cern.ch/ (sample: http://opendata.cern.ch/record/363).

We also have some local storage which will serve as a temporary pool for caching data before we receive DataCap.

[4 image attachments]

Question: As a last question, I would like you to fill out this form to provide us with the necessary information to make an educated decision on whether to support your LDN request.

Answer: OK, I will fill it out later.

Sunnyiscoming commented 1 year ago

Hi @lbj2004032 You need one more notary to approve this application. You can ask more notaries to do client due diligence and approve the application in the Slack channel. https://app.slack.com/client/TEHTVS1L6/C036JKD8NVA/thread/C03BG1MNQ4T-1673888660.823499
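
The exchange so far (a "Request Proposed" message, a deal failing with "not enough DataCap available", and the note that one more notary is needed) is consistent with a two-signature flow in which DataCap only becomes usable once a second notary approves the proposal. The sketch below models that flow. The class name, the threshold of 2, and the method names are illustrative assumptions, not the on-chain multisig API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AllocationRequest:
    """Toy model of a DataCap allocation that needs several notary signatures."""

    client: str
    amount_tib: float
    threshold: int = 2  # assumed: one proposer plus one approver
    signers: List[str] = field(default_factory=list)

    def sign(self, notary: str) -> None:
        # Each notary counts once; a repeated signature is ignored.
        if notary not in self.signers:
            self.signers.append(notary)

    @property
    def status(self) -> str:
        if not self.signers:
            return "pending"
        if len(self.signers) < self.threshold:
            return "proposed"
        return "approved"

    @property
    def usable_tib(self) -> float:
        # DataCap is spendable only after the threshold is reached.
        return self.amount_tib if self.status == "approved" else 0.0


request = AllocationRequest("f1djeaxezazhlg5334gla2j5jojkna6qnzxg7ovba", 25.0)
request.sign("f1zffqhxwq2rrg7rtot6lmkl6hb2xyrrseawprzsq")  # first notary proposes
print(request.status, request.usable_tib)
request.sign("f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa")  # second notary approves
print(request.status, request.usable_tib)
```

In this model a verified deal attempted while the request is only "proposed" would see zero usable DataCap, which matches the error reported by the applicant.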

herrehesse commented 1 year ago

@lbj2004032 Not supportive of this LDN request due to the lack of distribution.

Can you tell me why you are bound to Japan only?

lbj2004032 commented 1 year ago


We are bound to Japan only because we plan to provide data storage and access services to KEK, the High Energy Accelerator Research Organization (a government research center), and to some PhD and postgraduate students from Japanese universities. If we use overseas SPs, data access must cross submarine cables, which would be a bottleneck. The SPs that join this project get high-quality network service (over 1Gbps both uplink and downlink) from local carriers, so users can access nearby SPs with low latency and high bandwidth. To have overseas SPs provide the same quality of access, we would need a dedicated submarine cable line service, which would cost over 30,000 USD/month. This is too expensive and pointless.

About the lack of distribution: we can easily find new partners to provide SPs in other countries, but we do not think it is necessary for this project, because all of its users are located in Japan. Using SPs in different regions of Japan is sufficient.

cryptowhizzard commented 1 year ago

Hello @lbj2004032

Thank you for this explanation. It is clear to me.

The rules of Fil+ are that you can store 1 full replica yourself, another one in your region with another organisation, and 2 in different regions with different organisations, one of which needs to be outside Japan.

You can do retrievals from your main replica in your own miner. There won't be extra bandwidth involved then.

The 4 different organisations are for redundancy and for the network to evolve with real data. You get the 10x QAP in return for this, so it sounds like a good deal to me.

If this is problematic, then I propose to message kevzak and try to get this LDN into FIL-E.
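
The replica rule described in this comment can be expressed as a small check over a storage plan. The sketch below is one possible reading of it: at least 4 distinct organisations, at least 2 replicas outside the home region, and at least 1 replica outside the home country. The `Replica` type, the provider names, and the organisation labels are hypothetical sample data, not real storage providers.

```python
from typing import List, NamedTuple


class Replica(NamedTuple):
    provider: str
    organisation: str
    region: str
    country: str


def check_plan(replicas: List[Replica], home_region: str, home_country: str) -> List[str]:
    """Return the list of rule violations; an empty list means the plan passes."""
    problems = []
    if len({r.organisation for r in replicas}) < 4:
        problems.append("fewer than 4 distinct organisations")
    if sum(1 for r in replicas if r.region != home_region) < 2:
        problems.append("fewer than 2 replicas outside the home region")
    if not any(r.country != home_country for r in replicas):
        problems.append("no replica outside the home country")
    return problems


# Hypothetical plan that keeps every replica inside Japan.
japan_only = [
    Replica("sp-1", "client", "Kanto", "Japan"),
    Replica("sp-2", "org-b", "Kanto", "Japan"),
    Replica("sp-3", "org-c", "Osaka", "Japan"),
    Replica("sp-4", "org-d", "Kyushu", "Japan"),
]
print(check_plan(japan_only, "Kanto", "Japan"))

# Same plan with the last replica moved to an overseas provider.
with_overseas = japan_only[:3] + [Replica("sp-5", "org-e", "Hong Kong", "China")]
print(check_plan(with_overseas, "Kanto", "Japan"))
```

Returning the violations as a list, rather than a single boolean, makes it easy to see which part of the rule a Japan-only plan fails.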

lbj2004032 commented 1 year ago


Thank you for your message. We will follow the rule, and two overseas SPs are joining our project.

f01227975 Hong Kong, China (provides retrieval and storage)

f0480313 Singapore (still negotiating to provide retrieval service)

cryptowhizzard commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebniqkinaiuvq6xcqdkrkl5x65lw2nxz6bmyizi3ranadijyfiyau

Address

f1djeaxezazhlg5334gla2j5jojkna6qnzxg7ovba

Datacap Allocated

25.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

77afd132-3526-44ea-9d39-53fc256e9b15

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebniqkinaiuvq6xcqdkrkl5x65lw2nxz6bmyizi3ranadijyfiyau

cryptowhizzard commented 1 year ago

Good morning

Looking forward to your next milestone.

Clarification for other notaries: I received KYC documents a while ago. The first allocation is reasonably small (25 TiB), so I am willing to check now whether the distribution is going all right.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1djeaxezazhlg5334gla2j5jojkna6qnzxg7ovba

DataCap allocation requested

50TiB

Id

80a29581-3c00-452f-a28b-fd488d6340b9

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1djeaxezazhlg5334gla2j5jojkna6qnzxg7ovba

Last two approvers

cryptowhizzard & not found

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

50TiB

Total DataCap granted for client so far

96GiB

Datacap to be granted to reach the total amount requested by the client (4PiB)

3.99PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 34 | 2 | 25TiB | 96.79 | 33.73GiB |
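
The allocation amounts in this thread follow from the weekly usage rate: the second request is stated as 100% of the weekly amount (50TiB), and the first (25TiB) matches 50% of it. A minimal sketch of that arithmetic, assuming only these two observed multipliers (the 50% figure is inferred from the numbers, not stated by the bot) and binary units:

```python
TIB_PER_PIB = 1024

# Multipliers of the weekly rate per request round.
# Round 2 is stated by the bot; round 1 is inferred from 25 TiB vs 50 TiB/week.
OBSERVED_MULTIPLIERS = {1: 0.5, 2: 1.0}


def allocation_tib(round_number: int, weekly_tib: float) -> float:
    """Size of an allocation request for the given round, in TiB."""
    return OBSERVED_MULTIPLIERS[round_number] * weekly_tib


def remaining_pib(total_pib: float, granted_tib: float) -> float:
    """DataCap still to be granted to reach the requested total, in PiB."""
    return total_pib - granted_tib / TIB_PER_PIB


print(allocation_tib(1, 50))
print(allocation_tib(2, 50))
print(round(remaining_pib(4, 25), 4))
```

Counting the 25TiB first allocation, the remainder works out to roughly 3.98PiB.
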
filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

No active deals found for this client.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

MetaWaveInfo commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 70% of total datacap - f01848169: 89.49%

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.
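
The two warnings in this report come down to simple aggregates over the client's deals. The sketch below computes the share of data held by the top storage provider and the share of deals whose data is stored with fewer than 3 providers. The deal tuples, sizes, and CIDs are made-up sample data, and the 70% and 3-provider thresholds are taken from the report text; this is not the checker's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Deal = Tuple[str, str, int]  # (provider_id, piece_cid, size_in_gib)


def top_provider_share(deals: List[Deal]) -> Tuple[str, float]:
    """Return the provider holding the most data and its percentage of the total."""
    per_provider: Dict[str, int] = defaultdict(int)
    for provider, _cid, size in deals:
        per_provider[provider] += size
    total = sum(per_provider.values())
    top = max(per_provider, key=per_provider.get)
    return top, 100.0 * per_provider[top] / total


def low_replication_share(deals: List[Deal], min_providers: int = 3) -> float:
    """Percentage of deals whose piece is stored with fewer than min_providers providers."""
    providers_per_cid: Dict[str, Set[str]] = defaultdict(set)
    for provider, cid, _size in deals:
        providers_per_cid[cid].add(provider)
    low = sum(1 for _p, cid, _s in deals if len(providers_per_cid[cid]) < min_providers)
    return 100.0 * low / len(deals)


# Hypothetical sample: one provider holds 9 of 10 pieces, no piece is replicated.
sample = [("sp-A", f"cid-{i}", 32) for i in range(9)] + [("sp-B", "cid-9", 32)]
provider, share = top_provider_share(sample)
print(provider, f"{share:.2f}%", "exceeds 70%" if share > 70 else "ok")
print(f"{low_replication_share(sample):.2f}% of deals under 3 providers")
```

Spreading the same pieces across three or more providers would lower both figures at once.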

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

aggregation-and-compliance-bot[bot] commented 9 months ago

Client f01505764 does not follow the datacap usage rules. More info here. This application has been failing the requirements for 7 days. Please take appropriate action to fix the following DataCap usage problems.

| Criteria | Threshold | Reason |
| --- | --- | --- |
| Percent of used DataCap stored with top provider | < 75 | The percent of Data from the client that is stored with their top provider is 92.46%. This should be less than 75% |