filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Ghost Byte Inc - Encyclopedia of DNA Elements (ENCODE) - [2/2] #1224

Closed Trevor-K-Smith closed 1 year ago

Trevor-K-Smith commented 1 year ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information


DP Info

Client Info

Data Info

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

Ghost Byte Inc is a storage provider seeking to onboard data to meet the high demand for FIL+ from itself and its partners. Ghost Byte has a history of actively participating in the NA weekly calls, helping community members on the Slack channel, testing beta software and providing feedback, and generally supporting the Filecoin community on an ongoing basis. Ghost Byte works with industry partners to help grow web3 adoption.
Ref: https://www.youtube.com/watch?v=6PejYUlN0AM

What is the primary source of funding for this project?

Ghost Byte Inc

What other projects/ecosystem stakeholders is this project associated with?

Ghost Byte Inc

Use-case details

Describe the data being stored onto Filecoin

The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active. ENCODE investigators employ a variety of assays and methods to identify functional elements. The discovery and annotation of gene elements is accomplished primarily by sequencing a diverse range of RNA sources, comparative genomics, integrative bioinformatic methods, and human curation. Regulatory elements are typically investigated through DNA hypersensitivity assays, assays of DNA methylation, and immunoprecipitation (IP) of proteins that interact with DNA and RNA, i.e., modified histones, transcription factors, chromatin regulators, and RNA-binding proteins, followed by sequencing.

Below is a short blurb with details of the data.

Where was the data in this dataset sourced from?

This data is being replicated from AWS Opendata to Filecoin. The dataset being replicated is **Encyclopedia of DNA Elements (ENCODE)**. This dataset contains **1,247,873 Total Objects** and is **1.0 PiB Total Size**. The data will be replicated 10 times, for a total DataCap request of 10 PiB.
Ref: https://registry.opendata.aws/encode-project/
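The sizing in the answer above is simple arithmetic; the sketch below just makes it explicit. It assumes the registry's "1.0 PiB" means 2^50 bytes (binary prefix) and uses only the figures quoted above; the variable names are illustrative.

```python
# Figures as stated in the application text above.
PIB = 2**50   # bytes per PiB (assumption: binary prefix)
GIB = 2**30   # bytes per GiB

dataset_size_bytes = 1.0 * PIB   # "1.0 PiB Total Size"
total_objects = 1_247_873        # "1,247,873 Total Objects"
replicas = 10                    # "replicated 10 times"

# Total DataCap is dataset size times the number of replicas.
total_datacap_pib = dataset_size_bytes * replicas / PIB

# Mean object size, useful when planning how objects pack into CAR files.
avg_object_gib = dataset_size_bytes / total_objects / GIB

print(f"Total DataCap needed: {total_datacap_pib:.1f} PiB")
print(f"Average object size: {avg_object_gib:.2f} GiB")
```

At 10 copies of a 1.0 PiB dataset this gives the 10 PiB stated above, with objects averaging roughly 0.84 GiB.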

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://www.dropbox.com/sh/45dbnb1vedx3uij/AAAvEt9bF959-kor2sH7nM5ha?dl=0

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Public - https://registry.opendata.aws/encode-project/

What is the expected retrieval frequency for this data?

1-3 years

For how long do you plan to keep this dataset stored on Filecoin?

540 Days, subject to renewal when the time comes.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Global partners.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Data will be sent via boostd to participating storage providers. Alternatively, offline deals can be arranged for those with special requirements.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Storage providers will be found through the active community Slack, partners met at events, and online data sources. Replicas will be placed 1 per actor and 2 per organization, spread as evenly across the globe as possible, for a total of 10 replications. Before replication begins, SPs will confirm that they intend to keep the CAR files accessible and retrievable. SPs will not be required to keep unsealed sectors for this replication, as this is disaster-recovery data and retrievals are expected to be low.

How will you be distributing deals across storage providers?

Singularity will be used to serve the deals and track the progress of each CAR file being replicated. Replicas will be placed 1 per actor and 2 per organization, spread as evenly across the globe as possible, for a total of 10 replications.
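The placement rule above (1 copy per actor, 2 per organization, 10 in total) can be checked mechanically for each CAR file. The following is a minimal sketch under stated assumptions: "1 per actor" and "2 per organization" are read as upper limits, the `Replica` record, `check_replica_plan` helper, actor IDs and organization names are all hypothetical, and this is not a Singularity feature. Geographic spread is recorded but not enforced, since "as evenly as possible" is a soft goal.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One copy of a CAR file placed with a storage provider (hypothetical record)."""
    actor_id: str      # storage provider actor ID, e.g. "f0..." (illustrative)
    organization: str  # organization operating the actor
    region: str        # coarse location label; kept for reporting only


def check_replica_plan(replicas, total=10, per_actor=1, per_org=2):
    """Return a list of rule violations for one CAR file's replica placements."""
    problems = []
    if len(replicas) != total:
        problems.append(f"expected {total} replicas, got {len(replicas)}")
    # Each actor may hold at most `per_actor` copies of the same CAR file.
    for actor, n in Counter(r.actor_id for r in replicas).items():
        if n > per_actor:
            problems.append(f"actor {actor} holds {n} copies (max {per_actor})")
    # Each organization may hold at most `per_org` copies across its actors.
    for org, n in Counter(r.organization for r in replicas).items():
        if n > per_org:
            problems.append(f"organization {org} holds {n} copies (max {per_org})")
    return problems


# Hypothetical plan: 10 distinct actors, 5 organizations with 2 actors each.
plan = [
    Replica(f"f0100{i}", f"org{i // 2}", ["NA", "EU", "APAC"][i % 3])
    for i in range(10)
]
print(check_replica_plan(plan))  # prints []
```

Returning a list of messages rather than raising on the first problem lets one pass over a deal schedule report every violation at once.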

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes, we have the resources to get started right away. We do not need help at this time. Thank you!
large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5 PiB

Expected weekly DataCap usage rate

125 TiB

Client address

f1m54rlpqha44mgfm3oa4nxc3exmq3k5azn7cv7fi

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1usa7klkhkxwswmvvw2eqnser26fvp66s46umeoi

DataCap allocation requested

62.5 TiB

Id

280caace-4323-4851-b1a0-e84e523cb2fb
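The request trigger and the first allocation above relate by simple arithmetic. The sketch below assumes binary prefixes (1 PiB = 1024 TiB) and a constant weekly usage rate; it is an illustration, not the governance team's tranche formula. Because the trigger records 5 PiB while the application text states 10 PiB, both totals are computed.

```python
TIB_PER_PIB = 1024      # assumption: binary prefixes

weekly_rate_tib = 125   # "Expected weekly DataCap usage rate: 125TiB"
first_tranche_tib = 62.5  # "DataCap allocation requested: 62.5TiB"

# Weeks needed to consume each total at the stated weekly rate.
weeks_5pib = 5 * TIB_PER_PIB / weekly_rate_tib    # trigger total
weeks_10pib = 10 * TIB_PER_PIB / weekly_rate_tib  # application-text total

# Size of the first tranche relative to one week of expected usage.
first_tranche_share = first_tranche_tib / weekly_rate_tib

print(f"5 PiB at 125 TiB/week: {weeks_5pib:.2f} weeks")     # 40.96
print(f"10 PiB at 125 TiB/week: {weeks_10pib:.2f} weeks")   # 81.92
print(f"First tranche: {first_tranche_share:.0%} of one week")  # 50%
```

At the stated rate, 5 PiB lasts about 41 weeks and 10 PiB about 82 weeks, and the first 62.5 TiB allocation covers half a week of expected usage.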

psh0691 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedocuqyxy2ctxp764ea7vwx36jimn46zu5nnvd4k67buft7mgkuxw

Address

f1m54rlpqha44mgfm3oa4nxc3exmq3k5azn7cv7fi

Datacap Allocated

62.50 TiB

Signer Address

f1qdko4jg25vo35qmyvcrw4ak4fmuu3f5rif2kc7i

Id

280caace-4323-4851-b1a0-e84e523cb2fb

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedocuqyxy2ctxp764ea7vwx36jimn46zu5nnvd4k67buft7mgkuxw

cryptowhizzard commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceblpkgkkebuupl5b4x2tsfhsnlj5w6xt7ogtfxeaadjjybzencd72

Address

f1m54rlpqha44mgfm3oa4nxc3exmq3k5azn7cv7fi

Datacap Allocated

62.50 TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

280caace-4323-4851-b1a0-e84e523cb2fb

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceblpkgkkebuupl5b4x2tsfhsnlj5w6xt7ogtfxeaadjjybzencd72

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
Trevor-K-Smith commented 1 year ago

@simonkim0515 @raghavrmadya

Could I kindly get this application reviewed? Thanks!

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
data-programs commented 5 months ago
KYC

This user’s identity has been verified through filplus.storage