filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] FileDrive Labs - Datasets Landing Plan V2 - [4/5] #1626

Closed laurarenpanda closed 1 year ago

laurarenpanda commented 1 year ago

Data Owner Name

FileDrive Labs

Data Owner Country/Region

China

Data Owner Industry

Life Science / Healthcare

Website

https://filedrive.io/

Social Media

Twitter: https://twitter.com/FileDrive1
Medium: https://medium.com/@FileDrive1
WeChat Official Account: FileDrive

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

500TiB

On-chain address for first allocation

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Custom multisig

Identifier

No response

Share a brief history of your project and organization

FileDrive Datasets Landing Plan is a project for onboarding more valuable public datasets onto the Filecoin network. Over several phases, we plan to bring 10 PiB of data to Filecoin and drive 100 PiB of storage power growth.
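The 10 PiB / 100 PiB pairing above is consistent with Filecoin's quality-adjusted power rule, under which verified (Fil+) deal data counts ten times toward storage power. A minimal sketch of that arithmetic, assuming "storage power growth" refers to quality-adjusted power (the function name is illustrative only):

```python
# Assumption: verified (Fil+) deal data carries a 10x quality multiplier in
# quality-adjusted power (QAP), while unverified data counts at 1x.
VERIFIED_MULTIPLIER = 10


def quality_adjusted_power_pib(verified_pib: float, unverified_pib: float = 0.0) -> float:
    """Return quality-adjusted power (PiB) for a mix of verified and unverified data."""
    return verified_pib * VERIFIED_MULTIPLIER + unverified_pib


print(quality_adjusted_power_pib(10))  # 100.0
```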

About FileDrive Datasets

FileDrive Datasets is a platform to effectively connect the huge storage market that Filecoin has built with publishers of public datasets.
The Filecoin network provides reliable, secure, and affordable decentralized storage services, and FileDrive Labs wants to deliver these benefits to end-users by building a public dataset platform.
It is challenging to attract traditional Cloud Storage and Object-based Storage users to the Filecoin network and help them benefit from it. Developers in the Filecoin ecosystem, such as FileDrive Labs, need to face this challenge together.
As a member of the Filecoin ecosystem, FileDrive Labs has remained committed to developing useful tools that make it easier for users to store their data on the Filecoin network.

FileDrive Datasets has integrated a group of tools that provide storage services compatible with both Cloud Storage and Object-based Storage, with a better user experience to attract more users.
Underlying projects (ongoing):
- Go-Graphsplit: https://github.com/filedrive-team/go-graphsplit
- DS-Cluster: https://github.com/filedrive-team/go-ds-cluster
- Filejoy: https://github.com/filedrive-team/filejoy

Article about FileDrive Datasets on Filecoin Blog:
- Large Datasets: FileDrive: https://filecoin.io/blog/posts/large-datasets-filedrive/

About FileDrive Labs

FileDrive Labs has always defined itself as a tool developer and infrastructure builder in the Filecoin ecosystem. Since 2019, we have continuously focused on technical solutions and development based on the IPFS protocol and the Filecoin network, and we do our best to contribute to the community.
Over 80% of our team are qualified engineers, and half of them have more than 10 years of development experience in multiple industries, including communications, the Internet, and blockchain.
Since 2020, we have participated in the Slingshot Competition, become one of the top teams, and stored over 5 PiB of useful data from public datasets on the Filecoin network.
To contribute to the Filecoin community, we developed the open-source data preparation tool Graphsplit, the FIL+ project dashboard filplus.info, and the storage provider discovery platform filfind.info.
In addition, since March 2022 we have held a weekly online event, FileDrive Meetup, which gives community members a platform to follow the latest trends in the Filecoin network as well as our work and research.

Please check the following links for more details.
- GitHub: https://github.com/filedrive-team
- Twitter: https://twitter.com/FileDrive1
- Eventbrite: https://www.eventbrite.hk/o/filedrive-labs-42456337463
- YouTube Channel: https://www.youtube.com/channel/UCxcZC1dtBUlQvZY7DX13W1w
- Medium: https://medium.com/@FileDrive1

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

FileDrive Datasets Landing Plan #2
- Datasets: 10

List of Datasets in #2:

1. Transiting Exoplanet Survey Satellite (TESS)
- The Transiting Exoplanet Survey Satellite (TESS) is a multi-year survey that will discover exoplanets in orbit around bright stars across the entire sky using high-precision photometry. The survey will also enable a wide variety of stellar astrophysics, solar system science, and extragalactic variability studies. More information about TESS is available at MAST and the TESS Science Support Center.
- https://registry.opendata.aws/tess/
- License: STScI hereby grants the non-exclusive, royalty free, non-transferable, worldwide right and license to use, reproduce and publicly display in all media public data from the TESS mission.
- Size: 285.6 TiB

2. Oxford Nanopore Technologies Benchmark Datasets
- The ont-open-data registry provides reference sequencing data from Oxford Nanopore Technologies to support: 1) exploration of the characteristics of nanopore sequence data, 2) assessment and reproduction of performance benchmarks, and 3) development of tools and methods. The data deposited showcases DNA sequences from a representative subset of sequencing chemistries. The datasets correspond to publicly available reference samples (e.g. Genome In A Bottle reference cell lines). Raw data are provided with metadata and scripts to describe sample and data provenance.
- https://registry.opendata.aws/ont-open-data/
- License: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
- Size: 60.3 TiB

3. Community Earth System Model v2 ARISE (CESM2 ARISE)
- Data from ARISE-SAI Experiments with CESM2
- https://registry.opendata.aws/ncar-cesm2-arise/
- License: Creative Commons Attribution 4.0 International (CC BY 4.0)
- Size: 263.5 TiB

4. NOAA Wave Ensemble Reforecast
- This is a 20-year global wave reforecast generated by WAVEWATCH III model (https://github.com/NOAA-EMC/WW3) forced by GEFSv12 winds (https://noaa-gefs-retrospective.s3.amazonaws.com/index.html). The wave ensemble was run with one cycle per day (at 03Z), spatial resolution of 0.25°X0.25° and temporal resolution of 3 hours. There are five ensemble members (control plus four perturbed members) and, once a week (Wednesdays), the ensemble is expanded to eleven members. The forecast range is 16 days and, once a week (Wednesdays), it extends to 35 days. More information about the wave modeling, wave grids and calibration can be found in the WAVEWATCH III regtest ww3_ufs1.3 (https://github.com/NOAA-EMC/WW3/tree/develop/regtests/ww3_ufs1.3).
- https://registry.opendata.aws/noaa-wave-ensemble-reforecast/
- License: Open Data. There are no restrictions on the use of this data.
- Size: 114.3 TiB

5. UCSC Genome Browser Sequence and Annotations
- The UCSC Genome Browser is an online graphical viewer for genomes, a genome browser, hosted by the University of California, Santa Cruz (UCSC). The interactive website offers access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. This dataset is a copy of the MySQL tables in MyISAM binary and tab-sep format and all binary files in custom formats, sometimes referred to as 'gbdb' files. Data from the UCSC Genome Browser is free and open for use by anyone. However, every genome annotation track has been created by an academic research group, or, in a few cases, by commercial companies. Please acknowledge them by citing them. The information can be found by going to https://genome.ucsc.edu, selecting the respective genome assembly and clicking on the data track. At the end of the documentation, we provide a list of references and acknowledgements.
- https://registry.opendata.aws/ucsc-genome-browser/
- License: https://genome.ucsc.edu/license/
- Size: 81.7 TiB

6. Open Observatory of Network Interference (OONI)
- A free software, global observation network for detecting censorship, surveillance and traffic manipulation on the internet.
- https://registry.opendata.aws/ooni/
- License: Creative Commons Attribution 4.0 International (CC BY 4.0)
- Size: 135 TiB

7. OpenProteinSet
- Multiple sequence alignments (MSAs) for 132,000 unique Protein Data Bank (PDB) chains, covering 640,000 PDB chains in total, and 4,850,000 UniClust30 clusters. Template hits are also provided for the PDB chains and 270,000 UniClust30 clusters chosen for maximal diversity and MSA depth. MSAs were generated with HHBlits (-n3) and JackHMMER against MGnify, BFD, UniRef90, and UniClust30 while templates were identified from PDB70 with HHSearch, all according to procedures outlined in the supplement to the AlphaFold 2 Nature paper, Jumper et al. 2021. We expect the database to be broadly useful to structural biologists training or validating deep learning models for protein structure prediction and related tasks.
- https://registry.opendata.aws/openfold/
- License: Creative Commons Attribution 4.0 International (CC BY 4.0)
- Size: 4.9 TiB

8. AI2 Diagram Dataset (AI2D)
- 4,817 illustrative diagrams for research on diagram understanding and associated question answering.
- https://registry.opendata.aws/allenai-diagrams/
- License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Size: 6.4 TiB

9. Legal Entity Identifier (LEI) and Legal Entity Reference Data (LE-RD)
- The Legal Entity Identifier (LEI) is a 20-character, alphanumeric code based on the ISO 17442 standard developed by the International Organization for Standardization (ISO). It connects to key reference information that enables clear and unique identification of legal entities participating in financial transactions. Each LEI contains information about an entity’s ownership structure and thus answers the questions of ‘who is who’ and ‘who owns whom’. Simply put, the publicly available LEI data pool can be regarded as a global directory, which greatly enhances transparency in the global marketplace. The Financial Stability Board (FSB) has reiterated that global LEI adoption underpins “multiple financial stability objectives” such as improved risk management in firms as well as better assessment of micro and macro prudential risks. As a result, it promotes market integrity while containing market abuse and financial fraud. Last but not least, LEI rollout “supports higher quality and accuracy of financial data overall”. The publicly available LEI data pool is a unique key to standardized information on legal entities globally. The data is registered and regularly verified according to protocols and procedures established by the Regulatory Oversight Committee. In cooperation with its partners in the Global LEI System, the Global Legal Entity Identifier Foundation (GLEIF) continues to focus on further optimizing the quality, reliability and usability of LEI data, empowering market participants to benefit from the wealth of information available with the LEI population. The drivers of the LEI initiative, i.e. the Group of 20, the FSB and many regulators around the world, have emphasized the need to make the LEI a broad public good. The Global LEI Index, made available by GLEIF, greatly contributes to meeting this objective. It puts the complete LEI data at the disposal of any interested party, conveniently and free of charge.
The benefits for the wider business community to be generated with the Global LEI Index grow in line with the rate of LEI adoption. To maximize the benefits of entity identification across financial markets and beyond, firms are therefore encouraged to engage in the process and get their own LEI. Obtaining an LEI is easy. Registrants simply contact their preferred business partner from the list of LEI issuing organizations available on the GLEIF website.
- https://registry.opendata.aws/lei/
- License: Creative Commons (CC0) license
- Size: 6.0 TiB

10. COVID-19 Genome Sequence Dataset
- A centralized sequence repository for all records containing sequence associated with the novel coronavirus (SARS-CoV-2) submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). Included are both the original sequences submitted by the principal investigator and SRA-processed sequences that require the SRA Toolkit for analysis. Additionally, submitter-provided metadata included in associated BioSample and BioProject records is available alongside NCBI calculated data, such as k-mer based taxonomy analysis results, contiguous assemblies (contigs) and associated statistics such as contig length, blast results for the assembled contigs, contig annotation, blast databases of contigs and their annotated peptides, and VCF files generated for each record relative to the SARS-CoV-2 RefSeq record. Finally, metadata is additionally made available in parquet format to facilitate search and filtering using the AWS Athena Service.
- https://registry.opendata.aws/ncbi-covid-19/
- License: NIH Genomic Data Sharing Policy
- Size: 1.2 PiB
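Summing the sizes listed above gives the volume of one copy of the ten datasets, which is useful when comparing it with the 5PiB requested. A small sketch, assuming binary units (1 PiB = 1024 TiB) as used by the DataCap figures in this thread; the short dataset labels are only for readability:

```python
# Sizes as listed above; binary units assumed: 1 PiB = 1024 TiB.
TIB_PER_PIB = 1024

dataset_sizes_tib = {
    "TESS": 285.6,
    "Oxford Nanopore Technologies Benchmark": 60.3,
    "CESM2 ARISE": 263.5,
    "NOAA Wave Ensemble Reforecast": 114.3,
    "UCSC Genome Browser": 81.7,
    "OONI": 135.0,
    "OpenProteinSet": 4.9,
    "AI2D": 6.4,
    "LEI / LE-RD": 6.0,
    "COVID-19 Genome Sequence Dataset": 1.2 * TIB_PER_PIB,  # listed as 1.2 PiB
}

total_tib = sum(dataset_sizes_tib.values())
total_pib = total_tib / TIB_PER_PIB
print(f"{total_tib:.1f} TiB = {total_pib:.2f} PiB")  # 2186.5 TiB = 2.14 PiB
```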

Where was the data currently stored in this dataset sourced from

My Own Storage Infra

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, graphsplit
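The application names IPFS and graphsplit for data preparation. Tools of this kind cut a large dataset into fixed-size slices before packing each slice into a CAR file for a deal. The sketch below only illustrates that slicing step under stated assumptions: `Slice` and `plan_slices` are hypothetical names, and this is not graphsplit's implementation or command-line interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Slice:
    """One hypothetical slice: a list of (path, offset, length) byte ranges."""

    parts: list[tuple[str, int, int]] = field(default_factory=list)

    @property
    def size(self) -> int:
        return sum(length for _, _, length in self.parts)


def plan_slices(files: list[tuple[str, int]], slice_size: int) -> list[Slice]:
    """Greedily pack (path, size) files into slices of at most slice_size bytes.

    A file that does not fit in the remaining space is split across slices,
    so every slice except possibly the last is exactly slice_size bytes.
    Zero-length files contribute no byte ranges.
    """
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    slices: list[Slice] = []
    current = Slice()
    for path, size in files:
        offset = 0
        while offset < size:
            take = min(slice_size - current.size, size - offset)
            current.parts.append((path, offset, take))
            offset += take
            if current.size == slice_size:  # slice is full: close it
                slices.append(current)
                current = Slice()
    if current.parts:  # keep the final, partially filled slice
        slices.append(current)
    return slices


plan = plan_slices([("a.fits", 5), ("b.fits", 7)], slice_size=4)
print([s.size for s in plan])  # [4, 4, 4]
```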

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

FileDrive Datasets: 
https://datasets.filedrive.io/

Original Source:
1. Transiting Exoplanet Survey Satellite (TESS)
- https://registry.opendata.aws/tess/

2. Oxford Nanopore Technologies Benchmark Datasets
- https://registry.opendata.aws/ont-open-data/

3. Community Earth System Model v2 ARISE (CESM2 ARISE)
- https://registry.opendata.aws/ncar-cesm2-arise/

4. NOAA Wave Ensemble Reforecast
- https://registry.opendata.aws/noaa-wave-ensemble-reforecast/

5. UCSC Genome Browser Sequence and Annotations
- https://registry.opendata.aws/ucsc-genome-browser/

6. Open Observatory of Network Interference (OONI)
- https://registry.opendata.aws/ooni/

7. OpenProteinSet
- https://registry.opendata.aws/openfold/

8. AI2 Diagram Dataset (AI2D)
- https://registry.opendata.aws/allenai-diagrams/

9. Legal Entity Identifier (LEI) and Legal Entity Reference Data (LE-RD)
- https://registry.opendata.aws/lei/

10. COVID-19 Genome Sequence Dataset
- https://registry.opendata.aws/ncbi-covid-19/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Weekly

For how long do you plan to keep this dataset stored on Filecoin

More than 3 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe, Australia (continent)

How will you be distributing your data to storage providers

HTTP or FTP server, IPFS, Shipping hard drives

How do you plan to choose storage providers

Slack, Filmine

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

Please check the Checker Reports of our previous LDN applications:
- https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1266
- https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1267
- https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1268

How do you plan to make deals to your storage providers

Lotus client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

See questions in https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1623.

herrehesse commented 1 year ago

Dear Filecoin+ Github applicant,

We have noticed that some of you are submitting merged datacap requests for datasets that are already (partly) on the chain. While we appreciate your enthusiasm to contribute to the Filecoin network, we want to remind you that this behaviour may not be beneficial to the network in the long run. In fact, this behaviour has been questioned and discussed in issue #832 on the Filecoin notary-governance Github repository.

We encourage you to review the discussions in issue #832. It's important to ensure that your datacap requests are valid, necessary, and add value to the network. By doing so, you can help to maintain the integrity and sustainability of the Filecoin network.

You can find the link to issue #832 here: filecoin-project/notary-governance#832

Thank you for your understanding and cooperation.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

500TiB

Client address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

DataCap allocation requested

250TiB

Id

0f0e317e-b29b-48a7-ab90-1d8bb5b63415

Sunnyiscoming commented 1 year ago

Related proposal: https://github.com/filecoin-project/notary-governance/issues/832

I hope more notaries will review this application and comment on this proposal.

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaced573o22b5fv3j3haflcrdugnt7gnptdqy67nqfinszrf45yybfoe

Address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Datacap Allocated

250.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

0f0e317e-b29b-48a7-ab90-1d8bb5b63415

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced573o22b5fv3j3haflcrdugnt7gnptdqy67nqfinszrf45yybfoe

kernelogic commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecn44fvnnh2d4docv57njjfkj5ef5nyivsjb3iw2fhjabxlakdszq

Address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Datacap Allocated

250.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

0f0e317e-b29b-48a7-ab90-1d8bb5b63415

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecn44fvnnh2d4docv57njjfkj5ef5nyivsjb3iw2fhjabxlakdszq
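Each tranche in this thread goes through two on-chain messages on the notary multisig f02049625: one notary proposes it and a second, different notary approves it. A minimal sketch of that two-signature rule, assuming a threshold of 2 as the thread's propose/approve pairs suggest; `TrancheProposal` is a hypothetical name, not part of any Filecoin library:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrancheProposal:
    """Hypothetical model of one DataCap tranche pending on a notary multisig."""

    client: str
    amount_tib: float
    threshold: int = 2  # assumption: two distinct notary signatures execute a tranche
    signers: set[str] = field(default_factory=set)

    @property
    def executed(self) -> bool:
        return len(self.signers) >= self.threshold

    def sign(self, notary: str) -> bool:
        """Record a notary's signature and report whether the tranche is now executed.

        Signing twice with the same address does not count twice, because signers is a set.
        """
        self.signers.add(notary)
        return self.executed


tranche_1 = TrancheProposal("f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i", 250.0)
print(tranche_1.sign("f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa"))  # False: proposed
print(tranche_1.sign("f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa"))  # True: approved
```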

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

DataCap allocation requested

500TiB

Id

568d51ee-184a-4649-b93a-53ccd7cb9caf

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.
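The footnotes above describe the comment syntax that re-runs the checker, optionally with extra client addresses. The sketch below shows one way such a comment could be parsed; it is a hypothetical illustration, not the checker bot's code. It assumes the extra addresses are Filecoin f1 addresses like those in this thread ("f1" plus 39 lowercase base32 characters) and checks their shape only, not their checksum:

```python
import re
from typing import List, Optional

TRIGGER = "checker:manualTrigger"
# Shape check only: "f1" followed by 39 base32 characters (a-z, 2-7).
F1_ADDRESS = re.compile(r"^f1[a-z2-7]{39}$")


def parse_manual_trigger(comment: str) -> Optional[List[str]]:
    """Return the extra client addresses in a trigger comment, or None if it is not a trigger."""
    tokens = comment.split()
    if not tokens or tokens[0] != TRIGGER:
        return None
    extra = tokens[1:]
    bad = [t for t in extra if not F1_ADDRESS.match(t)]
    if bad:
        raise ValueError(f"not f1 addresses: {bad}")
    return extra


comment = ("checker:manualTrigger f1bycr5r3ymkgqvkuxoemgsmnuawyawptwj44mqdi "
           "f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i")
print(parse_manual_trigger(comment))
```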

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

Joss-Hua commented 1 year ago

Retrieval validity was confirmed in #1623.

Joss-Hua commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebzes5r3eyziaee2mtp7fp65axzcwmvoifhnzgaam5yomstjlm4wg

Address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Datacap Allocated

500.00TiB

Signer Address

f1tfg54zzscugttejv336vivknmsnzzmyudp3t7wi

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebzes5r3eyziaee2mtp7fp65axzcwmvoifhnzgaam5yomstjlm4wg

mikezli commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedu7c2s25sxzqzzcr2w4e26admq3wcmqrldrlhz53sj3iceeg3lue

Address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Datacap Allocated

500.00TiB

Signer Address

f1dnb3uz7sylxk6emti3ififcvu3nlufnnsjui6ea

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedu7c2s25sxzqzzcr2w4e26admq3wcmqrldrlhz53sj3iceeg3lue

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

DataCap allocation requested

1000.0TiB

Id

2521089d-2b70-41d9-852e-cc487dadd031

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.
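The replication warning above reports the share of deals whose data sits with fewer than 4 storage providers. The checker's exact method is not documented in this thread; the sketch below assumes it counts, for each deal, the distinct providers holding that deal's piece CID. The deal tuples are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def low_replication_share(deals: List[Tuple[str, str]], min_providers: int = 4) -> float:
    """Percentage of deals whose piece is held by fewer than min_providers distinct providers.

    deals: (piece_cid, provider_id) pairs, one per deal.
    """
    if not deals:
        return 0.0
    providers_per_piece: Dict[str, Set[str]] = defaultdict(set)
    for piece, provider in deals:
        providers_per_piece[piece].add(provider)
    low = sum(1 for piece, _ in deals if len(providers_per_piece[piece]) < min_providers)
    return 100.0 * low / len(deals)


deals = [("a", "p1"), ("a", "p2"), ("a", "p3"), ("a", "p4"), ("b", "p1")]
print(low_replication_share(deals))  # 20.0
```

Under this assumed definition, if copies of the same pieces are made from several client addresses, computing the metric over one address's deals alone can understate replication, which matches the applicant's request later in the thread to rerun the report with all five addresses.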

laurarenpanda commented 1 year ago

checker:manualTrigger f1bycr5r3ymkgqvkuxoemgsmnuawyawptwj44mqdi f14uhjnqrocqcenbjfaergw2uvaimysi4snv2oepy f1sejgqbuwsf74qifuxqykwotyu5aswuwhubxghqa f146dbcnpkwoabe2zu5z67ti3cfltuxxxttvgoxwa f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Other Addresses[^2]

Storage Provider Distribution

⚠️ 1 storage providers have unknown IP location - f02104858

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

laurarenpanda commented 1 year ago

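Since FileDrive Datasets Landing Plan is a continuing project with a group of public datasets started in 2022, the Checker Report shows CID sharing mainly for the following reasons:

  • Shared CIDs with the previous FileDrive Landing Plan: some data had been stored in fewer than 5 copies, so we distributed more copies using DataCap from this V2 LDN
  • Shared CIDs with other LDNs onboarding public datasets: using the same data processing tool with the same configuration file can produce the same CIDs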

luobin544 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 4 storage providers sealed too much duplicate data - f01228100: 27.42%, f01228089: 28.11%, f01228105: 28.11%, f01984580: 33.33%

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.
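The warning above lists, per storage provider, a percentage of "duplicate data". The checker's formula is not given here; the sketch below assumes it is the share of a provider's deal bytes that repeat a piece CID the same provider already stored. Provider IDs and piece names in the usage example are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def duplicate_share_by_provider(deals: Iterable[Tuple[str, str, int]]) -> Dict[str, float]:
    """deals: (provider_id, piece_cid, piece_size_bytes) tuples, one per deal.

    Returns {provider_id: percent of that provider's deal bytes for pieces it already held}.
    """
    total: Dict[str, int] = defaultdict(int)
    dup: Dict[str, int] = defaultdict(int)
    seen: Dict[str, Set[str]] = defaultdict(set)
    for provider, piece, size in deals:
        total[provider] += size
        if piece in seen[provider]:
            dup[provider] += size  # second or later deal for this piece at this provider
        else:
            seen[provider].add(piece)
    return {p: 100.0 * dup[p] / total[p] for p in total if total[p] > 0}


sample = [("f0A", "a", 32), ("f0A", "a", 32), ("f0A", "b", 32), ("f0A", "c", 32), ("f0B", "a", 32)]
print(duplicate_share_by_provider(sample))  # {'f0A': 25.0, 'f0B': 0.0}
```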

luobin544 commented 1 year ago

image

luobin544 commented 1 year ago

The bot's check shows CID sharing, but the duplicated items are consistent with the applicant's explanation, and retrieval works normally. Willing to support.

luobin544 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceded3tvyoxr7xtutwt4cuycvy5avy57bxaxtjpkg3ubonlnjgf4ug

Address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Datacap Allocated

1000.00TiB

Signer Address

f1tbd632f6w62glfaf7wjpimacbnjiz26poyoes2q

Id

2521089d-2b70-41d9-852e-cc487dadd031

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceded3tvyoxr7xtutwt4cuycvy5avy57bxaxtjpkg3ubonlnjgf4ug

TimGuo7 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecc2ux6235f3lfwldlttnqkwbcaw6qz3ks52pxr3esiycwg2qgwd4

Address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Datacap Allocated

1000.00TiB

Signer Address

f1yslbnnqzrjlyuxsmyxfbqcc7xthcavgpripjevi

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecc2ux6235f3lfwldlttnqkwbcaw6qz3ks52pxr3esiycwg2qgwd4

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f02049625

Client address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

DataCap allocation requested

1.95PiB

Id

b7b3bee4-74f7-4ef5-a138-5b211fddd020

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Rule to calculate the allocation request amount

400% of weekly dc amount requested

DataCap allocation requested

1.95PiB

Total DataCap granted for client so far

909494701772928712704.0YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-1.09B

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
|---|---|---|---|---|
| 34223 | 14 | 1000.0TiB | 18.72 | 325.71TiB |

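The "Total DataCap granted" (909494701772928712704.0YiB) and "-1.09B" fields above look garbled. The sketch below recomputes them from the tranches shown in this thread, assuming each was granted in full as the approval messages indicate and that DataCap uses binary units. It also shows that 400% of the 500TiB weekly rate is 2000 TiB, which the bot displays as 1.95PiB:

```python
TIB_PER_PIB = 1024  # assumption: binary units, 1 PiB = 1024 TiB

weekly_tib = 500                         # weekly allocation requested
requested_total_tib = 5 * TIB_PER_PIB    # 5PiB total requested
granted_tib = [250, 500, 1000]           # tranches 1-3 approved earlier in this thread

granted_so_far = sum(granted_tib)
still_to_grant = requested_total_tib - granted_so_far
tranche_4_tib = 4 * weekly_tib           # "400% of weekly dc amount requested"

print(granted_so_far, still_to_grant)    # 1750 3370
print(tranche_4_tib / TIB_PER_PIB)       # 1.953125
```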
ipollo00 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 4 storage providers sealed too much duplicate data - f01228100: 27.42%, f01228089: 28.11%, f01228105: 28.11%, f01984580: 28.15%

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

ipollo00 commented 1 year ago

Since FileDrive Datasets Landing Plan is a continuing project with a group of public datasets started in 2022, the Checker Report shows CID sharing mainly for the following reasons:

  • Shared CIDs with the previous FileDrive Landing Plan: some data had been stored in fewer than 5 copies, so we distributed more copies using DataCap from this V2 LDN
  • Shared CIDs with other LDNs onboarding public datasets: using the same data processing tool with the same configuration file can produce the same CIDs

No questions from my side.
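The first reason above amounts to topping up under-replicated data to a 5-copy target. A minimal sketch of that bookkeeping, with hypothetical piece identifiers and replica counts; each extra copy is a new deal for an already prepared piece, so it reuses the original piece CID, which is why it shows up as CID sharing with the earlier LDN:

```python
from __future__ import annotations

TARGET_COPIES = 5  # replica target mentioned in the comment above


def extra_copies_needed(existing_copies: dict[str, int], target: int = TARGET_COPIES) -> dict[str, int]:
    """Map each under-replicated piece to the number of additional copies it needs."""
    return {piece: target - n for piece, n in existing_copies.items() if n < target}


# Hypothetical replica counts left by an earlier LDN.
earlier = {"piece-a": 5, "piece-b": 3, "piece-c": 1}
top_up = extra_copies_needed(earlier)
print(top_up)                # {'piece-b': 2, 'piece-c': 4}
print(sum(top_up.values()))  # 6
```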

ipollo00 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceddprdn3yhf5yv25c7k356cqbt4n6kk4fqspnzprcjjgxlr63bor4

Address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Datacap Allocated

1.95PiB

Signer Address

f1n5wlrrhoxpkgwij25xrtt7w7g2k3fhbthmdn6ri

Id

b7b3bee4-74f7-4ef5-a138-5b211fddd020

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceddprdn3yhf5yv25c7k356cqbt4n6kk4fqspnzprcjjgxlr63bor4

newwebgroup commented 1 year ago
image
newwebgroup commented 1 year ago

I checked the GitHub history: the CID sharing has been explained, and the application has been verified and supported by multiple notaries.

I also checked whether the SPs support retrieval.

Willing to support this round.

newwebgroup commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedhei6jt3mmap5oeb2tj7l5vx6v5b5ya6fqjmbyn3uyrpe7nljz7k

Address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Datacap Allocated

1.95PiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedhei6jt3mmap5oeb2tj7l5vx6v5b5ya6fqjmbyn3uyrpe7nljz7k

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 5

Multisig Notary address

f02049625

Client address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

DataCap allocation requested

1.34PiB

Id

c9282f1c-ad4f-4420-9b76-c8509fcbefe4

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Rule to calculate the allocation request amount

800% of weekly dc amount requested

DataCap allocation requested

1.34PiB

Total DataCap granted for client so far

1.8160790205001833e+37YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-2.19B

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
|---|---|---|---|---|
| 103740 | 15 | 1.95PiB | 13.92 | 507.18TiB |

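Request 5 asks for 1.34PiB even though the stated rule is 800% of the weekly amount (4000 TiB). The amounts in this thread are consistent with each tranche being a multiple of the weekly rate, capped by what remains of the 5PiB total. In the sketch below, the 50%, 100% and 200% multipliers for requests 1 to 3 are inferred from the 250, 500 and 1000 TiB tranches; the 400% and 800% come from the bot's messages; binary units are assumed; and the function name is hypothetical:

```python
TIB_PER_PIB = 1024  # assumption: binary units, 1 PiB = 1024 TiB

# Multiples of the weekly rate per request number (assumption: 800% thereafter).
MULTIPLIERS = [0.5, 1.0, 2.0, 4.0, 8.0]


def tranche_tib(request_number: int, weekly_tib: float, total_tib: float, granted_tib: float) -> float:
    """Next tranche: a multiple of the weekly rate, capped by what remains of the total request."""
    m = MULTIPLIERS[min(request_number, len(MULTIPLIERS)) - 1]
    remaining = max(total_tib - granted_tib, 0.0)
    return min(m * weekly_tib, remaining)


schedule = []
granted = 0.0
for n in range(1, 6):
    t = tranche_tib(n, weekly_tib=500, total_tib=5 * TIB_PER_PIB, granted_tib=granted)
    schedule.append(t)
    granted += t

print(schedule)                                 # [250.0, 500.0, 1000.0, 2000.0, 1370.0]
print(round(schedule[-1] / TIB_PER_PIB, 2))     # 1.34
```

Under these assumptions the final tranche is limited by the 1370 TiB left of the 5PiB request, which the bot displays as 1.34PiB.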
Joss-Hua commented 1 year ago
image
Joss-Hua commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedf67pvzfmbql23bz5oxjs2iscmaiymqz2a762567gjz4qgit4egy

Address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Datacap Allocated

1.34PiB

Signer Address

f1tfg54zzscugttejv336vivknmsnzzmyudp3t7wi

Id

c9282f1c-ad4f-4420-9b76-c8509fcbefe4

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedf67pvzfmbql23bz5oxjs2iscmaiymqz2a762567gjz4qgit4egy

kernelogic commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecprmbholbfwueptuz57hjjn3pmbpviq6hwfi3aqqji2wrdlc3fiq

Address

f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

Datacap Allocated

1.34PiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

not found

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecprmbholbfwueptuz57hjjn3pmbpviq6hwfi3aqqji2wrdlc3fiq

kevzak commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage providers sealed too much duplicate data - f01984580: 20.01%

Deal Data Replication

⚠️ 81.37% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

raghavrmadya commented 1 year ago

This report has raised T&T flags. The client is requested to respond to the report before receiving further DataCap allocations.

laurarenpanda commented 1 year ago

checker:manualTrigger f1bycr5r3ymkgqvkuxoemgsmnuawyawptwj44mqdi f14uhjnqrocqcenbjfaergw2uvaimysi4snv2oepy f1sejgqbuwsf74qifuxqykwotyu5aswuwhubxghqa f146dbcnpkwoabe2zu5z67ti3cfltuxxxttvgoxwa f1w7oommwezzhsyfh4ax7tbtd7zgl6i4m6hvnvd4i

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Other Addresses[^2]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

laurarenpanda commented 1 year ago

GM, @raghavrmadya & @kevzak.

Here are my responses to the Checker Report of this LDN.

  1. Since FileDrive Landing Plan V2 includes 5 LDNs, please review the above Checker Report with all 5 addresses: https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1626#issuecomment-1562125753

  2. As FileDrive Datasets Landing Plan is a continuing project with a group of public datasets started in 2022, the Checker Report shows CID sharing mainly for the following reasons:

    • Shared CIDs with the previous FileDrive Landing Plan: some data had been stored in fewer than 5 copies, so we distributed more copies using DataCap from this V2 LDN.
    • Shared CIDs with other LDNs onboarding public datasets: using the same data processing tool with the same configuration file can produce the same CIDs.
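The second reason rests on content addressing being deterministic: the identifier depends only on the input bytes and the preparation settings, not on who runs the tool. The toy sketch below illustrates that property with plain SHA-256 over fixed-size chunks. It is not a real IPFS CID or UnixFS DAG, and `toy_root_id` is a hypothetical name:

```python
import hashlib


def toy_root_id(data: bytes, chunk_size: int) -> str:
    """Toy content address: hash each chunk, then hash the concatenated chunk digests.

    Not a real CID; it only shows that the result depends solely on the
    input bytes and the chunking configuration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    digests = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    return hashlib.sha256(b"".join(digests)).hexdigest()


dataset = b"same public dataset bytes" * 1000

a = toy_root_id(dataset, chunk_size=1024)  # one LDN preparing the dataset
b = toy_root_id(dataset, chunk_size=1024)  # another LDN, same tool and configuration
c = toy_root_id(dataset, chunk_size=4096)  # same data, different configuration
print(a == b, a == c)  # True False
```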
kernelogic commented 1 year ago

I support @laurarenpanda's explanation. It is common for a single LDN in a series of same-purpose LDNs to have a less-than-ideal CID report because of the 5PB cap.

I think that in the future, applicants can use the new 15PiB cap, and go even further via the E-FIL+ route. This application, however, reflects a legacy limitation and is not a T&T issue.

cryptowhizzard commented 1 year ago

I support @laurarenpanda's explanation. It is common for a single LDN in a series of same-purpose LDNs to have a less-than-ideal CID report because of the 5PB cap.

I think that in the future, applicants can use the new 15PiB cap, and go even further via the E-FIL+ route. This application, however, reflects a legacy limitation and is not a T&T issue.

Same here. Agree with @kernelogic.

Chris00618 commented 1 year ago

This CID-sharing behavior looks very similar to @kernelogic's previous practice. I don't think the community should accept or emulate it; I tend to see it as a way for DC to be abused.

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/457

Joss-Hua commented 1 year ago

Agree with @kernelogic and @cryptowhizzard on this.