Closed — herrehesse closed this issue 1 year ago
Thanks for your request! Everything looks good. :ok_hand:
A Governance Team member will review the information provided and get back to you soon.
Hi,
Hidde is taking over part of my work here. This one is valid for Speedium / Dcent as a follow-up to my previous requests.
Total DataCap requested
5PiB
Expected weekly DataCap usage rate
1PiB
Client address
f1xlnnzzl2d73amahvsu5uyfwdascodwtw6lopw5y
Multisig Notary address
f01858410
Client address
f1xlnnzzl2d73amahvsu5uyfwdascodwtw6lopw5y
DataCap allocation requested
256TiB
Id
7fe8a054-104f-464b-b6c6-4d0752bb7fd9
There is no previous allocation for this issue.
Your Datacap Allocation Request has been approved by the Notary
bafy2bzacebscsetrgbdiptw3rhwurty7ftncnwhwmrnjsemxaku75nun6jc7u
Address
f1xlnnzzl2d73amahvsu5uyfwdascodwtw6lopw5y
Datacap Allocated
475.76TiB
Signer Address
f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q
Id
7fe8a054-104f-464b-b6c6-4d0752bb7fd9
You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebscsetrgbdiptw3rhwurty7ftncnwhwmrnjsemxaku75nun6jc7u
Public data. Willing to support.
Why can you apply for so much DataCap (30 PiB) on your own, and on behalf of different organizations? Are you an administrator of FIL+?
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1516
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1514
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1513
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1512
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1511
Dear Applicant,
Due to the increased amount of erroneous or wrongly labeled Filecoin+ data recently, we feel compelled, on behalf of the entire community, to look more deeply into DataCap requests. This is to ensure that the overall value of the Filecoin network and the Filecoin+ program increases and is not abused.
Please answer the questions below as comprehensively as possible.
Customer data
Could you demonstrate exactly how and to what extent customer contact occurred? We expect that onboarding a customer at the scale of an LDN would have been preceded by at least multiple email exchanges and perhaps several chat conversations. A single email with an agreement does not qualify here.
Did the customer specify the amount of data involved in this relevant correspondence?
Why does the customer in question want to use the Filecoin+ program?
If this is solely about acquiring DataCap, it is of course out of the question. The customer must have a legitimate reason for wanting to use the Filecoin+ program, which is intended to store useful and public datasets on the network.
Why is the customer data considered Filecoin+ eligible? (As an intermediate solution Filecoin offers the FIL-E program or the glif.io website for business datasets that do not meet the requirements for a Filecoin+ dataset)
Files and Processing
Could you please demonstrate to us how you envision processing and transporting the customer data in question to any location for preparation?
Would you demonstrate to us that the customer, the preparer and the intended storage providers all have adequate bandwidth to process a dataset of this size?
Would you tell us how the data preparer guards against duplicates in order to prevent DataCap abuse?
Hopefully you understand the caution the overall community has about onboarding the wrong data. We understand the increased need for Filecoin+; however, we must not allow the program to be misused. Everything depends on a valuable and useful network, so let's do our best to make this happen. Together.
We fully understand that extra due diligence is being done on large datacap requests given the large amounts of erroneous data lately. The focus should remain on keeping fraudulent datacap requests out.
Customer data
We have participated, on behalf of Speedium Networks, in saving useful data since the beginning of the Slingshot program. We have stored multiple datasets, explicitly selected on behalf of the Protocol Labs team to further the value of the network. The datasets in question consist of several important scientific collections such as cancer research, genome analyses, observations of the galaxy, weather reports, DNA databases and much more.
The corresponding dataset, the Distributed Archives for Neurophysiology Data Integration (DANDI), is publicly available and stored through us on the Filecoin network in perpetuity and for the benefit of mankind. As a result, there is no customer contact to share.
We are requesting DataCap so that we can store these useful, public and open datasets, accessible to everyone. The market for paying clients and corporate data (FIL-E) is still developing. By storing useful public data on the network now, we can demonstrate the opportunities and usability of Filecoin to potential paying (FIL-E) customers.
The reason for using the Filecoin+ program is that these datasets lend themselves perfectly to Filecoin's mission: to "store humanity's most important information." We strongly believe that useful information advances the entire Filecoin ecosystem, so we encourage everyone to ensure that the data for which a request is made is actually useful to everyone.
If it is purely a business request for encrypted data, it should go through FIL-E. Scraped data from websites without consent, CAR files created by other data preparers without their consent, and unimportant data such as cooking courses, security footage or completely random images do not fulfill the mission of the Filecoin+ program and should not be accepted.
The exact size of each dataset is known in advance, and based on that size we can determine how much DataCap is needed for the number of replicas required. The exact volumes are visible in the AWS, Azure or Google Cloud buckets where the data is currently stored.
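For illustration, a minimal sketch of that sizing arithmetic; the dataset size and replica count below are placeholders, not figures from this application:

```python
# Illustrative sketch of the DataCap sizing arithmetic described above.
# The dataset size and replica count are placeholders, not actual figures.

def datacap_needed(dataset_size_tib: float, replicas: int) -> float:
    """Total DataCap required to store `replicas` copies of a dataset."""
    return dataset_size_tib * replicas

# Example: a 256 TiB dataset stored with 10 replicas needs 2560 TiB (2.5 PiB).
print(datacap_needed(256, 10))  # 2560.0
```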
Files and Processing
We have set up a BGP endpoint in Amsterdam, the Netherlands; from here we download, pack and send the above-mentioned datasets. Our bandwidth at this location is 80 Gbps, sufficient to process 0.5 PiB of raw data daily and send it to selected storage providers. We have provisioned 30 PiB of storage capacity, several machines for building CAR files, and web servers for serving downloads.
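As a rough sanity check of those numbers, the 80 Gbps link can be converted into a theoretical daily throughput, assuming it could be driven continuously (real utilization will be lower due to protocol overhead and source-side limits):

```python
# Back-of-envelope check of the stated throughput, assuming the 80 Gbps
# link could be driven continuously.

LINK_GBPS = 80                                  # stated Amsterdam endpoint bandwidth
BYTES_PER_DAY = LINK_GBPS / 8 * 1e9 * 86_400    # bits -> bytes -> per day
PIB = 2 ** 50

print(f"Theoretical max: {BYTES_PER_DAY / PIB:.2f} PiB/day")              # ~0.77 PiB/day
print(f"0.5 PiB/day uses ~{0.5 * PIB / BYTES_PER_DAY:.0%} of the link")   # ~65%
```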
We take our work as a packing provider for the Filecoin network extremely seriously and do our best to run demos with paying customers. We believe this is for the future benefit of the whole network.
Our download locations, such as AWS buckets, Azure or Google Cloud, tend to have bandwidth between 10 and 40 Gbps. The storage providers we select for data storage vary between 1 and 30 Gbps. We ensure that the larger datasets are distributed only to capable storage providers and monitor them hourly for reachability.
We work with Singularity, which is still under development, and sometimes suffer from duplicate deals constructed by this tool. This is mainly because at our scale of 100-500 TiB of daily packing, files can occasionally be processed twice when the data in the buckets changes (for example, a file's modification date). In addition, our selected storage providers are focused on reviewing data before processing, and we recently started regular duplicate checks on all of our available pools to ensure that duplicates are kept to an absolute minimum.
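For readers unfamiliar with the problem, here is a minimal, hypothetical sketch of content-based duplicate detection: hashing file contents rather than relying on names or modification dates (which, as noted above, can change in the source buckets). It is an illustration only, not how Singularity or the applicant's pipeline actually deduplicates deals:

```python
# Hypothetical sketch: group files by a SHA-256 hash of their contents so that
# renamed or re-dated copies are still recognized as duplicates.

import hashlib
from pathlib import Path

def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; groups larger than one are duplicates."""
    seen: dict[str, list[Path]] = {}
    for p in root.rglob("*"):
        if p.is_file():
            seen.setdefault(content_hash(p), []).append(p)
    return {digest: paths for digest, paths in seen.items() if len(paths) > 1}
```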
I expect to have sufficiently informed the community with this information. If there are any further questions, please let me know. We are in favor of full transparency and openness. Together we will move Filecoin forward and build the future of Web3.
To make this clear: you are a storage provider but also a platform?
This seems very vague to me and just a way to grow your miner with the FIL+ multiplier.
I do not support this kind of request in any way and would advise notaries not to sign.
I find it very childish of the applicant in question (@chuangyuhudong) to respond in this way, as it is a copy of a response I gave to one of his own DataCap requests, which itself is of doubtful compliance with the Filecoin+ ecosystem rules.
Is this the level we have descended to?
I would love to have an adult-level discussion about DataCap misused for fraudulent practices or completely useless files. The Filecoin+ program is meant to advance the entire Filecoin ecosystem by increasing the value of data storage; instead, it is now mostly used for personal gain and abusive growth with spoofed files or fake data.
It has to stop, 99% down from our all-time-high is enough. @raghavrmadya @cryptowhizzard
REQUEST MOVED TO: #1550
Closing due to new application created.
Data Owner Name
Massachusetts Institute Of Technology
Data Owner Country/Region
United States
Data Owner Industry
Life Science / Healthcare
Website
https://www.dandiarchive.org/
Social Media
Total amount of DataCap being requested
5PiB
Weekly allocation of DataCap requested
1PiB
On-chain address for first allocation
f1xlnnzzl2d73amahvsu5uyfwdascodwtw6lopw5y
Custom multisig
Identifier
No response
Share a brief history of your project and organization
Is this project associated with other projects/ecosystem stakeholders?
No
If answered yes, what are the other projects/ecosystem stakeholders
No response
Describe the data being stored onto Filecoin
Where was the data currently stored in this dataset sourced from
AWS Cloud
If you answered "Other" in the previous question, enter the details here
No response
How do you plan to prepare the dataset
IPFS, lotus, singularity, others/custom tool
If you answered "other/custom tool" in the previous question, enter the details here
No response
Please share a sample of the data
Confirm that this is a public dataset that can be retrieved by anyone on the Network
If you chose not to confirm, what was the reason
No response
What is the expected retrieval frequency for this data
Yearly
For how long do you plan to keep this dataset stored on Filecoin
1.5 to 2 years
In which geographies do you plan on making storage deals
Greater China, Asia other than Greater China, North America, South America, Europe, Australia (continent)
How will you be distributing your data to storage providers
HTTP or FTP server, IPFS, Shipping hard drives, Lotus built-in data transfer
How do you plan to choose storage providers
Slack, Big data exchange, Partners
If you answered "Others" in the previous question, what is the tool or platform you plan to use
No response
If you already have a list of storage providers to work with, fill out their names and provider IDs below
How do you plan to make deals to your storage providers
Boost client, Lotus client, Singularity
If you answered "Others/custom tool" in the previous question, enter the details here
No response
Can you confirm that you will follow the Fil+ guideline
Yes