Open Zzbaoo opened 4 days ago
Application is waiting for allocator review
@Zzbaoo Thank you for applying. In the case of open datasets, the data must be retrievable by anyone. Therefore, can you share a link to the entire dataset? The links you provided contain specific packages, and I want to check that everyone has access to all the data.
Also, please review the document below and consider the "Source Dataset" section. https://github.com/fidlabs/Open-Data-Pathway/wiki/Policies
Aside from the above, please answer:
- When we sample your deals, how will we be able to confirm that they come from the dataset?
- How is the data transformed into deals for Filecoin?
- When a deal is sampled for verification, how will we be able to confirm that it is part of this dataset? (How is it chunked into .car files?)
Yes, the complete link is: https://www.qualcomm.com/developer/software/something-something-v-2-dataset/downloads. We use data from open datasets on compliant websites, ensuring that everyone has access to the data. The data is currently stored on our server's hard drives, and if needed, I can upload some of the video data to cloud storage for the reviewer to assess.
For each .car file we generate, we record the source data file's dataset name, along with its piece_cid, payload_size, and other information, in a CSV file. This ensures complete preservation of the dataset and keeps track of which SP packaged which portion of it, avoiding duplicate packaging.
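A minimal sketch of what such a CSV index could look like, assuming hypothetical field names and helper functions (the applicant's actual schema is not shown in the thread):

```python
import csv

# Hypothetical index schema; field names are assumptions, not the
# applicant's actual CSV layout.
FIELDS = ["dataset_name", "source_file", "piece_cid", "payload_size", "sp_id"]

def record_piece(index_path, dataset_name, source_file,
                 piece_cid, payload_size, sp_id):
    """Append one .car file's metadata to the CSV index."""
    with open(index_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:          # fresh file: write the header row first
            writer.writeheader()
        writer.writerow({
            "dataset_name": dataset_name,
            "source_file": source_file,
            "piece_cid": piece_cid,
            "payload_size": payload_size,
            "sp_id": sp_id,
        })

def already_packaged(index_path, source_file):
    """Scan the index to avoid packaging the same source file twice."""
    try:
        with open(index_path, newline="") as f:
            return any(row["source_file"] == source_file
                       for row in csv.DictReader(f))
    except FileNotFoundError:
        return False
```

An index like this also gives a verifier a direct mapping from a sampled piece_cid back to its source file in the dataset.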
The data we process is spread across multiple source files. To convert it into Filecoin deals, we download the source files in a specific order and package them into .car files, each approximately 17GB in size. If a source file is close to or exceeds 32GB, we split it into two parts and record the position of each part within its .car file.
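The packing rule described above can be sketched roughly as follows. This is an illustrative simplification under stated assumptions (a ~17GB target payload, a 32GB split threshold, in-order packing); the applicant's actual tooling is not shown in the thread:

```python
GiB = 1024**3
TARGET_CAR = 17 * GiB   # assumed target .car payload size (~17GB)
SPLIT_CAP = 32 * GiB    # assumed threshold for splitting a source file

def plan_cars(source_files):
    """source_files: list of (name, size) in download order.
    Returns a list of cars, each a list of (name, offset, length) pieces.
    Simplification: a single piece larger than TARGET_CAR still gets
    its own car."""
    pieces = []
    for name, size in source_files:
        if size >= SPLIT_CAP:            # split an oversized file in two
            half = size // 2
            pieces.append((name, 0, half))
            pieces.append((name, half, size - half))
        else:
            pieces.append((name, 0, size))
    cars, current, current_size = [], [], 0
    for piece in pieces:                 # greedy in-order packing
        if current and current_size + piece[2] > TARGET_CAR:
            cars.append(current)
            current, current_size = [], 0
        current.append(piece)
        current_size += piece[2]
    if current:
        cars.append(current)
    return cars
```

Recording the (offset, length) of each piece is what lets a verifier reassemble a split source file from two .car files and compare it against the published dataset.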
@Zzbaoo, could you also explain why you applied to us for DC while already cooperating with another allocator? py-guazi/GZ-FIP0078-Pathway#8 Moreover, this is the same dataset. If you wish to send more replicas to Filecoin, why not work with the allocator who has already agreed to work with you?
I'll set aside the fact that you have already applied with this dataset ten times, but since you are already working with another allocator, I think you should stick with them.
The reason I applied for DC is that this time I am using an open dataset, which differs from the private dataset I worked on with the other allocator. This data is downloaded directly from compliant open-dataset websites, whereas the private dataset involves specific restrictions and requirements, and its previously approved DC quota is nearly exhausted. I believe that seeking additional DC support, given my existing hardware resources, is a reasonable way to maximize the utility of the data on the Filecoin network. Thank you for your understanding; I look forward to your response.
Data Owner Name
Cloud age
Data Owner Country/Region
China
Data Owner Industry
Life Science / Healthcare
Website
http://www.gdysd.cn/
Social Media Handle
huangpijing@gdysd.cn
Social Media Type
Other
What is your role related to the dataset
Data Preparer
Total amount of DataCap being requested
3PiB
Expected size of single dataset (one copy)
350TiB
Number of replicas to store
8
Weekly allocation of DataCap requested
500TiB
On-chain address for first allocation
f1lxkktcq45quwmfhcdpclbisqpq7r35tiakn42sa
Data Type of Application
Public, Open Dataset (Research/Non-Profit)
Custom multisig
Identifier
No response
Share a brief history of your project and organization
Is this project associated with other projects/ecosystem stakeholders?
No
If answered yes, what are the other projects/ecosystem stakeholders
No response
Describe the data being stored onto Filecoin
Where was the data currently stored in this dataset sourced from
My Own Storage Infra
If you answered "Other" in the previous question, enter the details here
No response
If you are a data preparer. What is your location (Country/Region)
China
If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?
If you are not preparing the data, who will prepare the data? (Provide name and business)
No response
Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.
No response
Please share a sample of the data
Confirm that this is a public dataset that can be retrieved by anyone on the Network
If you chose not to confirm, what was the reason
No response
What is the expected retrieval frequency for this data
Daily
For how long do you plan to keep this dataset stored on Filecoin
More than 3 years
In which geographies do you plan on making storage deals
Asia other than Greater China, North America
How will you be distributing your data to storage providers
Cloud storage (i.e. S3), Shipping hard drives, Venus built-in data transfer
How did you find your storage providers
Partners
If you answered "Others" in the previous question, what is the tool or platform you used
No response
Please list the provider IDs and location of the storage providers you will be working with.
How do you plan to make deals to your storage providers
Boost client, Lotus client, Droplet client
If you answered "Others/custom tool" in the previous question, enter the details here
No response
Can you confirm that you will follow the Fil+ guideline
Yes