filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Fujian DongShuXiSuan Technology Co., Ltd. - PPDX #996

Closed dsxsfz closed 1 year ago

dsxsfz commented 1 year ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

We are a company specializing in big data and AI solutions. Our main business includes data storage, big data analysis and computation, and AI model training.
During R&D and testing, we collected a large number of public datasets.

What is the primary source of funding for this project?

The project is funded by our shareholders and by company income.

What other projects/ecosystem stakeholders is this project associated with?

None

Use-case details

Describe the data being stored onto Filecoin

During R&D and testing, we downloaded a large number of public datasets. We use this data for daily research and development and for AI model training, and we are grateful for the value it brings to the company's development. We have been following the rapid development of the Filecoin project, and technologies such as distributed storage, FVM, and data retrieval are of great interest to the company. So we decided to store these public datasets, together with the publicly available datasets generated by the company, on the Filecoin network through the Fil+ program.

Where was the data in this dataset sourced from?

Our data comes from Kaggle, AWS Open Data, and the public data download platforms of major universities, as well as our own daily training data, among other sources.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://dsxs-public.oss-cn-fuzhou.aliyuncs.com/70-dog-breedsimage-data-set.zip
https://dsxs-public.oss-cn-fuzhou.aliyuncs.com/butterfly-images40-species.zip
https://dsxs-public.oss-cn-fuzhou.aliyuncs.com/hard-drive-test-data-q4-2018.zip
https://dsxs-public.oss-cn-fuzhou.aliyuncs.com/lgg-mri-segmentation.zip
https://dsxs-public.oss-cn-fuzhou.aliyuncs.com/numerai-train-validation-with-kazutsugi-nomi.zip
https://dsxs-public.oss-cn-fuzhou.aliyuncs.com/stanford-dogs-dataset-traintest.zip
https://dsxs-public.oss-cn-fuzhou.aliyuncs.com/tensorflow-flowers.zip
https://dsxs-public.oss-cn-fuzhou.aliyuncs.com/tf-roberta.zip
https://dsxs-public.oss-cn-fuzhou.aliyuncs.com/training-car.zip
https://dsxs-public.oss-cn-fuzhou.aliyuncs.com/unetr-model.zip

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes. I confirm that this is a public dataset that can be retrieved by anyone on the Network.

What is the expected retrieval frequency for this data?

About 10 times a year.

For how long do you plan to keep this dataset stored on Filecoin?

At least 1.5 years.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Asia, North America.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Yes. We will use both online and offline transfer.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We will look for SPs through Slack and Filfox, and we aim to choose miners with a good reputation. This is the quality we value most.

How will you be distributing deals across storage providers?

We will follow the rules for large datasets. We will ensure fair distribution by limiting the share of deals sent to each miner to under 25% per SP. Records of the distribution will be published regularly on GitHub.
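
The "under 25% per SP" limit above can be checked mechanically against a deal log. The following is a minimal sketch, not the applicant's actual tooling: the SP IDs, deal sizes, and the function names `sp_shares` and `over_limit` are all hypothetical, and it assumes the share is measured by total deal size rather than by deal count.

```python
from collections import Counter

MAX_SHARE = 0.25  # the "under 25% per SP" limit stated in the application


def sp_shares(deals):
    """Return each SP's fraction of the total deal size."""
    totals = Counter()
    for sp_id, size_gib in deals:
        totals[sp_id] += size_gib
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {sp: size / grand_total for sp, size in totals.items()}


def over_limit(deals, max_share=MAX_SHARE):
    """List SPs whose share is at or above the limit ("under 25%" is strict)."""
    shares = sp_shares(deals)
    return sorted(sp for sp, share in shares.items() if share >= max_share)


# Hypothetical deal log: (SP ID, deal size in GiB). Not real data.
deals = [
    ("f01001", 32),
    ("f01002", 32),
    ("f01003", 32),
    ("f01004", 32),
    ("f01005", 64),
]

print(over_limit(deals))
```

Because the limit is strict, an even split across four SPs (exactly 25% each) would still be flagged, so at least five SPs are needed to satisfy it.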

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes, we have. We would like the community to support and approve our issue.
large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

raghavrmadya commented 1 year ago

It seems like you are looking to store multiple public datasets using a single LDN application. We require you to open a separate application for each dataset for community due diligence and notary diligence.

dsxsfz commented 1 year ago

@raghavrmadya Hi RG, we downloaded different public datasets from many sources. Most of these datasets are very small, and our data also contains newly generated data from our own research and development. I don't think it's appropriate to apply for a separate LDN for each public dataset. So please help us continue with this issue; we can provide anything you need for due diligence. Thanks

cryptowhizzard commented 1 year ago

Dear applicant,

Thank you for applying for DataCap. As a Filecoin Fil+ notary, I am screening your application and conducting due diligence.

Looking at your application, I have some questions. As you are brand new on GitHub and have no history of past applications, applying for 5 PB of DataCap seems like a lot to me. One needs comprehensive knowledge of Filecoin, packing of data, distribution of data, and all the requirements that come with it. Are you brand new in the Filecoin space, or have you applied for DataCap in the past under different GitHub account names?

Can you show us visible proof of the size of your data and the storage systems you have there?

As a last question, I would like you to fill out this form to provide us with the necessary information to make an educated decision on whether to support your LDN request.

Thanks!

large-datacap-requests[bot] commented 9 months ago

Thanks for your request! :exclamation: We have found some problems in the information provided.
We could not find Website / Social Media field in the information provided
We could not find Total amount of DataCap being requested (between 500 TiB and 5 PiB) field in the information provided
We could not find Weekly allocation of DataCap requested (usually between 1-100TiB) field in the information provided
We could not find On-chain address for first allocation field in the information provided
We could not find Data Type of Application field in the information provided

Please, take a look at the request and edit the body of the issue providing all the required information.
large-datacap-requests[bot] commented 6 months ago

RootKeyHolders have approved multisig account. You can now request first datacap release