filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale
[DataCap Application] OSCCHINA #47

Closed galen-mcandrew closed 1 year ago

galen-mcandrew commented 3 years ago

Large Dataset Notary Application

To apply for a DataCap allocation for your dataset, please fill out the following information.

Core Information

Project details

Share a brief history of your project and organization.

OSSCHINA is a one-stop open-source technology company. Its flagship product, OSCHINA, is the largest open-source technology community in China. It was founded in 2008 and now serves more than 3 million users. Gitee, the code hosting and collaborative development platform we built, is the second-largest source code hosting platform after GitHub; it currently hosts more than 6 million developers and more than 15 million projects. It brings together almost all original open-source projects in mainland China. Gitee launched its enterprise version in 2016, providing enterprise-level code hosting services and becoming a leading SaaS provider in the development field.

What is the primary source of funding for this project?

Gitee hosts both open-source projects and private enterprise projects. We do not charge for open-source projects; that hosting is funded by the parent company's capital injection and by corporate service fees.

What other projects/ecosystem stakeholders is this project associated with?

The primary stakeholder is FileStations, which has committed to providing an S3 gateway for Filecoin.

Use-case details

Describe the data being stored onto Filecoin

We plan to upload archived copies of the historical code of the open-source projects hosted on Gitee to the Filecoin network. At present, the archived code totals 200 TB. We plan to upload five copies for redundancy. The full list of datasets can be found here: https://gitee.com/explore.

Where was the data in this dataset sourced from?

Each of these datasets comes from the historical code of open-source projects hosted on Gitee. All data is publicly available.

Can you share a sample of what is in the dataset? A link to a file, an image, a table, etc., are good examples of this.

Here are links to a few of the projects:
- [Huawei Hongmeng Operating System](https://gitee.com/openharmony)
- [Baidu Apollo Autonomous Driving Platform](https://gitee.com/ApolloAuto)

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

These are all public datasets that can be retrieved by anyone on the network through the Gitee website or through the Git protocol.

What is the expected retrieval frequency for this data?

We expect retrieval to be sporadic. Some users may retrieve this data to trace and detect bugs, and some may retrieve it to synchronize local copies; the frequency really depends on public demand. Since retrieval requests may occur at any time, we will work with FileStations in the future to cache hot data based on user retrieval frequency, reducing the cost of Filecoin decoding.

For how long do you plan to keep this dataset stored on Filecoin? Is this a permanent archival or a temporary storage deal?

We intend to store the dataset for a minimum of 18 months, and we intend to enable renewals in the future.

DataCap allocation plan

In which geographies do you plan on making storage deals?

The user base of the Gitee platform is mainly located in China. For availability, we will store most of the copies (at least 3) in China and place at least one copy in Hong Kong or overseas, ensuring both efficiency and diversity.

What is your expected data onboarding rate? How many deals can you make in a day, in a week? How much DataCap do you plan on using per day, per week?

The archived copy is 200 TB in size. The DataCap applied for will mainly be used to upload this data to Filecoin. The upload speed is determined by the performance of the miners' clusters. We may import data to miners through offline deals, which is expected to take 20 days per full copy. Including the time for copying data and shipping the drives, it will take around 2 months for all five copies to be uploaded to Filecoin.
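As a rough check of the figures in this answer, the sketch below works through the arithmetic for five copies of a 200 TB archive at 20 days per copy. It assumes that "TB" means decimal terabytes (10^12 bytes), which the application does not state, and the function names are illustrative only.

```python
TB = 10**12   # decimal terabyte; assumed meaning of the "200 TB" figure
TIB = 2**40   # binary tebibyte, the unit DataCap is usually quoted in
PIB = 2**50   # binary pebibyte


def total_bytes(copy_size_tb: int, copies: int) -> int:
    """Total bytes to onboard for the given number of full copies."""
    return copy_size_tb * TB * copies


def sequential_import_days(days_per_copy: int, copies: int) -> int:
    """Days needed if the copies were imported strictly one after another."""
    return days_per_copy * copies


total = total_bytes(200, 5)
print(f"total to onboard: {total / TIB:.1f} TiB ({total / PIB:.3f} PiB)")
print(f"sequential import: {sequential_import_days(20, 5)} days")
```

Under these assumptions, importing the copies one after another would take 100 days, so an estimate of around 2 months implies that at least some copies are imported to different miners in parallel.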

How will you be distributing your data to miners? Is there an offline data transfer process?

Given the scale of the dataset, we plan to use an offline data transfer process. We will copy the data onto hard disk cabinets and transport them to miners for storage.
In the future, incremental archived data will be transmitted online using an S3 gateway. The S3 gateway will be provided by the FileStations team and deployed in the miners' IDCs.

How do you plan on choosing the miners with whom you will be making deals?

The miners we work with must meet several conditions:
- The IDC should be T3 level or above, with a lease of at least 3 years.
- Public network bandwidth should be 1GB/s or above, with a fixed IP address, and every miner must have at least one line from China Telecom, China Unicom, China Mobile and overseas.
- Willingness to provide us with additional machines to build the FileStations S3 gateway.
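The conditions above could be expressed as a simple eligibility check, sketched below. The `MinerProfile` fields, the sample miners, and the reading that a miner needs a line from each of the four listed carriers or regions (rather than any one of them) are assumptions made for illustration; the application does not specify a data format or tooling.

```python
from dataclasses import dataclass, field

# Assumed reading: a miner needs one line from each of these.
REQUIRED_LINES = {"China Telecom", "China Unicom", "China Mobile", "overseas"}


@dataclass
class MinerProfile:
    """Hypothetical record of the attributes the conditions refer to."""
    name: str
    idc_tier: int                 # 3 means a T3-level IDC
    lease_years: float            # remaining lease with the IDC
    bandwidth_gb_per_s: float     # public network bandwidth, as quoted
    fixed_ip: bool
    lines: set = field(default_factory=set)
    offers_gateway_machines: bool = False


def is_eligible(m: MinerProfile) -> bool:
    """Return True only if every stated condition is met."""
    return (
        m.idc_tier >= 3
        and m.lease_years >= 3
        and m.bandwidth_gb_per_s >= 1
        and m.fixed_ip
        and REQUIRED_LINES <= m.lines   # subset test: all lines present
        and m.offers_gateway_machines
    )


# Two made-up miners for illustration.
good = MinerProfile("miner-a", 3, 5, 1.0, True, set(REQUIRED_LINES), True)
bad = MinerProfile("miner-b", 2, 5, 10.0, True, {"China Telecom"}, True)

print([m.name for m in (good, bad) if is_eligible(m)])
```

Here `miner-b` is rejected on two counts: its IDC tier is below T3 and it lacks three of the four required lines.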

How will you be ensuring fair distribution of storage and DataCap across miners storing data?

We plan to work with 5 to 10 miners and allocate copies according to each miner's conditions. The better the IDC level and bandwidth, the more data and DataCap will be allocated. Each miner can store at most one copy of the complete data.

large-datacap-requests[bot] commented 3 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and get back to you soon.

galen-mcandrew commented 3 years ago

@Gitee-Alex Here is the new large dataset application issue, per the new LDN process.

galen-mcandrew commented 3 years ago

Multisig Notary requested

Total DataCap requested

1PiB

Expected weekly DataCap usage rate

50TiB

large-datacap-requests[bot] commented 3 years ago

Multisig created and sent to RKH f01322570

large-datacap-requests[bot] commented 2 years ago

DataCap Allocation requested

Multisig Notary address

f01322570

Client address

f3qcpd4uybca3mvyovwjcqvvuikia3uqm5nte6lbs5dmtgz3udbohskmo3lmoasqjmtqc2qmsyko7sahfzi46q

DataCap allocation requested

25TiB

dannyob commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedrzbcy66oy4ukthucuduwexykl4i72yk2yrtbyq543a6yhvs3pyy

Address

f3qcpd4uybca3mvyovwjcqvvuikia3uqm5nte6lbs5dmtgz3udbohskmo3lmoasqjmtqc2qmsyko7sahfzi46q

Datacap Allocated

25TiB

Signer Address

f1k6wwevxvp466ybil7y2scqlhtnrz5atjkkyvm4a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedrzbcy66oy4ukthucuduwexykl4i72yk2yrtbyq543a6yhvs3pyy

Reiers commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebjbyx34oqmnmgry6s5vlc5fzvkyamrr6iidejlwzz7j5qqrme356

Address

f3qcpd4uybca3mvyovwjcqvvuikia3uqm5nte6lbs5dmtgz3udbohskmo3lmoasqjmtqc2qmsyko7sahfzi46q

Datacap Allocated

25TiB

Signer Address

f1oz43ckvmtxmmsfzqm6bpnemqlavz4ifyl524chq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebjbyx34oqmnmgry6s5vlc5fzvkyamrr6iidejlwzz7j5qqrme356

mr-spaghetti-code commented 2 years ago

Hi,

It looks like you have received 50 TiB of DataCap to date but have spent less than 20% of it so far.

We would love to understand if there's anything holding you back. We are working hard to make the data onboarding process easier for clients like you and your feedback is very valuable. If you have a moment, please fill in this survey: https://forms.gle/s6AuTXZPZSMokscLA

If you have any feedback or would like to consult with an expert, please let me know.

Thanks,

João Fiadeiro
Product Manager, Large Data Client Onboarding
Protocol Labs

Sunnyiscoming commented 1 year ago

Hi, please explain the abnormal information.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!