filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] <AIOCP> - <GPU Cloud computing & Cloud Storage> #1284

Open aiocp opened 1 year ago

aiocp commented 1 year ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

AIOCP, established in 2012, started by offering network equipment such as GPUs, servers, switches, and storage, and later expanded into cloud computing services, particularly one-on-one B2B consulting on network infrastructure.

Now, we are about to release a brand-new cloud computing service called “BIGBANGCLOUD” that provides GPU resources in a virtual environment. Unlike typical cloud services, its infrastructure is built on IPFS technology, which is one of its strengths.

Our basic cloud services are currently deployed in an on-premises environment. However, we have realized that decentralization and high availability are core values in the Web 3.0 era. We have been investing in IPFS development for over two years, and now is the time to get started.

BIGBANGCLOUD will be the right choice for start-ups and for medical and educational institutions working with Fourth Industrial Revolution technologies such as autonomous driving, mobility, big data, and AI.

aiocp field aiocp press release

What is the primary source of funding for this project?

Business income

What other projects/ecosystem stakeholders is this project associated with?

None.

Use-case details

Describe the data being stored onto Filecoin

We currently offer cloud computing as one of our main services and are about to release an innovative hosting service using GPU resources. Meanwhile, we have been looking for virtual storage that can store and retrieve data for our GPU hosting services, and our answer is to build that infrastructure on IPFS technology.

Clients who use cloud computing services need space to store their data, that is, cloud storage.
The data we plan to put into the Filecoin system is mostly our clients' public data, along with R&D data and container images from our own resources.

The R&D data consists of test results and videos from developing GPU cloud hosting. The container images are tool images with TensorFlow, PyTorch, Keras, and cuDNN installed, which we will provide to clients.

Where was the data in this dataset sourced from?

Mostly our clients' public data from our cloud services, plus container images built from open-source software by our R&D team.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://drive.google.com/drive/folders/1MPFQo2FGqOS6AuQGkDvs0Uo1-patTOTd?usp=share_link

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes. 

What is the expected retrieval frequency for this data?

Whenever our clients provision new services or store applications or their own data.

For how long do you plan to keep this dataset stored on Filecoin?

540 days at least. 
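For reference, Filecoin deal start and end points are expressed in chain epochs rather than days. The minimal sketch below converts the stated 540-day minimum into epochs, assuming the 30-second mainnet epoch (2,880 epochs per day); the helper name `days_to_epochs` is hypothetical.

```python
# Hypothetical helper: convert a retention period in days to Filecoin epochs.
# Assumes the mainnet block time of 30 seconds per epoch.
EPOCH_SECONDS = 30
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_epochs(days: int) -> int:
    """Return the number of 30-second epochs spanning `days` whole days."""
    return days * SECONDS_PER_DAY // EPOCH_SECONDS


if __name__ == "__main__":
    print(days_to_epochs(1))    # 2880
    print(days_to_epochs(540))  # 1555200
```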

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Prefer regions in Asia

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Both online and offline transfer, depending on each SP's request.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Please answer here.

How will you be distributing deals across storage providers?

GitHub and Slack can help us find more reputable SPs with sufficient resources. We'd like to contact SPs in different regions for distributed storage.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

yes. 
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

2PiB

Expected weekly DataCap usage rate

100TiB

Client address

f1dj23xokyovdqnbgx3nis3ygk73szanzlwb2kduy
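At the requested weekly rate, the total request corresponds to about 20 weeks of onboarding. A quick check, assuming binary units (1 PiB = 1024 TiB); the function name is hypothetical:

```python
import math

TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def weeks_to_exhaust(total_pib: float, weekly_tib: float) -> float:
    """Weeks needed to use `total_pib` of DataCap at `weekly_tib` per week."""
    return total_pib * TIB_PER_PIB / weekly_tib


weeks = weeks_to_exhaust(2, 100)
print(f"{weeks:.2f} weeks (rounded up: {math.ceil(weeks)})")  # 20.48 weeks (rounded up: 21)
```

In practice, DataCap is released in tranches that notaries must approve, so the actual schedule depends on those approvals rather than on this arithmetic alone.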

psh0691 commented 1 year ago

To sign the DC allocation, I would like to ask you a few questions first.

  1. It is difficult to verify 5PiB with sample data alone. Please provide the basis for applying for 5PiB, e.g., a screenshot showing the size of the data you have.
  2. Are customer data stored in the cloud publicly available?
  3. Do you own a Filecoin node? How would you respond to the point of self-dealing in customer validation?
aiocp commented 1 year ago

Hello, @psh0691

  1. We have 400TB per copy. We would like to make 5 copies and distribute one copy to each of 5 SPs, so the total is 2PiB. I have just revised the application.
  2. Yes. They are all public data.
  3. No, we do not have a Filecoin node.
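The 2 PiB figure follows from the size of one copy times the number of copies. The reply writes "400TB" without saying whether decimal terabytes or binary tebibytes are meant, so this sketch computes both; either way the product is slightly below 2 PiB, so the request is a rounded figure. The `total_pib` helper is hypothetical.

```python
# Total DataCap for N replicas of a dataset, reported in PiB.
# "400TB" is ambiguous, so both the decimal (10^12 bytes) and the
# binary (TiB, 2^40 bytes) readings are computed.
TB = 10**12
TIB = 2**40
PIB = 2**50


def total_pib(copy_size_bytes: int, copies: int) -> float:
    """Size of `copies` replicas of `copy_size_bytes`, in PiB."""
    return copy_size_bytes * copies / PIB


decimal = total_pib(400 * TB, 5)
binary = total_pib(400 * TIB, 5)
print(f"400 TB x 5  = {decimal:.3f} PiB")  # 400 TB x 5  = 1.776 PiB
print(f"400 TiB x 5 = {binary:.3f} PiB")   # 400 TiB x 5 = 1.953 PiB
```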
aiocp commented 1 year ago

@simonkim0515 I have revised the total DataCap request from 5PiB to 2PiB. It was already approved two weeks ago. Could you please check and pull the trigger again? @raghavrmadya @galen-mcandrew @Kevin-FF-USA

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1dj23xokyovdqnbgx3nis3ygk73szanzlwb2kduy

DataCap allocation requested

50TiB

Id

097bf9c9-d573-4283-b7d0-cad7c834bc5f

psh0691 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedgduadxfy34alg2zlw67qxiviavjn6w5aewcwjop7jl34wfhic4i

Address

f1dj23xokyovdqnbgx3nis3ygk73szanzlwb2kduy

Datacap Allocated

50.00TiB

Signer Address

f1qdko4jg25vo35qmyvcrw4ak4fmuu3f5rif2kc7i

Id

097bf9c9-d573-4283-b7d0-cad7c834bc5f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedgduadxfy34alg2zlw67qxiviavjn6w5aewcwjop7jl34wfhic4i

kernelogic commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedvmrrrkyokz3nc7sdxwzxc3sg356ixfnjhypedrqcmfo5yrjwi5s

Address

f1dj23xokyovdqnbgx3nis3ygk73szanzlwb2kduy

Datacap Allocated

50.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

097bf9c9-d573-4283-b7d0-cad7c834bc5f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedvmrrrkyokz3nc7sdxwzxc3sg356ixfnjhypedrqcmfo5yrjwi5s

IreneYoung commented 1 year ago

@aiocp

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others. Please answer here.

1. Please answer the question above.

I would like to make 5 copies and distribute one copy to 5 SPs.

2. It seems you intend to make deals with 5 SPs. Can you list the SPs you have contacted so far?

aiocp commented 1 year ago

Hi, @IreneYoung

  1. We plan to choose SPs that meet our requirements: a stable network connection (specifically, a port line of 10G or more) and storage capacity in Asia. We think these conditions will keep this large amount of data reliably retrievable in the future. We also plan to select SPs that are interested in the retrieval market, which is expected to grow in 2023.

To that end, we participated in several community and networking events in Seoul hosted by Protocol Labs and met SPs there while discussing the Filecoin roadmap. We are still deciding which SPs can best support us.

  2. We have found 3 SPs that meet our requirements, but we are still choosing the remaining 2 from the list of SPs we have contacted.
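The selection criteria in the first answer can be written as a simple filter. This is only an illustrative sketch: the `StorageProvider` record, its field names, the country codes treated as "Asia", and the candidate IDs are all hypothetical, not data from this thread or a Filecoin API.

```python
from dataclasses import dataclass


@dataclass
class StorageProvider:
    # Hypothetical record; field names are illustrative, not a Filecoin API.
    sp_id: str
    region: str          # ISO country code
    port_gbps: int       # network port speed
    serves_retrievals: bool


ASIA = {"KR", "JP", "SG", "HK", "CN"}  # assumed set of Asian country codes


def meets_requirements(sp: StorageProvider) -> bool:
    """Stated criteria: a 10G-or-faster port, located in Asia, retrieval support."""
    return sp.port_gbps >= 10 and sp.region in ASIA and sp.serves_retrievals


candidates = [
    StorageProvider("sp-a", "KR", 10, True),
    StorageProvider("sp-b", "JP", 1, True),
    StorageProvider("sp-c", "SG", 25, False),
]
print([sp.sp_id for sp in candidates if meets_requirements(sp)])  # ['sp-a']
```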
large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1dj23xokyovdqnbgx3nis3ygk73szanzlwb2kduy

DataCap allocation requested

100TiB

Id

c159d966-9f96-49b1-a104-6951c9a62a40

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1dj23xokyovdqnbgx3nis3ygk73szanzlwb2kduy

Last two approvers

kernelogic & psh0691

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

100TiB

Total DataCap granted for client so far

50TiB

Datacap to be granted to reach the total amount requested by the client (2 PiB)

1.95PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
|---|---|---|---|---|
| 3378 | 3 | 50TiB | 42.12 | 12.24TiB |
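The figures in this report can be reproduced with simple arithmetic. A sketch, assuming binary units and assuming that "Remaining DC" is the unused part of the previous 50 TiB allocation; the function names are hypothetical:

```python
TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def next_request_tib(weekly_tib: float, rule_fraction: float = 1.0) -> float:
    """Next tranche under the rule '100% of weekly DC amount requested'."""
    return weekly_tib * rule_fraction


def remaining_to_total_pib(total_pib: float, granted_tib: float) -> float:
    """DataCap still to be granted to reach the client's total request."""
    return (total_pib * TIB_PER_PIB - granted_tib) / TIB_PER_PIB


def used_fraction(previous_tib: float, remaining_tib: float) -> float:
    """Share of the previous allocation already used (assumed meaning)."""
    return (previous_tib - remaining_tib) / previous_tib


print(next_request_tib(100))                       # 100.0
print(f"{remaining_to_total_pib(2, 50):.2f} PiB")  # 1.95 PiB
print(f"{used_fraction(50, 12.24):.2%}")           # 75.52%
```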
filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
|---|---|---|---|---|---|
| f01956198 (new) | Seoul, Seoul, KR (EHOSTICT) | 15.20 TiB | 44.47% | 15.18 TiB | 0.10% |
| f01873489 (new) | Seoul, Seoul, KR (EHOSTICT) | 14.30 TiB | 41.86% | 14.24 TiB | 0.44% |
| f0521569 | Seoul, Seoul, KR (Korea Telecom) | 4.67 TiB | 13.67% | 4.63 TiB | 0.84% |
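The Percentage column is each provider's share of all sealed deal data. The sketch below recomputes it from the rounded TiB values in the storage provider table, so results can differ from the report in the second decimal place. It also counts distinct locations, which is the concern raised later in this thread. The data is transcribed from the table; the function name is hypothetical.

```python
# Sealed deal data per provider (TiB) and location, transcribed from the table.
sealed_tib = {"f01956198": 15.20, "f01873489": 14.30, "f0521569": 4.67}
location = {
    "f01956198": "Seoul, KR",
    "f01873489": "Seoul, KR",
    "f0521569": "Seoul, KR",
}


def shares(sealed: dict[str, float]) -> dict[str, float]:
    """Each provider's percentage of the total sealed data."""
    total = sum(sealed.values())
    return {sp: 100 * tib / total for sp, tib in sealed.items()}


for sp, pct in shares(sealed_tib).items():
    print(f"{sp}: {pct:.2f}%")
print("distinct locations:", len(set(location.values())))  # distinct locations: 1
```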

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

⚠️ 100.00% of deals are for data replicated across less than 3 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
|---|---|---|---|
| 17.62 TiB | 17.65 TiB | 1 | 51.65% |
| 8.22 TiB | 16.52 TiB | 2 | 48.35% |
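The warning line follows from the rows of the replication table: every row has fewer than 3 providers, so all deal data counts toward the warning. A sketch that recomputes the Deal Percentage column and the flagged share; the threshold of 3 comes from the warning text, and the function names are hypothetical.

```python
# (unique data TiB, total deals TiB, number of providers), from the table.
rows = [(17.62, 17.65, 1), (8.22, 16.52, 2)]


def deal_percentages(rows):
    """Share of total deal data in each replication bucket."""
    total = sum(deals for _, deals, _ in rows)
    return [100 * deals / total for _, deals, _ in rows]


def pct_below(rows, min_providers: int = 3) -> float:
    """Percentage of deal data replicated across fewer than `min_providers` SPs."""
    total = sum(deals for _, deals, _ in rows)
    below = sum(deals for _, deals, n in rows if n < min_providers)
    return 100 * below / total


print([f"{p:.2f}%" for p in deal_percentages(rows)])  # ['51.65%', '48.35%']
print(f"{pct_below(rows):.2f}%")                       # 100.00%
```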

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data and should not resolve to the same CID.

However, this is possible if all the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

cryptowhizzard commented 1 year ago

Hello,

I see that you have only stored in one location. What is the reason that the rules of Fil+ are not followed?

aiocp commented 1 year ago

Hi, @cryptowhizzard The main reason we stored only in one location is the delivery time of the data.

We have a lot of data to store in this first round and need to onboard it as fast as we can, so we decided to choose SPs located near us. I also preferred to meet SPs in person so that I could trust how they run their miner nodes; I met them through the SP community.

Only one location (Seoul) is shown, but we did not allocate everything to one company; the SPs are in different areas of Seoul. To follow the Fil+ rules, I have also allocated data to two SPs in China. I think this has not been reflected in the CID checker yet.

herrehesse commented 1 year ago

@aiocp - You cannot "decide to choose the SP that is located nearby" when the FIL+ rules clearly state to distribute data amongst multiple regions and SPs. When do you expect to start following the FIL+ rules?

aiocp commented 1 year ago

@herrehesse When the CID checker report came out, we had only 3 SPs, all in Korea, but we have since added 2 SPs in China to follow the rules. Please check the details at this URL: https://filplus.d.interplanetary.one/clients/f01936354/breakdown So I think I have followed the rules so far.

Delivering data abroad takes far too long, so I preferred Seoul, but apparently storing everything in one city should be avoided (I got some advice on the Slack channel).

My customers are mostly in Asia, so I have to consider download times from the client's perspective. But to follow the allocation rules, I will look for more SPs in Asia.

Would it be okay?

herrehesse commented 1 year ago

I am not supportive of limiting the regional distribution to China (Asia) only. I will support you if you are able to spread the data to the EU/USA too.

Storage prices are negative at this point in time; I am sure you can find ways to distribute the data.

All good applications do this. And it’s (in my opinion) needed to grant datacap.

GaryGJG commented 1 year ago

If you can share details about more SPs and your plan for cooperating with them, I could consider signing the next batch of DC for you. Thanks.

aiocp commented 1 year ago

@herrehesse @GaryGJG SG (f01777785), JP (f01153105), AUS (f01777777), US (f0717969), KR (f01956198, f01873489). I found them on Filgram and plan to use these 6 SPs for the next batch.

cryptowhizzard commented 1 year ago

Hi,

I am sorry, but f01777785, f01153105, and f0717969 are involved in abuse. If you want me to sign, I won't let you store data with them.

f01777777, f01956198 and f01873489 are ok.

Can you find some SPs with a good reputation in the US? GreaterHeat or PikNik might be options for you.

aiocp commented 1 year ago

@cryptowhizzard Oh, I didn't know they were involved in abuse. How can I find out? By searching their miner addresses in Slack?

Anyway, I found two more SPs from GreaterHeat in the USA, f01971600 and f01992630, and have discussed this with them on Slack.

So, if possible, these will be the 5 SPs for the second batch: f01777777, f01956198, f01873489, f01971600, f01992630
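Before the next batch, the final SP list can be checked against the two concerns raised in this exchange: no SPs that a notary declined, and coverage of more than one region. A sketch using the IDs and countries stated in the thread (AUS written as AU); the `review` helper is hypothetical and is not an official Fil+ tool.

```python
# Final SP list for the second batch, with countries as stated in the thread.
proposed = {
    "f01777777": "AU",
    "f01956198": "KR",
    "f01873489": "KR",
    "f01971600": "US",
    "f01992630": "US",
}
# SPs the notary declined to sign for in this thread (not an official list).
declined = {"f01777785", "f01153105", "f0717969"}


def review(plan: dict[str, str], declined: set[str], copies: int = 5):
    """Return declined SPs in the plan, the regions covered, and whether
    there is at least one SP per planned copy."""
    blocked = sorted(set(plan) & declined)
    regions = sorted(set(plan.values()))
    return blocked, regions, len(plan) >= copies


print(review(proposed, declined))  # ([], ['AU', 'KR', 'US'], True)
```

Running the same check on the earlier six-SP list would report f01153105, f01777785, and f0717969 as blocked.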

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebp5bcf74ql7dpqog4qddm5gp45yfihajulbm3g7776qjkosaibwe

Address

f1dj23xokyovdqnbgx3nis3ygk73szanzlwb2kduy

Datacap Allocated

100.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

c159d966-9f96-49b1-a104-6951c9a62a40

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebp5bcf74ql7dpqog4qddm5gp45yfihajulbm3g7776qjkosaibwe

kernelogic commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebx54prcfves557p2clyuf7ely5ph2lpj33q354hvbutzv2pgio3c

Address

f1dj23xokyovdqnbgx3nis3ygk73szanzlwb2kduy

Datacap Allocated

100.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

c159d966-9f96-49b1-a104-6951c9a62a40

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebx54prcfves557p2clyuf7ely5ph2lpj33q354hvbutzv2pgio3c

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

aiocp commented 1 year ago

Keep this LDN open. It's in progress.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

aiocp commented 1 year ago

Keep this open. thanks

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

aiocp commented 1 year ago

Please keep this open. Thanks

zcfil commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 80.71% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

zcfil commented 1 year ago

Hi @aiocp, there are too few storage providers and the retrieval success rate is low. Have you found any new SP partners? Please list their IDs and locations.

cryptowhizzard commented 1 year ago

@zcfil

Retrieval is not optimal, but from scanning through the partial files I received, this client does indeed appear to be storing cloud computing materials and images.

Based on my dashboard, the distribution rates also look decent to me. Have I missed something?

aiocp commented 1 year ago

@zcfil @cryptowhizzard Here is an update on this LDN: it's in progress, but I am drafting a written agreement and only looking for SPs that can meet its terms. That's why it is taking a while.

zcfil commented 1 year ago

Okay. We hope you will act according to the standards.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

aiocp commented 1 year ago

Please keep this open

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

aiocp commented 1 year ago

It's in progress now. Please do not add the stale label.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

aiocp commented 1 year ago

Keep it open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

aiocp commented 1 year ago

OPEN PLEASE !!

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

aiocp commented 1 year ago

Keep this LDN open.

Sunnyiscoming commented 1 year ago

Hello @aiocp, per https://github.com/filecoin-project/notary-governance/issues/922, open, public dataset applicants need to complete the Fil+ registration form to identify themselves as the applicant. Please also add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by Fil+ Governance team to confirm validity and then the application will be allowed to move forward for additional notary review.

aiocp commented 1 year ago

Hi @Sunnyiscoming, just submitted! As for the current status, most of the data is stored in Seoul, Korea. However, I plan to store the rest of my DC in different regions so that the final distribution complies with the rules.

herrehesse commented 1 year ago

checker:manualTrigger