filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] <CoverMe Communications> - <GoDap> #1248

Closed jenniferAzhou closed 10 months ago

jenniferAzhou commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

CoverMe, founded in the United States in 2013, is an online service provider that offers mobile-Internet-based IM, file distribution, CDN, and other services to users around the world, and has served tens of millions of users.

Godap is one of several CoverMe products. It provides our users with unlimited file sharing, mainly for enterprise and individual collaborative work, as well as for social networking among acquaintances.

What is the primary source of funding for this project?

Company.

What other projects/ecosystem stakeholders is this project associated with?

None.

Use-case details

Describe the data being stored onto Filecoin

In Godap's business scenario, customers create a large amount of content: PDF/Word documents, XMind/Visio files, videos, recordings, whiteboard pictures, work and life photos, etc.
Godap should help customers store this original content. We expect huge storage demand in the future, so we have been looking for safe, reliable, and affordable storage service providers.
IPFS and Filecoin are our best choice now.

Where was the data in this dataset sourced from?

All data comes from the company.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

The files stored in Filecoin are a collection of files previously uploaded to the Godap platform by users. They include videos, photos, screenshots, and recordings, which may be packaged as encrypted compressed files. At present, our main purpose for storing data in Filecoin is disaster recovery. In the future, if Filecoin supports small-file retrieval, we will upload users' original files without packaging.

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

No.

What is the expected retrieval frequency for this data?

Once a year, for disaster recovery purposes.

For how long do you plan to keep this dataset stored on Filecoin?

Permanently.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

North America is preferred, but other regions worldwide are also acceptable.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Both online and offline.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We require SPs to provide proof of reliable and continuous storage. We require SPs to have actual deal storage. We plan to store the data with multiple SPs worldwide. We have contacted the following SPs:

TopBlocks (USA)
Dcent @ Hidde Hoogland (Europe)
Cabrina (China)
Feiyan (Canada).

How will you be distributing deals across storage providers?

We will distribute deals to each SP, and we will develop a DataCap allocation scheme that meets both the E-Fil project's requirements for allocation strategies and our own requirements for data security. In addition, we learned that the E-Fil project has a Lead SP role. We hope to find a North American SP, such as TopBlocks, to act as the Lead SP and take over our DataCap.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes, the funds for storage are ready.

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

kevzak commented 1 year ago

@jenniferAzhou since this is a private dataset, please complete the exceptions proposal also to provide more details about the data storage plan. Thanks

jenniferAzhou commented 1 year ago

@kevzak I have finished the exceptions proposal, please check: https://github.com/filecoin-project/notary-governance/issues/781

kevzak commented 1 year ago

Hello - I can confirm for the Fil+ team that @jenniferAzhou has completed the Client Registration and Business Verification Check for CoverMe Communications, Inc. successfully.

I will ask Trust and Transparency team @raghavrmadya @Sunnyiscoming to take a look at this application. Can you look into their godap product? www.godap.com has no links, no app I can find. I'm curious about the customers and where the data is.

I asked a question on the exceptions proposal: https://github.com/filecoin-project/notary-governance/issues/781#issuecomment-1318666908

kevzak commented 1 year ago

Hello - I can confirm for the Fil+ team that @jenniferAzhou has completed the Client Registration and Business Verification Check for CoverMe Communications, Inc. successfully. Additionally, they have updated their exception proposal with missing SP location and ID information and have reduced the amount of DataCap initially being requested to match the actual amount of Data they will onboard as part of a proof of concept. I will trigger the application for notary review.

raghavrmadya commented 1 year ago

Thank you @kevzak and @jenniferAzhou for the information and transparency thus far. I've triggered the application. Notaries are still expected to conduct further diligence as they deem useful before signing.

raghavrmadya commented 1 year ago

Datacap Request Trigger

Total DataCap requested

2PiB

Expected weekly DataCap usage rate

100TiB

Client address

f1m7kvgdyq5ej7uqs63yx7es66vp2gjb2iqi2kdly

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01940930

Client address

f1m7kvgdyq5ej7uqs63yx7es66vp2gjb2iqi2kdly

DataCap allocation requested

50TiB

Id

068c8f63-ba6d-4c5c-b7b6-6e0f6bb7dfbb

large-datacap-requests[bot] commented 1 year ago

Hello @1475Notary - @swatchliu , please sign the datacap request

Destore2023 commented 1 year ago

Glad to see that Efil+ draws more enterprise private data to expand the capacity of the network. @jenniferAzhou, we would like to know more application details such as data samples.

1475Notary commented 1 year ago

Please share the specific data composition, encapsulation and allocation plan.

kevzak commented 1 year ago

@1475Notary please see exception proposal for more details on allocation plan. https://github.com/filecoin-project/notary-governance/issues/781#issuecomment-1326240369

kevzak commented 1 year ago

@jenniferAzhou let me know if you need help coordinating to show data samples to @swatchliu and @1475Notary

jenniferAzhou commented 1 year ago

Sorry for the late reply. I have uploaded a data sample; please review it. The sample is user behavior record data from our business, and it certainly does not contain any sensitive information. The data uploaded to Filecoin will be a collection of such data.

download sample data

1475Notary commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceb45eibxt5xqceut3ef7vflbqdo4c2xzfevybt6snxg7yp36b3ygi

Address

f1m7kvgdyq5ej7uqs63yx7es66vp2gjb2iqi2kdly

Datacap Allocated

50.00TiB

Signer Address

f1ofq4mngy7ggcp755pfquq2gphjjnlydolf6awtq

Id

068c8f63-ba6d-4c5c-b7b6-6e0f6bb7dfbb

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb45eibxt5xqceut3ef7vflbqdo4c2xzfevybt6snxg7yp36b3ygi

kevzak commented 1 year ago

> Sorry for the late reply. I have uploaded a data sample; please review it. The sample is user behavior record data from our business, and it certainly does not contain any sensitive information. The data uploaded to Filecoin will be a collection of such data.
>
> download sample data

@swatchliu let us know what you think when you take a look

kevzak commented 1 year ago

Datacap Request Trigger

Total DataCap requested

2PiB

Expected weekly DataCap usage rate

100TiB

Client address

f1m7kvgdyq5ej7uqs63yx7es66vp2gjb2iqi2kdly

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01940930

Client address

f1m7kvgdyq5ej7uqs63yx7es66vp2gjb2iqi2kdly

DataCap allocation requested

50TiB

Id

17f06cb1-b180-4272-b51e-736cae196cf6

large-datacap-requests[bot] commented 1 year ago

Hello @newwebgroup - @Fatman13 , please sign the datacap request

Fatman13 commented 1 year ago

Hello, LDN applicant, I am newly assigned to this application here and could you please send relevant KYB document to me either on Slack (to @Fatman13) or through Email (yu.leng@guazi.io). Thank you!

newwebgroup commented 1 year ago

Hello, LDN applicant, I am newly assigned to this application here and could you please send relevant KYB document to me either on Slack to "Yuan" or through Email (yuan@newwebgroup.com). Thank you!

kevzak commented 1 year ago

@Fatman13 @newwebgroup CoverMe has completed a KYB check via the Diro.io product. I have already confirmed this, see comment above https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1248#issuecomment-1326291704

The goal for the E-Fil+ pilot is for notaries not to have to review KYB, but to look at the application, data sample, and data storage plan. Let me know if that makes sense.

newwebgroup commented 1 year ago

Hey @jenniferAzhou

1. About Data Samples

I am not a developer, but I have checked your data sample description and the sample data, and I do not see much relevance between them. (If my judgment is wrong, please correct me.) Please explain the relationship between the data sample description and the sample data. Many thanks.

image

2. About SPs

I contacted Feiyan and Cabrina; both are aware of this application and have applied to E-Fil+ as SPs.

jenniferAzhou commented 1 year ago

Hi @newwebgroup ,

No problem, let me explain. As I explained in the exception proposal, we have multiple types of businesses in progress, and each will generate a lot of data. We want to find a better storage service provider. Godap will be our largest data business in the future, so I used the Godap project to describe the data details at first. However, Godap has not been launched yet, so there is no official data. I therefore used data from another online business as the sample. This is actual private data that records the behavior of some of our users, and it certainly does not contain any policy-sensitive information. I think it better demonstrates the authenticity of our data to notaries. I will first upload the online business data to Filecoin, and after Godap goes online, data from both businesses will be uploaded. I have explained the size of the dataset to be uploaded in detail in the exception proposal.

newwebgroup commented 1 year ago

OK, I understand. @jenniferAzhou How much data do you currently load into Filecoin?

jenniferAzhou commented 1 year ago

@newwebgroup We haven't uploaded data to Filecoin yet. Our existing 200T of data is in a local database, and several T of new data arrive every day. Due to storage costs, we have not kept the daily historical data, which is one of the reasons we want to upload to Filecoin to save money. We have started preparing the data security and network bandwidth deployment needed to upload data to Filecoin.

Fatman13 commented 1 year ago

I have verified sample data and distribution plan through private email exchange with Jennifer zhou and they both LGTM.

kevzak commented 1 year ago

OK great. So @1475Notary @Fatman13 @newwebgroup if you approve this application can you please sign?

Fatman13 commented 1 year ago

Will do once I get hold of my Nano tonight. 🤝

kevzak commented 1 year ago

@Fatman13 any chance to approve this yet? Thanks

Fatman13 commented 1 year ago

Hello, @kevzak, somehow I got the following error when trying to sign this LDN. Could the team please take a look? I got the error prompt after clicking to approve the request.

image

fabriziogianni7 commented 1 year ago

@Fatman13 hello, I saw you got a new nano. what version of filecoin app have you got in the ledger?

Fatman13 commented 1 year ago

@fabriziogianni7 No, I was using the same nano which I used for signing https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/928.

fabriziogianni7 commented 1 year ago

can you please check the version anyway?

Fatman13 commented 1 year ago

@fabriziogianni7 Sorry left my Nano at home. Will check once I get back. Thank you for taking a look!

jenniferAzhou commented 1 year ago

@Fatman13 @kevzak is there any update? we are almost ready to upload data.

Fatman13 commented 1 year ago

@fabriziogianni7 sorry for the late reply, filecoin app version is v0.22.2.

@jenniferAzhou Sorry for the delay, I have tried to log into the notary dashboard multiple times this morning and it's either stuck at login or the dashboard just displays nothing at all.

fabriziogianni7 commented 1 year ago

Hey fatman, please update the ledger filecoin app to 0.22.5. That should be enough

Fatman13 commented 1 year ago

@fabriziogianni7 Got it! Will try! thx!

Fatman13 commented 1 year ago

Hello, @fabriziogianni7 @kevzak sorry for the late reply again. After trying internet connections from a couple of different places with a couple of different VPN providers, I am still not able to connect to the "my ledger" page to upgrade my app. I see that some notaries in China were experiencing the same issue. I will keep trying, but maybe we need a plan B for this client to start data onboarding?

image

newwebgroup commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedqdeltev6tkq234x22w2vdopjqkn6oxx2s6z7gb7r3ge5h3xqudk

Address

f1m7kvgdyq5ej7uqs63yx7es66vp2gjb2iqi2kdly

Datacap Allocated

50.00TiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

17f06cb1-b180-4272-b51e-736cae196cf6

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedqdeltev6tkq234x22w2vdopjqkn6oxx2s6z7gb7r3ge5h3xqudk

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f01940930

Client address

f1m7kvgdyq5ej7uqs63yx7es66vp2gjb2iqi2kdly

DataCap allocation requested

200TiB

Id

30ebf8e8-72da-4a7a-8083-ad9831653b8b

large-datacap-requests[bot] commented 1 year ago

Hello @IPFSUnion - @liyunzhi-666 , please sign the datacap request

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01940930

Client address

f1m7kvgdyq5ej7uqs63yx7es66vp2gjb2iqi2kdly

Last two approvers

newwebgroup & 1475Notary

Rule to calculate the allocation request amount

200% of weekly dc amount requested

DataCap allocation requested

200TiB

Total DataCap granted for client so far

50TiB

Datacap to be granted to reach the total amount requested by the client (2 PiB)

1.95PiB
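
The allocation figures above follow from simple arithmetic. A minimal sketch, assuming binary units (1 PiB = 1024 TiB) and reading the "200% of weekly dc amount requested" rule as a plain multiplication; the function names are hypothetical and are not part of any Fil+ tooling:

```python
TIB_PER_PIB = 1024  # assumption: binary units, 1 PiB = 1024 TiB


def next_allocation_tib(weekly_tib: float, multiplier: float = 2.0) -> float:
    """Allocation size under a 'multiplier x weekly usage' rule (hypothetical helper)."""
    return weekly_tib * multiplier


def remaining_pib(total_requested_pib: float, granted_tib: float) -> float:
    """DataCap still to be granted to reach the total request, in PiB."""
    return (total_requested_pib * TIB_PER_PIB - granted_tib) / TIB_PER_PIB


# Values taken from the thread: 100 TiB weekly usage, 2 PiB requested, 50 TiB granted.
print(next_allocation_tib(100))          # 200.0
print(round(remaining_pib(2, 50), 2))    # 1.95
```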

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 639 | 3 | 50TiB | 44.11 | 11.04TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes verified deal, it will be marked as new.

For most of the datacap application, below restrictions should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f0143858 | Clifton, New Jersey, US (DigitalOcean, LLC) | 9.04 TiB | 34.64% | 9.04 TiB | 0.00% |
| f01969779 | Clifton, New Jersey, US (DigitalOcean, LLC) | 8.89 TiB | 34.07% | 8.89 TiB | 0.00% |
| f02301 | San Jose, California, US (Krypt Technologies) | 8.16 TiB | 31.29% | 8.16 TiB | 0.00% |
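
The percentage column can be cross-checked against the sealed totals. A minimal sketch, assuming each percentage is the provider's share of all sealed deal data for this client; this is an illustration and not the checker bot's actual code. Because the report rounds its TiB figures, recomputed shares may differ from the reported ones in the last digit:

```python
# Total deals sealed per provider in TiB, copied from the report table.
sealed_tib = {"f0143858": 9.04, "f01969779": 8.89, "f02301": 8.16}


def provider_shares(sealed: dict[str, float]) -> dict[str, float]:
    """Return each provider's share of the total sealed data, in percent."""
    total = sum(sealed.values())
    return {provider: 100 * size / total for provider, size in sealed.items()}


shares = provider_shares(sealed_tib)
for provider, pct in shares.items():
    # Compare against the report's Percentage column (34.64%, 34.07%, 31.29%).
    print(f"{provider}: {pct:.2f}%")
```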

Provider Distribution

Deal Data Replication

The table below shows how many unique data are replicated across storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 3.41 TiB | 3.41 TiB | 1 | 13.09% |
| 3.60 TiB | 7.19 TiB | 2 | 27.57% |
| 5.16 TiB | 15.48 TiB | 3 | 59.34% |
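
The replication table can be read the same way. A rough sketch, assuming "Total Deals Made" is approximately the unique data size multiplied by the number of providers holding it, and "Deal Percentage" is each row's share of all deal data; these relationships are inferred from the numbers in the table and are not documented behavior of the checker:

```python
# Rows copied from the report: (unique data TiB, total deals made TiB, number of providers).
rows = [(3.41, 3.41, 1), (3.60, 7.19, 2), (5.16, 15.48, 3)]

total_deals = sum(total for _, total, _ in rows)
total_unique = sum(unique for unique, _, _ in rows)

for unique, total, providers in rows:
    expected_total = unique * providers      # inferred: unique size x replica count
    share = 100 * total / total_deals        # inferred: share of all deal data
    print(f"{providers} provider(s): ~{expected_total:.2f} TiB expected, "
          f"{total:.2f} TiB reported, {share:.2f}% of deals")

# Average number of copies stored per unique TiB.
print(f"average replication factor: {total_deals / total_unique:.2f}")
```

Small mismatches with the reported values are expected, since the report rounds sizes to two decimals.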

Replication Distribution

Deal Data Shared with other Clients

The table below shows how many unique data are shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this could be possible if all below clients use same software to prepare for the exact same dataset or they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

cryptowhizzard commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes verified deal, it will be marked as new.

For most of the datacap application, below restrictions should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01969779 | Clifton, New Jersey, US (DigitalOcean, LLC) | 16.81 TiB | 36.99% | 16.81 TiB | 0.00% |
| f0143858 | Clifton, New Jersey, US (DigitalOcean, LLC) | 14.94 TiB | 32.87% | 14.94 TiB | 0.00% |
| f02301 | San Jose, California, US (Krypt Technologies) | 13.70 TiB | 30.14% | 13.70 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how many unique data are replicated across storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 6.16 TiB | 6.16 TiB | 1 | 13.54% |
| 5.04 TiB | 10.07 TiB | 2 | 22.16% |
| 9.74 TiB | 29.23 TiB | 3 | 64.29% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how many unique data are shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this could be possible if all below clients use same software to prepare for the exact same dataset or they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger