filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Changsha Yingtong Information Technology Co - opfilm #1166

Closed zcfil closed 1 year ago

zcfil commented 1 year ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

Changsha Yingtong Information Technology Co., Ltd. is a professional surgical teaching equipment manufacturer in China. With nearly 200 hospital user cases, it has become a well-known brand in the industry and the only manufacturer in the industry that develops and designs dedicated front-end host and receiving equipment for surgical teaching applications. All products have passed SGS medical safety certification and EMC electromagnetic compatibility certification.

The company leads the development of the industry with innovation, integrates advanced technologies such as intelligent imaging, media transmission, industrial control, and remote video, and has products and solutions such as digital operating rooms, surgical teaching systems, ICU medical systems, telemedicine, intelligent medicine, and big health image cloud.

What is the primary source of funding for this project?

Some clients and partners jointly contribute

What other projects/ecosystem stakeholders is this project associated with?

N.A. 

Use-case details

Describe the data being stored onto Filecoin

Surgical videos from hospitals all over the country for academic research

Where was the data in this dataset sourced from?

These data come from hospitals, which have a large amount of surgical video data to store. More such video data is generated every day. It will be provided to doctors around the world for academic research and will contribute to the medical progress of all mankind.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://drive.google.com/drive/folders/1i6eATk4q3hpzCU3LqScfOpQreuM15gT6

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, anyone can access the data

What is the expected retrieval frequency for this data?

Initially the data will be read about 5-10 times. After that it will only be read when the copies cached on relay nodes have expired.

For how long do you plan to keep this dataset stored on Filecoin?

If possible, I hope it will last forever. If not, the longer the better

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Greater China, or around the world

How will you be distributing your data to storage providers? Is there an offline data transfer process?

We have our own IDC machine room, which can provide bandwidth for network downloads, or we can mail hard disks to storage providers.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We will proactively reach out to SPs, and we also welcome SPs contacting us. Our data will be stored with multiple SPs to ensure distributed storage and data security. We will periodically retrieve data to ensure that it is retrievable.

How will you be distributing deals across storage providers?

At least 5 SPs

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

We have participated in the Filecoin project since the testnet, Space Race and Slingshot.
We have some nodes that have been running for a long time, such as f02528 (23.18PiB). We also have some high-power nodes, such as f01756683 (12PiB), f01859603 (8.6PiB), f01877184 (8.7PiB) and f0723827 (14.29PiB). Over the past two years we have earned many block rewards and resources, and we are able to start making deals immediately after obtaining DataCap.
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

100TiB

Client address

f1rabmbft72reiqvq4wwda34gi7wpbawevjjtrg7q

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1rabmbft72reiqvq4wwda34gi7wpbawevjjtrg7q

DataCap allocation requested

50TiB

Id

2cb0674a-2cbd-4a7f-8293-0b9a1c53eea8

kernelogic commented 1 year ago

Could you provide some data samples and prove it is over 500T of original data?

newwebgroup commented 1 year ago

Hey @zcfil, according to the data samples provided, the data appears to belong to "厚凯医疗" (Houkai Medical) and "长沙影通信息技术有限公司" (Changsha Yingtong Information Technology Co., Ltd.), which looks like a company providing technical services.

  1. Who does the Medical data belong to?
  2. Have you obtained the authorization to store these medical data on Filecoin?
image
zcfil commented 1 year ago

Hi @newwebgroup,

  1. These data are recorded videos of academic research conducted jointly by major hospitals. The videos are open, and the hospitals are all our customers.
  2. This is all our customers' data. With their consent, they are willing to back it up to Filecoin so that all medical practitioners can learn from it and contribute to medical progress around the world.

zcfil commented 1 year ago

@kernelogic [This is a data sample](https://drive.google.com/file/d/1Xdz9w1rsOCpe2bDV4IsmB1c3zb34J61R/view?usp=share_link) image

This is a screenshot of our server data: image

kernelogic commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceafxhdyyc4ndlsjzytaovyuf6lrswvlkyvcpm4urtbpdhkxcvbtdc

Address

f1rabmbft72reiqvq4wwda34gi7wpbawevjjtrg7q

Datacap Allocated

50.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

2cb0674a-2cbd-4a7f-8293-0b9a1c53eea8

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceafxhdyyc4ndlsjzytaovyuf6lrswvlkyvcpm4urtbpdhkxcvbtdc

zcfil commented 1 year ago

@newwebgroup Could you approve it for us? Thank you very much

newwebgroup commented 1 year ago

  1. Could you send an email to filplus-app-review@fil.org?
  2. Can you provide more detailed information about the other storage providers participating in this program? For example, you could list the SPs you have contacted so far.

zcfil commented 1 year ago

@newwebgroup We have sent a list of the SPs we are in contact with to your mailbox. If necessary, we can ask them to provide a private-key signature to verify their authenticity.

newwebgroup commented 1 year ago

Please provide a screenshot after sending the verification email to the governance team

zcfil commented 1 year ago

Hi @newwebgroup, this is a screenshot of the email I sent: image

newwebgroup commented 1 year ago

@zcfil Hey, you need to show your work email suffix to complete KYB certification.

zcfil commented 1 year ago

@newwebgroup Hi, I have taken the screenshot again: image

newwebgroup commented 1 year ago

Please use the enterprise mailbox with opfilm.cc as the suffix for verification.

newwebgroup commented 1 year ago

Any updates? @zcfil

zcfil commented 1 year ago

Hi @newwebgroup, I have resent the message.

newwebgroup commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedhk7zvfp5dff27cwsrbv5zjcid7q2h5equ4pw2tnvqyjadiwtkuy

Address

f1rabmbft72reiqvq4wwda34gi7wpbawevjjtrg7q

Datacap Allocated

50.00TiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

2cb0674a-2cbd-4a7f-8293-0b9a1c53eea8

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedhk7zvfp5dff27cwsrbv5zjcid7q2h5equ4pw2tnvqyjadiwtkuy

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider takes verified deal, it will be marked as new.

For most of the datacap application, below restrictions should apply.

⚠️ f01660795 has sealed 100.00% of total datacap.

⚠️ All storage providers are located in the same region.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01660795 | Shenzhen, Guangdong, CN | 32.00 GiB | 100.00% | 32.00 GiB | 0.00% |

Provider Distribution

Deal Data Replication

The below table shows how each many unique data are replicated across storage providers.

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 32.00 GiB | 32.00 GiB | 1 | 100.00% |

Replication Distribution

Deal Data Shared with other Clients

The below table shows how many unique data are shared with other clients. Usually different applications owns different data and should not resolve to the same CID.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

newwebgroup commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes verified deal, it will be marked as new.

For most of the datacap application, below restrictions should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01836766 | Shenzhen, Guangdong, CN<br/>China Telecom (Group) | 21.03 TiB | 57.52% | 21.03 TiB | 0.00% |
| f01923787 | Shenzhen, Guangdong, CN<br/>China Telecom (Group) | 15.50 TiB | 42.39% | 15.50 TiB | 0.00% |
| f01660795 | Shenzhen, Guangdong, CN<br/>CHINANET-BACKBONE | 32.00 GiB | 0.09% | 32.00 GiB | 0.00% |

Provider Distribution

Deal Data Replication

The below table shows how each many unique data are replicated across storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

⚠️ 99.74% of deals are for data replicated across less than 3 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 5.59 TiB | 5.59 TiB | 1 | 15.30% |
| 15.44 TiB | 30.88 TiB | 2 | 84.44% |
| 32.00 GiB | 96.00 GiB | 3 | 0.26% |
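
The 99.74% warning can be reproduced from the "Total Deals Made" column of the table above. The following is a minimal sketch, not the checker's actual implementation (which is not shown in this thread). It assumes binary units (1 TiB = 1024 GiB) and uses the rounded sizes from the table, so the last digit may differ from the bot's figures. The function name `share_below` is illustrative.

```python
GIB_PER_TIB = 1024  # assumption: binary units

# (total deals made in TiB, number of providers), one tuple per table row
rows = [
    (5.59, 1),
    (30.88, 2),
    (96.00 / GIB_PER_TIB, 3),
]


def share_below(rows, min_providers):
    """Fraction of deal volume replicated across fewer than min_providers SPs."""
    total = sum(size for size, _ in rows)
    below = sum(size for size, providers in rows if providers < min_providers)
    return below / total


# Share of deals replicated across fewer than 3 storage providers
print(f"{share_below(rows, 3):.2%}")
```

With the rounded inputs above, the rows with 1 and 2 providers account for roughly 99.74% of the deal volume, which matches the sum of their "Deal Percentage" values (15.30% + 84.44%).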

Replication Distribution

Deal Data Shared with other Clients

The below table shows how many unique data are shared with other clients. Usually different applications owns different data and should not resolve to the same CID.

However, this could be possible if all below clients use same software to prepare for the exact same dataset or they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

zcfil commented 1 year ago

@newwebgroup Dear notary, how do I continue to obtain DataCap?

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1rabmbft72reiqvq4wwda34gi7wpbawevjjtrg7q

DataCap allocation requested

100TiB

Id

755932c9-9980-4dd1-9cdf-9fd2099d0c9f

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1rabmbft72reiqvq4wwda34gi7wpbawevjjtrg7q

Last two approvers

newwebgroup & kernelogic

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

100TiB

Total DataCap granted for client so far

50TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.95PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 1215 | 3 | 50TiB | 59.01 | 11.28TiB |
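
The 4.95PiB figure above follows from simple unit arithmetic on the numbers in this comment. A minimal sketch, assuming binary units (1 PiB = 1024 TiB); the variable names are illustrative and are not taken from the bot:

```python
TIB_PER_PIB = 1024  # assumption: binary units

total_requested_tib = 5 * TIB_PER_PIB  # 5PiB requested in total
granted_so_far_tib = 50                # first allocation of 50TiB
remaining_dc_tib = 11.28               # "Remaining DC" from the stats table

# DataCap still to be granted to reach the total requested amount
to_be_granted_pib = (total_requested_tib - granted_so_far_tib) / TIB_PER_PIB

# Share of the first allocation that has already been spent on deals
used_share = (granted_so_far_tib - remaining_dc_tib) / granted_so_far_tib

print(f"{to_be_granted_pib:.2f} PiB still to be granted")
print(f"{used_share:.1%} of the first allocation used")
```

This gives 5070 TiB, or about 4.95 PiB, still to be granted, with roughly 77% of the first 50TiB already used.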
filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes verified deal, it will be marked as new.

For most of the datacap application, below restrictions should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01836766 | Shenzhen, Guangdong, CN<br/>China Telecom (Group) | 21.47 TiB | 58.02% | 21.47 TiB | 0.00% |
| f01923787 | Shenzhen, Guangdong, CN<br/>China Telecom (Group) | 15.50 TiB | 41.89% | 15.50 TiB | 0.00% |
| f01660795 | Shenzhen, Guangdong, CN<br/>CHINANET-BACKBONE | 32.00 GiB | 0.08% | 32.00 GiB | 0.00% |

Provider Distribution

Deal Data Replication

The below table shows how each many unique data are replicated across storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

⚠️ 99.75% of deals are for data replicated across less than 3 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 6.03 TiB | 6.03 TiB | 1 | 16.30% |
| 15.44 TiB | 30.88 TiB | 2 | 83.45% |
| 32.00 GiB | 96.00 GiB | 3 | 0.25% |

Replication Distribution

Deal Data Shared with other Clients

The below table shows how many unique data are shared with other clients. Usually different applications owns different data and should not resolve to the same CID.

However, this could be possible if all below clients use same software to prepare for the exact same dataset or they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

herrehesse commented 1 year ago

Dear Applicant,

Due to the recent increase in erroneous Filecoin+ data, on behalf of the entire community, we feel compelled to look deeper into DataCap requests. We do this to ensure that the overall value of the Filecoin network and the Filecoin+ program increases and that the program is not abused.

Please answer the questions below as comprehensively as possible.

Customer data

We expect that for the onboarding of customers with the scale of an LDN there would have been at least multiple email and perhaps several chat conversations preceding it. A single email with an agreement does not qualify here.

Should this be solely for acquiring DataCap, it is of course out of the question. The customer must have a legitimate reason for wanting to use the Filecoin+ program, which is intended as a program to store useful and public datasets on the network.

(As an intermediate solution Filecoin offers the FIL-E program or the glif.io website for business datasets that do not meet the requirements for a Filecoin+ dataset)

Files and Processing

Hopefully you understand the caution the overall community has for onboarding the wrong data. We understand the increased need for Filecoin+, however, we must not allow the program to be misused. Everything depends on a valuable and useful network, let's do our best to make this happen. Together.

herrehesse commented 1 year ago

@zcfil can you explain why your website says "HELLO" in text only?

zcfil commented 1 year ago

@herrehesse Ouch, we replaced the official server deployment website two days ago, and did not notice this problem. We have restored it to normal. Thank you for your reminder.

herrehesse commented 1 year ago

@zcfil thanks for your quick answer! Can you also assist me with the above questionnaire?

newwebgroup commented 1 year ago

@zcfil You need to find two new notaries to sign for you to get the quota.

zcfil commented 1 year ago

1_copy 2_copy 3_copy

  1. We also have our own miners on the Filecoin network, so we are very clear about how Filecoin operates.
  2. As mentioned in the chat conversation, there is more than 1 PB of data.
  3. Our data has not been backed up. Considering the risk of data loss, and because the Filecoin network is distributed storage, the security of data stored there is extremely high.
  4. The customers have their own miners on the Filecoin network. They know that the Filecoin network is distributed storage and that the security of data stored there is extremely high.
  5. We have found many storage providers in China. Because China's logistics industry is well developed and fast, we plan to use offline transfer, shipping the data by logistics or express delivery to the designated storage provider.
  6. This is the LAN bandwidth of the storage provider. We verified it before storing data. The network environment is at least 10G. (WeCom screenshot)
  7. The data set preparer will first filter out duplicate video file names, and then use the md5sum command to verify the files, to prevent duplication and abuse of the DataCap limit.
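
The deduplication described in the last point could look roughly like the sketch below. This is one possible reading of the applicant's description, not their actual tooling: it uses Python's `hashlib.md5` in place of the `md5sum` command, and the function names `md5_of` and `unique_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_files(root):
    """Return one path per distinct file under root.

    Step 1 skips files whose name was already seen.
    Step 2 skips files whose MD5 digest was already seen.
    """
    seen_names = set()
    seen_hashes = set()
    unique = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.name in seen_names:
            continue  # step 1: duplicate file name
        seen_names.add(path.name)
        file_hash = md5_of(path)
        if file_hash in seen_hashes:
            continue  # step 2: same content under a different name
        seen_hashes.add(file_hash)
        unique.append(path)
    return unique
```

Note that filtering on file names alone would also drop different videos that happen to share a name, so a real pipeline might rely on the checksum step only.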

@herrehesse This is my answer in the order of your questions. Please check

zcfil commented 1 year ago

Hi @cryptowhizzard, some of our partners are engaged in Filecoin storage-related businesses. We listened to their suggestions before storing data on Filecoin.

cryptowhizzard commented 1 year ago

Hello @zcfil

Thanks. After translating the images above, some things are clearer to me now. You are the applicant and the person in the hospital is your client. As I understand it, he works there and you offered to store his data on Filecoin.

From what I see above, the first tranche has already been allocated. The problem is that it was not allocated according to the FIL+ rules. The SPs used are involved in LDN applications that are not compliant with the FIL+ rules. You can check here

Since this is the 3rd allocation, the following restrictions have been relaxed:

  1. Storage provider should not exceed 70% of total datacap.
  2. Storage provider should not be storing duplicate data for more than 20%.
  3. Storage provider should have published its public IP address.
  4. All storage providers should be located in different regions.
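
The restrictions quoted above can be sketched as a small check over the provider table from the latest checker report in this thread. This is only an illustration under stated assumptions: the `Provider` class and `check_distribution` function are hypothetical, the thresholds are taken from the quoted text, and the public IP rule is not checked because the report does not include that data.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    miner_id: str
    region: str
    share: float            # fraction of the client's total sealed deals
    duplicate_share: float  # fraction of this provider's deals that are duplicates


def check_distribution(providers, max_share=0.70, max_duplicate=0.20):
    """Return a list of warnings for the quoted FIL+ distribution rules."""
    warnings = []
    for p in providers:
        if p.share > max_share:
            warnings.append(f"{p.miner_id} holds {p.share:.2%} of total DataCap")
        if p.duplicate_share > max_duplicate:
            warnings.append(f"{p.miner_id} stores {p.duplicate_share:.2%} duplicate data")
    regions = [p.region for p in providers]
    if len(set(regions)) < len(regions):
        warnings.append("Not all storage providers are located in different regions")
    return warnings


# Values copied from the checker report earlier in this thread
providers = [
    Provider("f01836766", "Shenzhen, Guangdong, CN", 0.5802, 0.0),
    Provider("f01923787", "Shenzhen, Guangdong, CN", 0.4189, 0.0),
    Provider("f01660795", "Shenzhen, Guangdong, CN", 0.0008, 0.0),
]

for warning in check_distribution(providers):
    print(warning)
```

With these inputs only the region rule is violated, since no provider exceeds 70% of the total and no duplicate deals were reported.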

Can you let us know if there is still interest from your side to get this application on the right track? It means storing with SP's with a good reputation and, since there must be diversity in region, also outside China.

zcfil commented 1 year ago

Hello @cryptowhizzard

Thank you for your reply. I'm very interested in getting this application on track, and I have contacted high-quality storage providers outside China, such as in Japan, Hong Kong and Singapore. I believe that we can distribute the data reasonably according to the FIL+ rules in the future.

cryptowhizzard commented 1 year ago

Thanks for that.

If you can give me a list of the SPs with a good reputation that you will use for the next allocation, then I will sign ASAP.

zcfil commented 1 year ago

Due to the limited quota, I may temporarily allocate it to the following nodes: f01938674, f01923786, f01836766, f0872282. Is this OK?

cryptowhizzard commented 1 year ago

Hello @zcfil

All the SPs you mention belong to one organization as far as I can see. That is not good. Second, I am missing their contact information. Can you provide that?

I would like to see at least 2 independent organizations and at least some diversity. All of the SPs you gave me are in HK/China.

f0872282 -> {12D3KooWEBLH2HU5zXHSnbTJoWPLVnzMH47gexN9XcNypUdUEhYn: [/ip4/103.201.24.105/tcp/50001]}
ERROR: failed to parse multiaddr "f0872282": must begin with /

You have new mail in /var/mail/root
root@proposals:~/api/fullauto# lotus net connect f01836766
f01836766 -> {12D3KooWFqpihaLDW5VMYCpxrg6HX7Mm9PtxFUvEJcJXzydSkiRp: [/ip4/103.44.239.188/tcp/50001]}
ERROR: failed to parse multiaddr "f01836766": must begin with /

root@proposals:~/api/fullauto# lotus net connect f01923786
f01923786 -> {12D3KooWJ9YWSrjpvSmMy7KL9NJwXgreiBBVi1KkVTBEMWmLeTCe: [/ip4/154.23.114.17/tcp/50001]}
ERROR: failed to parse multiaddr "f01923786": must begin with /

root@proposals:~/api/fullauto# lotus net connect f01938674
f01938674 -> {12D3KooWSqR5FMNJg7zJUBf56qeY6azkUqDpvsxYWp6F81EcziEm: [/ip4/183.61.189.117/tcp/50001]}
ERROR: failed to parse multiaddr "f01938674": must begin with /
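
To compare the providers side by side, the `lotus net connect` output above can be reduced to a miner-to-address mapping. The following is a rough sketch that only parses the text shown in this comment. The regular expression and the `parse_peers` helper are illustrative assumptions, and mapping an IP address to a region or an owner would need an external lookup that is not shown here.

```python
import re

# Output lines copied from the comment above
LOG = """\
f0872282 -> {12D3KooWEBLH2HU5zXHSnbTJoWPLVnzMH47gexN9XcNypUdUEhYn: [/ip4/103.201.24.105/tcp/50001]}
ERROR: failed to parse multiaddr "f0872282": must begin with /
f01836766 -> {12D3KooWFqpihaLDW5VMYCpxrg6HX7Mm9PtxFUvEJcJXzydSkiRp: [/ip4/103.44.239.188/tcp/50001]}
f01923786 -> {12D3KooWJ9YWSrjpvSmMy7KL9NJwXgreiBBVi1KkVTBEMWmLeTCe: [/ip4/154.23.114.17/tcp/50001]}
f01938674 -> {12D3KooWSqR5FMNJg7zJUBf56qeY6azkUqDpvsxYWp6F81EcziEm: [/ip4/183.61.189.117/tcp/50001]}
"""

# miner ID -> {peer ID: [/ip4/<ip>/tcp/<port>]}
PATTERN = re.compile(r"(f0\d+) -> \{(\w+): \[/ip4/([\d.]+)/tcp/(\d+)\]\}")


def parse_peers(text):
    """Map each miner ID to its peer ID, IPv4 address and TCP port."""
    return {
        m.group(1): {"peer_id": m.group(2), "ip": m.group(3), "port": int(m.group(4))}
        for m in PATTERN.finditer(text)
    }


peers = parse_peers(LOG)
for miner, info in peers.items():
    print(miner, info["ip"], info["port"])
```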

zcfil commented 1 year ago

Sorry, we are contacting organizations in Singapore, and I believe we can reach an agreement soon. I can give you their contact information, but most of us communicate via WeChat. Do you need their WeChat contacts?

cryptowhizzard commented 1 year ago

I would prefer their e-mail plus WeChat as contact details. As long as I can reach them to verify, I am OK for now.

Btw, you are required to stay in contact with them for the whole deal duration (for as long as your data is stored and needs to be retrievable). I would recommend not taking "just some" SPs. There are plenty of applications here on GitHub that got into trouble precisely because their SPs went "bad".

zcfil commented 1 year ago

Hello @cryptowhizzard This is our latest SP and their contact information

f01938674 & f01836766, in mainland China, email: 469230190@qq.com

f01923786 & f0872282, in HK, email: luozhenjun@szfil.com

f01877184, in Singapore, email: 8954056@gamil.com

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaced53jtrbzdsfiyml4k4f7yueyzqsauh5zwdazcuh7zaejlbuwsaii

Address

f1rabmbft72reiqvq4wwda34gi7wpbawevjjtrg7q

Datacap Allocated

100.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

755932c9-9980-4dd1-9cdf-9fd2099d0c9f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced53jtrbzdsfiyml4k4f7yueyzqsauh5zwdazcuh7zaejlbuwsaii

Sunnyiscoming commented 1 year ago

Hi @wsz-llh, you need one more notary to approve this application. You can ask more notaries to do client due diligence and approve the application in the Slack channel. https://app.slack.com/client/TEHTVS1L6/C036JKD8NVA/thread/C03BG1MNQ4T-1673888660.823499

DaYouGroup commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ All storage providers are located in the same region.

Deal Data Replication

⚠️ 99.81% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

zcfil commented 1 year ago

@DaYouGroup Hello, I will allocate the data reasonably next time.

DaYouGroup commented 1 year ago

The client contacted us. Based on past information and verification of the stored data, we are willing to support this round.

Normalnoise commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ All storage providers are located in the same region.

Deal Data Replication

⚠️ 99.81% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.