filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Sinso #961

Closed Sinsoteam closed 1 year ago

Sinsoteam commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

**Organization**
The Sinso team was established in October 2020. As the leading medical imaging SaaS cloud service provider, the core members have served more than 500 medical institutions and more than 80,000 medical imaging doctors.
Sinso is building the Sinso DAC ecosystem on Web3 technology to help move society into the era of decentralized medical care.
**Our project**
Our program also participated in the Filecoin Frontier Accelerator.
Sinso, as a medical image data aggregator for telemedicine and AI diagnosis, provides a basis for remote diagnosis services and addresses the problem of patient data authenticity. Through a customized NFT release template for medical data, users actively participate in data confirmation, which strengthens data collection and the conversion of data into assets, and improves the flow of medical data. Sinso also supports the issuance of NFT-like virtual assets to further help doctors move towards free practice.
Users collect data through Sinso Getway, mint health and medical data as NFTs in the Sinso DAPP, and trade on the Sinso Doctors Network to realize the value transfer of medical data. Sinso builds on Web3 technology and the Sinso DAC ecosystem to help move society into the era of decentralized medical care.
We also achieved a good ranking in the acceleration camp.
The following links are about our projects:
https://www.bilibili.com/video/BV1PV411J7ma/

What is the primary source of funding for this project?

A: At present, the project's funding comes mainly from Filecoin ecosystem investment, for example the whylab investment fund.

What other projects/ecosystem stakeholders is this project associated with?

A: IPFS/Filecoin, Polkadot, Ethereum

Use-case details

Describe the data being stored onto Filecoin

A: Mainly medical record data, XML and medical image DICOM3.0 files

Where was the data in this dataset sourced from?

A: Uploaded from patients, doctors and hospitals

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

A: Please refer to the attachment for details.
https://drive.google.com/file/d/1JTFp0BYAMygmRHHckv3cBFaS9UonRMIF/view?usp=sharing

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, confirmed. (Our data uses SDM, static data masking, and DDM, dynamic data masking.)

What is the expected retrieval frequency for this data?

A: At present, the retrieval frequency within the first month is relatively high. After three months, the retrieval probability drops by 80%.

For how long do you plan to keep this dataset stored on Filecoin?

A: Permanent storage as a patient's long-term personal health asset

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Mainly China or other countries in Asia.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

I will transmit the data to the miners both online and offline.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

The main factors we considered are the following:
1. Location (near China)
2. Experience in handling verified data
3. More than 10 PiB of total raw power

How will you be distributing deals across storage providers?

We will follow the rules for large datasets, and we will ensure fair distribution by limiting the number of deals sent to

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes

Bennyyangpu commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecu4nmirigyc7b4rtre36pzb6hugze6easogdmliqed2kenjv7xmw

Address

f1lozjhyay3heeav3wm4ttycoaumjgtgrp452woki

Datacap Allocated

400.00TiB

Signer Address

f174fg3bqbln3zjnkxtyf6s54txqkr7yqkj6cig7y

Id

3dee588e-8f86-4a8b-9905-0dde2c5eeccb

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecu4nmirigyc7b4rtre36pzb6hugze6easogdmliqed2kenjv7xmw

AthSmith commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceasc4xmkqargodt6grjp2f44h2zcg5mspatwjb4arckayiubikzjo

Address

f1lozjhyay3heeav3wm4ttycoaumjgtgrp452woki

Datacap Allocated

400.00TiB

Signer Address

f1vxbqrf7rfum3n6m5u6eb4re6xj7amvsaqnzu64y

Id

3dee588e-8f86-4a8b-9905-0dde2c5eeccb

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceasc4xmkqargodt6grjp2f44h2zcg5mspatwjb4arckayiubikzjo

AthSmith commented 1 year ago

It is necessary to pay attention to the retrieval success rate, although the data reported so far is not very accurate. I expect to see changes soon.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 7

Multisig Notary address

f02049625

Client address

f1lozjhyay3heeav3wm4ttycoaumjgtgrp452woki

DataCap allocation requested

400TiB

Id

335c49a9-d730-4ad0-b9b3-a5a8cb0e9759

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1lozjhyay3heeav3wm4ttycoaumjgtgrp452woki

Rule to calculate the allocation request amount

400% of weekly dc amount requested

DataCap allocation requested

400TiB

Total DataCap granted for client so far

3.63797880709172e+64YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

3.63797880709172e+64YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 38819 | 8 | 400TiB | 31.15 | 88.06TiB |

BobbyChoii commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacearjyn4fmks6zr2uryagquo6smxj3ue2mx3c3qtl6pdtg7smurpmo

Address

f1lozjhyay3heeav3wm4ttycoaumjgtgrp452woki

Datacap Allocated

400.00TiB

Signer Address

f1irqs2gmctiv3jcdfwuch7oxvf4ixh3k4b2wc24i

Id

335c49a9-d730-4ad0-b9b3-a5a8cb0e9759

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacearjyn4fmks6zr2uryagquo6smxj3ue2mx3c3qtl6pdtg7smurpmo

Casey-PG commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacectornbg436p4nmc42h7m6imh56u433xhey2thkh3cbodnql2fnsc

Address

f1lozjhyay3heeav3wm4ttycoaumjgtgrp452woki

Datacap Allocated

400.00TiB

Signer Address

f1d4yb3wags3mtddzesxoo63jv7dmlec3bq4yteni

Id

335c49a9-d730-4ad0-b9b3-a5a8cb0e9759

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacectornbg436p4nmc42h7m6imh56u433xhey2thkh3cbodnql2fnsc

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 8

Multisig Notary address

f02049625

Client address

f1lozjhyay3heeav3wm4ttycoaumjgtgrp452woki

DataCap allocation requested

400TiB

Id

818faae9-8fc2-4c9f-a626-a7096df46c9f

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1lozjhyay3heeav3wm4ttycoaumjgtgrp452woki

Rule to calculate the allocation request amount

400% of weekly dc amount requested

DataCap allocation requested

400TiB

Total DataCap granted for client so far

3.6379788070917165e+79YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

3.6379788070917165e+79YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 46780 | 9 | 400TiB | 26.94 | 31.96TiB |

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

MarshLin88 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 55.24% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

MarshLin88 commented 1 year ago

The report shows the retrieval rate is very low.

Wengeding commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 55.24% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

Wengeding commented 1 year ago

Report is OK, please try to improve the Deal Data Replication in the next round.
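For anyone who wants to sanity-check the Deal Data Replication figure, here is a minimal sketch of one way such a percentage could be computed. It assumes the figure is a share of deals by count (the checker may weight by deal size instead), and the function name, piece labels and provider IDs are made up for illustration.

```python
from collections import defaultdict


def share_below_replication(deals, min_providers=4):
    """Percentage of deals whose piece sits with fewer than min_providers SPs.

    deals is a list of (piece_cid, provider_id) pairs. Counting by deal
    rather than by bytes is an assumption, not the checker's confirmed rule.
    """
    providers_per_piece = defaultdict(set)
    for piece, provider in deals:
        providers_per_piece[piece].add(provider)
    if not deals:
        return 0.0
    low = sum(
        1 for piece, _ in deals
        if len(providers_per_piece[piece]) < min_providers
    )
    return 100.0 * low / len(deals)


# Made-up sample: pieceA reaches 4 providers, pieceB and pieceC do not.
sample_deals = [
    ("pieceA", "sp1"), ("pieceA", "sp2"), ("pieceA", "sp3"), ("pieceA", "sp4"),
    ("pieceB", "sp1"), ("pieceB", "sp2"),
    ("pieceC", "sp3"), ("pieceC", "sp3"),
]
print(share_below_replication(sample_deals))  # 50.0
```

Repeated deals for the same piece with the same provider (pieceC here) do not raise the replica count, so they stay in the under-replicated share.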

Wengeding commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacea6laqtgrsdazk6jsy52lm4b2xrv6xi3xmc2jkzjbbklblh4u6dya

Address

f1lozjhyay3heeav3wm4ttycoaumjgtgrp452woki

Datacap Allocated

400.00TiB

Signer Address

f1txfsjmix4vlzido4dkildrnbw26owtlbslexmpa

Id

818faae9-8fc2-4c9f-a626-a7096df46c9f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea6laqtgrsdazk6jsy52lm4b2xrv6xi3xmc2jkzjbbklblh4u6dya

large-datacap-requests[bot] commented 1 year ago

We have found some problems in the information provided in the Approved Comment. We could not find the **Address** field in the information provided.

Please take a look at the comment and edit its body to provide all the required information.

cryptowhizzard commented 1 year ago

@Wengeding

Right after I posted a message in Slack saying that HTTP retrievals are not reliable and that the retrieval bot is being tricked, you are approving this client.

This client does not support HTTP retrieval, and your signature should be revoked until proper retrieval is in place.

cryptowhizzard commented 1 year ago

@Sinsoteam

Please enable retrieval on your data according to the rules of FIL+.

cryptowhizzard commented 1 year ago

As can be seen, SPs f0126478, f02025503, f02114868, f02114994 and f02128256 were all heavily involved in CID sharing. The others have retrieval off.

This LDN should have never received a signature.

Screenshot 2023-07-27 at 21 21 42

Legend:
Client = The LDN data preparer
SP = The storage provider
TotalDeals = Total volume of data in bytes
Ipadress = IP address
Port = Port the SP is listening on
Name = Name as known by DCENT
Online = Whether the miner is online
Subnet = The class C subnet derived from the IP address
Matches in Subnet = The number of machines present in this subnet
Location = Location on the globe
VpnResult = VPN score for the given IP address according to ipqualityscore.com
% = The percentage of deals this SP received in this LDN
AT = Abuse Type: 1 means CID sharing abuse, 3 means not retrievable
Asouce = Abuse source client address
MasterAddress = The address this SP is funded from
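For readers unfamiliar with the Subnet and Matches in Subnet columns, the sketch below shows one way they could be derived. It assumes "class C subnet" means the /24 network of an IPv4 address and that the match count includes the machine itself. The function names are hypothetical and the addresses are documentation-range examples, not real SP addresses.

```python
import ipaddress
from collections import Counter


def class_c_subnet(ip: str) -> str:
    """Return the /24 network that contains the given IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def matches_in_subnet(ips: list[str]) -> dict[str, int]:
    """Map each IP to the number of listed IPs that share its /24 subnet."""
    counts = Counter(class_c_subnet(ip) for ip in ips)
    return {ip: counts[class_c_subnet(ip)] for ip in ips}


# RFC 5737 documentation addresses, used purely as placeholders.
sample_ips = ["192.0.2.10", "192.0.2.77", "198.51.100.5"]
print(class_c_subnet("192.0.2.10"))   # 192.0.2.0/24
print(matches_in_subnet(sample_ips))
```

Several SPs landing in the same /24 is only a hint that they may share infrastructure. It is not proof on its own.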

Wengeding commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 57.37% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

cryptowhizzard commented 1 year ago

Signed without retrieval. Dispute made.

Screenshot 2023-07-31 at 18 14 04

Wengeding commented 1 year ago

Since the report shows that the "Retrieval Statistics" are normal (>1%), I checked the "Retrieval Dashboard" and tried to download the data, which was successful. I don't think your comment is correct. @cryptowhizzard

Wengeding commented 1 year ago

No CID sharing was found in this application. If the collaborating SPs have some CID sharing with other applications, you should flag them, not this one.

image

cryptowhizzard commented 1 year ago

I have the logs of my download attempts here. They all fail, no matter where in the world I download from. The fact that someone enables download for the first 100 MB of a deal to trick the deal bot does not mean that his dataset is readily retrievable on the network.

Also notified governance about your last signature.

*Edited the typo You / His.

Wengeding commented 1 year ago

It's not my dataset. You might be mistaken.

cryptowhizzard commented 1 year ago

See the screenshot.

Everything with AT 1 is abuse / CID sharing. The LDN is in the AuditTrail column. Everything with AT 3 either does not have retrieval enabled or is tricked into serving only a few MB.

Screenshot 2023-08-03 at 18 24 17

Wengeding commented 1 year ago

This tool looks good. Did you develop it yourself? It might need validation from the community before its results are used as a benchmark for judging, though.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

Sinsoteam commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 61.76% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

Sinsoteam commented 1 year ago

> See the screenshot.
>
> Everything with AT 1 = Abuse / CID sharing. The LDN is in the AuditTrail column. Everything with AT 3 does not have retrieval enabled / tricked for only a few MB.
>
> Screenshot 2023-08-03 at 18 24 17

You need to check your answer.

cryptowhizzard commented 1 year ago

This client is actively stalling HTTP retrievals and has blocked HTTP range requests with a reverse proxy to prevent its data from being investigated.

It works as follows:

One sets a bandwidth limit with NGINX on the HTTP retrieval. After a random amount of data has been transferred, the limit is set to zero. This makes the transfer time out. Because range retrieval is disabled in NGINX, one cannot pick up where one left off and needs to start all over again.

Log can be found at http://datasetcreators.com/downloadedcarfiles/logs/961.log
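To illustrate why disabled range requests matter, here is a small simulation. It is a sketch under stated assumptions, not the actual NGINX or Boost behavior: the two "servers" are hypothetical in-memory stand-ins that stall after a few bytes, and `download_with_resume` is an illustrative client that sends `Range: bytes=<offset>-` on every retry.

```python
from typing import Callable

# fetch(offset) stands in for a GET with "Range: bytes=<offset>-".
# It returns (status, body received before the connection stalled).
Fetch = Callable[[int], tuple[int, bytes]]

PAYLOAD = b"0123456789"
STALL_AFTER = 4  # bytes delivered before the simulated transfer stalls


def server_with_ranges(offset: int) -> tuple[int, bytes]:
    """Honors the range: 206 Partial Content starting at the offset."""
    return 206, PAYLOAD[offset:offset + STALL_AFTER]


def server_without_ranges(offset: int) -> tuple[int, bytes]:
    """Ignores the range: 200 OK, body always restarts at byte 0."""
    return 200, PAYLOAD[:STALL_AFTER]


def download_with_resume(fetch: Fetch, total_size: int,
                         max_attempts: int = 10) -> bytes:
    data = bytearray()
    for _ in range(max_attempts):
        status, chunk = fetch(len(data))
        if status == 206:
            data.extend(chunk)        # resume: append after what we have
        else:
            data = bytearray(chunk)   # no range support: start over
        if len(data) >= total_size:
            return bytes(data)
    raise TimeoutError(
        f"gave up after {max_attempts} attempts "
        f"with {len(data)}/{total_size} bytes"
    )


print(download_with_resume(server_with_ranges, len(PAYLOAD)))
try:
    download_with_resume(server_without_ranges, len(PAYLOAD))
except TimeoutError as err:
    print("failed:", err)
```

With ranges honored, the download completes in three attempts despite the stalls. Without them, every retry restarts at byte 0 and never gets past the stall point.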

ghost commented 1 year ago

Hello @Sinsoteam per the new guidelines https://github.com/filecoin-project/notary-governance/issues/922 for Open Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant and also please add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by Fil+ Governance team to confirm validity toward the Fil+ guideline of a distributed storage plan and SPs posted in the comments here. Let us know if you have any questions.

Sinsoteam commented 1 year ago

> This client is actively stalling HTTP retrievals and has blocked HTTP range requests with a reverse proxy to prevent its data from being investigated.
>
> It works as follows:
>
> One sets a bandwidth limit with NGINX on the HTTP retrieval. After a random amount of data has been transferred, the limit is set to zero. This makes the transfer time out. Because range retrieval is disabled in NGINX, one cannot pick up where one left off and needs to start all over again.
>
> Log can be found at http://datasetcreators.com/downloadedcarfiles/logs/961.log

No. We have communicated with SPs and have made sure they are not using NGINX. I am not sure if some of your behavior triggered the network's security defenses.

cryptowhizzard commented 1 year ago

> > This client is actively stalling HTTP retrievals and has blocked HTTP range requests with a reverse proxy to prevent its data from being investigated. It works as follows: One sets a bandwidth limit with NGINX on the HTTP retrieval. After a random amount of data has been transferred, the limit is set to zero. This makes the transfer time out. Because range retrieval is disabled in NGINX, one cannot pick up where one left off and needs to start all over again. Log can be found at http://datasetcreators.com/downloadedcarfiles/logs/961.log
>
> No. We have communicated with SPs and have made sure they are not using NGINX. I am not sure if some of your behavior triggered the network's security defenses.

Again:

Boost supports range retrievals. Simple explanation: if a download breaks due to a timeout, it should pick up again where it left off. This function is disabled on your side, making retrieval impossible.

Fix it.

TakiChain commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 61.76% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

TakiChain commented 1 year ago

Retrieval report seems normal. Please keep your request in accordance with the principles of the program and in line with their allocation strategy. @Sinsoteam

TakiChain commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecclvi2iekxqbcnnhiy6utc4ulrjrvqjeva22c2ahrt2s4n6a6wsk

Address

f1lozjhyay3heeav3wm4ttycoaumjgtgrp452woki

Datacap Allocated

400.00TiB

Signer Address

f15impf3j2zcaex4lhyxndxswuuhv24vzstuqtxsi

Id

818faae9-8fc2-4c9f-a626-a7096df46c9f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecclvi2iekxqbcnnhiy6utc4ulrjrvqjeva22c2ahrt2s4n6a6wsk

raghavrmadya commented 1 year ago

@TakiChain, I see that another notary is actively conducting due diligence on this application. Can you share evidence of what kind of due diligence you performed? Looking at the bot report is not enough. Did you attempt retrieval sampling?

This application has been under dispute - https://www.notion.so/filecoin/LDN-signed-without-retrieval-aebb6f0a736549ae85e03d7b2d411f0a?pvs=4

Client must show evidence of retrievability to continue

TakiChain commented 1 year ago

@raghavrmadya Isn't the retrieval report valid evidence? Why not consider upgrading your report?

cryptowhizzard commented 1 year ago

Dear Sinsoteam,

As a notary, I am doing due diligence on your LDN. I could not get retrieval to work. Can you please upload the CAR file of CID baga6ea4seaqjhbh6emggbx2zlxaaiqjcrci5ujmjnq6n6lh6kk3vwmn5de56gjq?

You can use our upload system at http://send.datasetcreators.com. Please select 7 days for the system to keep the file and post the link you received here so I (and other notaries) can download your content.

Sinsoteam commented 1 year ago

@raghavrmadya This shows that we support retrieval.

image

raghavrmadya commented 1 year ago

Thanks @Sinsoteam

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

Casey-PG commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 61.76% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.