filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] <RongYin> - <AI Tools-NLP> #2050

Closed datalove2 closed 1 year ago

datalove2 commented 1 year ago

Data Owner Name

RongYin

What is your role related to the dataset

Data Preparer

Data Owner Country/Region

China

Data Owner Industry

IT & Technology Services

Website

https://www.qcc.com/firm/3380acbb3101bd58394d1ba4be51e877.html

Social Media

https://www.qcc.com/firm/3380acbb3101bd58394d1ba4be51e877.html

Total amount of DataCap being requested

15PiB

Expected size of single dataset (one copy)

1.5PiB

Number of replicas to store

10

Weekly allocation of DataCap requested

1PiB
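As a quick consistency check on the figures in this form, the total request should equal the single-copy size multiplied by the replica count, and dividing the total by the weekly rate gives a rough onboarding duration. A minimal Python sketch: the variable names are illustrative, and the week count is plain division, not a figure stated in the application.

```python
from fractions import Fraction

# Figures copied from the application form, all in PiB.
single_copy_pib = Fraction("1.5")
replicas = 10
total_requested_pib = Fraction(15)
weekly_allocation_pib = Fraction(1)

# Total DataCap implied by one copy times the number of replicas.
implied_total_pib = single_copy_pib * replicas

# Weeks needed to consume the full request at the stated weekly rate.
weeks_at_weekly_rate = total_requested_pib / weekly_allocation_pib

print(f"implied total: {float(implied_total_pib)} PiB")
print(f"weeks at weekly rate: {float(weeks_at_weekly_rate)}")
```

Exact fractions are used so that 1.5 × 10 compares equal to 15 without floating-point rounding.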

On-chain address for first allocation

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

Identifier

No response

Share a brief history of your project and organization

RongYin was established in 2019 in HK. We have been provided with a total storage capacity of 150PiB. We now plan to onboard data that is valuable to humanity and useful for the network. <RongYin Open Data Project> has successfully onboarded 10PiB of storage capacity to the network, which corresponds to about 1.5PiB of raw data. As a next step, we have prepared 3PiB of raw data with 10x backups.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

We are going to onboard open <natural language processing> datasets from AWS. Natural language processing is a branch of artificial intelligence (AI) that enables computers to comprehend, generate, and manipulate human language. It makes it possible to interrogate data with natural-language text or voice.
The natural language processing category covers 68 matching datasets, about 1.7PiB in total.
These include Common Crawl, Sudachi Language Resources, Japanese Tokenizer Dictionaries, MIMIC-III (‘Medical Information Mart for Intensive Care’), Common Screens, Discrete Reasoning Over the content of Paragraphs (DROP), End of Term Web Archive Dataset, MultiCoNER Datasets, etc.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, singularity, graphsplit

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/eot-web-archive/
https://registry.opendata.aws/allenai-quoref/
https://registry.opendata.aws/comonscreens/
https://registry.opendata.aws/allenai-drop/
https://registry.opendata.aws/paracrawl/
https://registry.opendata.aws/allenai-quoref/
https://registry.opendata.aws/mmid/
...

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, South America, Europe, Australia (continent)

How will you be distributing your data to storage providers

HTTP or FTP server, IPFS, Shipping hard drives, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Big Data Exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

f02144602,f02148382,f02037700,f02192496,f01834253,f02212669,etc.

How do you plan to make deals to your storage providers

Boost client, Lotus client, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

Could you send an email to filplus-app-review@fil.org from your official domain in order to confirm your identity? The email subject should include the issue ID #2050.

datalove2 commented 1 year ago

@Sunnyiscoming Hi, the email was sent last week. Please check it.

Sunnyiscoming commented 1 year ago

Hello. I have received your email, but I do not see your business license.

datalove2 commented 1 year ago

Hi @Sunnyiscoming, the business license has been attached. I resent the email. Please take a look. Thank you.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

15PiB

Expected weekly DataCap usage rate

1PiB

Client address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

DataCap allocation requested

512TiB

Id

fdf30adb-8ae7-427a-bfcf-64383048e88d

1ane-1 commented 1 year ago

I will support you for the first round and will keep following this application.

1ane-1 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecz3k7aaozrkxcsweqbbpoyurc36r5cbuftneo7lm4zl7us5qargi

Address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Datacap Allocated

512.00TiB

Signer Address

f1mdk7s2vntzm6hu35yuo6vjubtrpfnb2awhgvrri

Id

fdf30adb-8ae7-427a-bfcf-64383048e88d

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecz3k7aaozrkxcsweqbbpoyurc36r5cbuftneo7lm4zl7us5qargi
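The status links in these notary comments all follow one pattern: the Filfox message page URL followed by the message CID. A minimal sketch of that pattern in Python; `filfox_message_url` is a hypothetical helper name, and the base URL is copied from the comment above rather than from any Filfox documentation.

```python
# Base URL as it appears in the notary comments in this thread.
FILFOX_MESSAGE_BASE = "https://filfox.info/en/message/"


def filfox_message_url(message_cid: str) -> str:
    """Return the explorer link for a message CID, following the thread's URL pattern."""
    cid = message_cid.strip()
    if not cid:
        raise ValueError("message CID must not be empty")
    return FILFOX_MESSAGE_BASE + cid


# Message CID of the first "Request Proposed" comment.
print(filfox_message_url("bafy2bzacecz3k7aaozrkxcsweqbbpoyurc36r5cbuftneo7lm4zl7us5qargi"))
```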

AlanGreaterheat commented 1 year ago

Willing to support the first round; I will pay attention to the data distribution later.

AlanGreaterheat commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceaclga63k6lkhgtalz7lizicwwkrfosm6slu26gb76pp3osszkqz2

Address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Datacap Allocated

512.00TiB

Signer Address

f1pnmzlxj7cfeo2v6oj5nco46hkg2l46wj7o4xxui

Id

fdf30adb-8ae7-427a-bfcf-64383048e88d

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceaclga63k6lkhgtalz7lizicwwkrfosm6slu26gb76pp3osszkqz2
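Each allocation Id in this thread receives two notary comments from different signer addresses: first "Request Proposed", then "Request Approved". A minimal Python sketch of that two-step flow, assuming a signing threshold of 2. The threshold is inferred from the thread and not stated in it, and `allocation_status` is a hypothetical helper, not part of any Filecoin tooling.

```python
def allocation_status(signers: list[str], threshold: int = 2) -> str:
    """Map the signer addresses seen for one allocation Id to a status label."""
    # Count each signer once, even if the same address appears twice.
    distinct = list(dict.fromkeys(signers))
    if not distinct:
        return "Requested"
    if len(distinct) < threshold:
        return "Proposed"
    return "Approved"


# Signer addresses for allocation Id fdf30adb-8ae7-427a-bfcf-64383048e88d.
first_round = [
    "f1mdk7s2vntzm6hu35yuo6vjubtrpfnb2awhgvrri",  # "Request Proposed" signer
    "f1pnmzlxj7cfeo2v6oj5nco46hkg2l46wj7o4xxui",  # "Request Approved" signer
]
print(allocation_status(first_round[:1]))
print(allocation_status(first_round))
```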

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 98.10% of deals are for data replicated across less than 2 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.
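The replication warning in the report above is a percentage of deals whose data sits with fewer than some number of storage providers. A minimal Python sketch of one way such a figure could be computed: the deal tuples are hypothetical, and the checker bot's actual method (for example, whether it weights deals by size) is not shown in this thread.

```python
from collections import defaultdict


def low_replication_share(deals, min_providers):
    """Percentage of deals whose piece is held by fewer than min_providers distinct providers."""
    if not deals:
        return 0.0
    # Collect the distinct providers that store each piece.
    providers_by_piece = defaultdict(set)
    for piece_cid, provider_id in deals:
        providers_by_piece[piece_cid].add(provider_id)
    low = sum(
        1 for piece_cid, _ in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return 100.0 * low / len(deals)


# Hypothetical (piece CID, provider ID) pairs, not taken from the report.
sample_deals = [
    ("pieceA", "f01"), ("pieceA", "f02"), ("pieceA", "f03"),
    ("pieceB", "f01"),
    ("pieceC", "f02"), ("pieceC", "f02"),
]
print(f"{low_replication_share(sample_deals, 3):.2f}%")
```

In this sample, pieceC has two deals with the same provider, so it still counts as held by a single provider.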

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

DataCap allocation requested

512TiB

Id

57e04d8f-6be5-4429-8aae-838981e217d5

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Rule to calculate the allocation request amount

100% weekly > 0.5PiB, requesting 0.5PiB

DataCap allocation requested

512TiB

Total DataCap granted for client so far

512TiB

Datacap to be granted to reach the total amount requested by the client (15PiB)

14.5PiB
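The remaining figure above is the total request minus what has been granted so far, with TiB converted to PiB at 1024 TiB per PiB. A minimal Python sketch of that arithmetic; `remaining_pib` is an illustrative helper, not the bot's code.

```python
TIB_PER_PIB = 1024


def remaining_pib(total_requested_pib: float, granted_tib: list[float]) -> float:
    """DataCap still to be granted, in PiB, given the allocations granted so far in TiB."""
    granted_pib = sum(granted_tib) / TIB_PER_PIB
    return total_requested_pib - granted_pib


# One 512TiB allocation granted against the 15PiB request.
print(remaining_pib(15, [512]))  # prints 14.5
```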

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 8665 | 6 | 512TiB | 32.54 | 137.87TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

datalove2 commented 1 year ago

checker:manualTrigger
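The trigger comment above follows the syntax given in the checker bot's footnotes: the text `checker:manualTrigger`, optionally followed by other related addresses separated by spaces. A minimal Python sketch that builds such a comment body; `build_trigger_comment` is a hypothetical helper, and the addresses in the second call are placeholders, not real accounts.

```python
def build_trigger_comment(other_addresses=()):
    """Build the comment text that asks the checker bot for a report."""
    parts = ["checker:manualTrigger", *other_addresses]
    return " ".join(parts)


# Report for the application's own address only.
print(build_trigger_comment())

# Report that also combines deals from other related addresses (placeholders).
print(build_trigger_comment(["<other_address_1>", "<other_address_2>"]))
```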

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 92.56% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

datalove2 commented 1 year ago

Hello community members and notaries, our program has been updated, and the bot is now displaying the retrieval rate correctly. Because the bot runs retrieval tests approximately once per hour, we expect the retrieval rate to increase over time. We would appreciate the notaries' support in witnessing the next round of retrieval rate updates.

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 70.51% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

Fatman13 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceamgtznfeeaeo4c5uvy5tavrx223gz24dcdxl6vzqb5rxsjmnglz6

Address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Datacap Allocated

512.00TiB

Signer Address

f1j3u7crhjzwb2cj5mq7vodlt4o66yoyci7lhcauy

Id

57e04d8f-6be5-4429-8aae-838981e217d5

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceamgtznfeeaeo4c5uvy5tavrx223gz24dcdxl6vzqb5rxsjmnglz6

Fatman13 commented 1 year ago

The client reached out to me on Slack. We walked through their plans for addressing the CIDChecker warnings.

woshidama323 commented 1 year ago

I will also support this application, and I suggest you fix those warnings in the next round.

woshidama323 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebjl7yppvperuqf4lj2hvfrdtqlgjvvqgjnsugpnvrsxunjnvubyw

Address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Datacap Allocated

512.00TiB

Signer Address

f12tk3adljauwnd3hjbigpfxb7b7gdlj63p6afwtq

Id

57e04d8f-6be5-4429-8aae-838981e217d5

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebjl7yppvperuqf4lj2hvfrdtqlgjvvqgjnsugpnvrsxunjnvubyw

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

DataCap allocation requested

1PiB

Id

01e4f9f1-f645-47ad-aad8-f2c582d513c1

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Rule to calculate the allocation request amount

200% weekly > 1PiB, requesting 1PiB

DataCap allocation requested

1PiB

Total DataCap granted for client so far

465661.3YiB

Datacap to be granted to reach the total amount requested by the client (15PiB)

465661.3YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 11629 | 8 | 512TiB | 24.28 | 107.43TiB |

newwebgroup commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 98.02% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

newwebgroup commented 1 year ago

The SPs' locations are well dispersed, but file replication is not healthy; this needs to be fixed ASAP. No CID sharing was observed.

The retrieval success rate of nodes is not bad.

Willing to support this round

newwebgroup commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceair2m22y5j2qjltk4i7x7zcyb3vg73cj3idxhqh5tfjrhmktk2ak

Address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Datacap Allocated

1.00PiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

01e4f9f1-f645-47ad-aad8-f2c582d513c1

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceair2m22y5j2qjltk4i7x7zcyb3vg73cj3idxhqh5tfjrhmktk2ak

luobin544 commented 1 year ago

Check project status is good, supported

luobin544 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebbucorkoj4dp65fni2hkxumy5v2ud65xkz4stvglq3qjhgy5olte

Address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Datacap Allocated

1.00PiB

Signer Address

f1tbd632f6w62glfaf7wjpimacbnjiz26poyoes2q

Id

01e4f9f1-f645-47ad-aad8-f2c582d513c1

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebbucorkoj4dp65fni2hkxumy5v2ud65xkz4stvglq3qjhgy5olte

herrehesse commented 1 year ago

@luobin544 and @newwebgroup have been recommended for the removal of their notary status. Notary @luobin544 signed the application without any due diligence, relying only on the statement "Check project status is good, supported." Notary @newwebgroup attempted some due diligence but still signed promptly, which is not proper due diligence. It is important to note that signing off on a 1PiB tranche is a significant decision.

Furthermore, there are concerns regarding the distribution process, as it appears that all distribution is occurring in a single region, or it could potentially be fake distribution using a VPN.

Additionally, this application involves a merged dataset request, which the community has decided is no longer allowed. Despite being aware of this decision, the notaries still proceeded to sign the application. For reference, please see the following link: https://github.com/filecoin-project/notary-governance/issues/832

I intend to initiate a dispute regarding these matters.

@raghavrmadya and @dkkapur

datalove2 commented 1 year ago

  1. The SPs I am cooperating with are NOT using a VPN, and the whole process follows the rules. When the rules are fully complied with, is it necessary for a notary to ask unnecessary questions? Just to bother clients?
  2. There are no restrictions on clients applying for DataCap with AWS data. No consensus has been reached on the proposal you mentioned. An unresolved proposal CANNOT be a reason to discard everyone's effort.
  3. Your allegations against these notaries make me uncomfortable even though I am not a notary. They reviewed our application and made suggestions for our application form. What they are doing makes things go more smoothly. How about you? If you feel that the data I applied for does not meet the "requirements", then I should have been rejected at the "Datacap Request" stage, instead of you accusing the notaries who work for us here. I actually asked a lot of people through Slack, but very few were willing to help. What do you think about that?

datalove2 commented 1 year ago

https://github.com/filecoin-project/notary-governance/issues/921#issuecomment-1627975484

This community member did an IP check for us, and the results matched our explanation: we were not using a VPN.

kernelogic commented 1 year ago

I checked the T&T dispute tracker and the dispute is marked as resolved. OK to proceed. Neither VPN usage nor a merged dataset is a reason to cancel the LDN according to the guidelines.

kernelogic commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 99.17% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

large-datacap-requests[bot] commented 1 year ago

Looks like the bot was not able to retrieve the transaction on the lotus node. Please contact governance team. The message cid: bafy2bzacebbucorkoj4dp65fni2hkxumy5v2ud65xkz4stvglq3qjhgy5olte

Please, contact the governance team.
large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f02049625

Client address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

DataCap allocation requested

2PiB

Id

085a5191-24f9-4c35-8d56-0c493f28b546

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Rule to calculate the allocation request amount

400% weekly > 2PiB, requesting 2PiB
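The three "Rule to calculate the allocation request amount" lines in this thread (requests 2, 3, and 4) share one form: a multiple of the weekly rate is compared with a cap, and the smaller value is requested. A minimal Python sketch of that pattern: the multipliers and caps are copied from those three bot lines, and the model is inferred from them only. Whether the bot applies further terms (for example a share of the total request) is not visible here, and `tranche_pib` is a hypothetical name.

```python
# request number -> (multiple of the weekly rate, cap in PiB), from the bot's rule lines
OBSERVED_RULES = {
    2: (1, 0.5),  # "100% weekly > 0.5PiB, requesting 0.5PiB"
    3: (2, 1.0),  # "200% weekly > 1PiB, requesting 1PiB"
    4: (4, 2.0),  # "400% weekly > 2PiB, requesting 2PiB"
}


def tranche_pib(request_number: int, weekly_pib: float) -> float:
    """Requested tranche in PiB: the smaller of the weekly multiple and the cap."""
    multiplier, cap_pib = OBSERVED_RULES[request_number]
    return min(multiplier * weekly_pib, cap_pib)


# The application states a weekly rate of 1PiB.
for number in sorted(OBSERVED_RULES):
    print(number, tranche_pib(number, 1.0))
```

With the 1PiB weekly rate from the application, the cap is the smaller value in all three rounds, which matches the 512TiB, 1PiB, and 2PiB requests in the thread.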

DataCap allocation requested

2PiB

Total DataCap granted for client so far

931322574615478927360.0YiB

Datacap to be granted to reach the total amount requested by the client (15PiB)

931322574615478927360.0YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 59693 | 15 | 1PiB | 18.07 | 0B |

kernelogic commented 1 year ago

I am seeing a slight increase in replication, and the client has reached out to me privately to say that more replicas are incoming. I am going to support this round.

kernelogic commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedlkgmmonoyu4h4wtzuyotaasdpfnbvdm3u3hje7qqk3peztinbvw

Address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Datacap Allocated

2.00PiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

085a5191-24f9-4c35-8d56-0c493f28b546

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedlkgmmonoyu4h4wtzuyotaasdpfnbvdm3u3hje7qqk3peztinbvw

mikezli commented 1 year ago

Seeing that several of the allegations for this LDN have lapsed, and that the check bot shows compliant retrieval rates and CID sharing reports, we are willing to support this round of signatures.

Also, I will keep an eye on this LDN's data storage.

mikezli commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecqe4z72aoakuriue6xuln7iko22udo2gjzeatkr74nrvqq6szkfq

Address

f1tp3kxwlvxd3ggjsfcbivr25fzz2edrrrqe5vapy

Datacap Allocated

2.00PiB

Signer Address

f1dnb3uz7sylxk6emti3ififcvu3nlufnnsjui6ea

Id

085a5191-24f9-4c35-8d56-0c493f28b546

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecqe4z72aoakuriue6xuln7iko22udo2gjzeatkr74nrvqq6szkfq

herrehesse commented 1 year ago

I am requesting the immediate removal of @mikezli for signing a disputed application.

https://github.com/filecoin-project/notary-governance/issues/921

kernelogic commented 1 year ago

I checked the T&T dispute tracker and the dispute is marked as resolved. OK to proceed. Neither VPN usage nor a merged dataset is a reason to cancel the LDN according to the guidelines.


I guess I must have made a mistake and mixed up which of 2050 and 2055 was disputed. The 2050 dispute is not on the tracker yet.

So what makes an issue count as disputed: being listed on the tracker, or a dispute issue being opened even if it is not yet on the tracker? We need a single source of truth.

datalove2 commented 1 year ago

Thank you to the two notaries @mikezli and @kernelogic for their support. The allegations against us for this LDN were the merging of datasets and the use of VPNs. RG has now publicly stated that public datasets whose applications have already been submitted and completed are allowed to be signed. The VPN allegations have also been explained, and there have been no follow-up questions on the allegations at filecoin-project/notary-governance#921 (comments). There were also no new allegations from the TT meeting. @herrehesse

herrehesse commented 1 year ago

@datalove2 this is factually untrue. Notaries are not allowed to sign.

datalove2 commented 1 year ago

@herrehesse We have provided evidence answering all of your unjustified accusations against us, and now, without any new evidence, you are asking other notaries not to sign. Is this what a qualified notary would do?