filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] <Xinyan Artificial Intelligence> - <Model Data> #2118

Closed fylsan3 closed 1 year ago

fylsan3 commented 1 year ago

Data Owner Name

Shanghai Xinyan Artificial Intelligence Technology Co., Ltd. (上海新颜人工智能科技有限公司)

What is your role related to the dataset

Dataset Owner

Data Owner Country/Region

China

Data Owner Industry

IT & Technology Services

Website

https://xinyan.com https://www.xinyan-ai.com

Social Media

QR code of WeChat official account at the bottom of official website

Total amount of DataCap being requested

10PiB

Expected size of single dataset (one copy)

1PiB

Number of replicas to store

10

Weekly allocation of DataCap requested

1PiB

On-chain address for first allocation

f16bxqdvcliy3x2o4q7w2q5dm4zj4tls6pfsjpmma

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

Identifier

No response

Share a brief history of your project and organization

https://www.xinyan-ai.com/about.html

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

Graphic image recognition model training data
OCR model training data

Where was the data currently stored in this dataset sourced from

My Own Storage Infra

If you answered "Other" in the previous question, enter the details here

No response

If you are a data preparer. What is your location (City and Country)

No response

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

No response

If you are not preparing the data, who will prepare the data? (Provide name and business)

No response

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

No response

Please share a sample of the data

https://www.robots.ox.ac.uk/~vgg/data/scenetext/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China

How will you be distributing your data to storage providers

Cloud storage (i.e. S3), HTTP or FTP server, IPFS, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Big Data Exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

No response

How do you plan to make deals to your storage providers

Boost client, Lotus client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes
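The sizing fields above (total request, size of one copy, number of replicas, weekly rate) should agree with one another. The sketch below is a hypothetical sanity check, not part of any Fil+ tooling. It assumes all sizes are in TiB with 1 PiB = 1024 TiB and reads the "1P" single-copy size as 1 PiB. The name `check_request` is illustrative.

```python
# Hypothetical sanity check for the sizing fields of a DataCap application.
# Assumption: all sizes are in TiB, 1 PiB = 1024 TiB; "1P" is read as 1 PiB.

TIB_PER_PIB = 1024


def check_request(total_tib: int, single_copy_tib: int, replicas: int, weekly_tib: int) -> dict:
    """Compare the requested total with copy size x replicas and estimate
    how many weeks the requested weekly rate would take to use it up."""
    expected_total = single_copy_tib * replicas
    # Ceiling division: a partial final week still counts as a week.
    weeks = -(-total_tib // weekly_tib)
    return {
        "expected_total_tib": expected_total,
        "consistent": expected_total == total_tib,
        "weeks_at_weekly_rate": weeks,
    }


if __name__ == "__main__":
    # Figures from this application: 10 PiB total, 1 PiB per copy, 10 replicas, 1 PiB/week.
    print(check_request(10 * TIB_PER_PIB, 1 * TIB_PER_PIB, 10, 1 * TIB_PER_PIB))
```

With this application's figures, 1 PiB x 10 replicas matches the 10 PiB total, and a 1 PiB weekly rate would use it up in about 10 weeks.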

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago
fylsan3 commented 1 year ago
  1. We have enough tokens to spend.
  2. We used to run a personal credit reporting business, but it later disappeared due to policy restrictions. As you know, data sourcing and processing are critical in credit reporting, so we need to build rigorous data models. Here is an old news article about us that you can take a look at: https://www.yicaiglobal.com/news/two-firms-are-investigated-as-china-online-loan-regulations-extend-to-big-data
  3. We will prepare the data ourselves.

I have sent an email and thank you for your support.

ghost commented 1 year ago

Hello @fylsan3, per https://github.com/filecoin-project/notary-governance/issues/922 for Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant. Please also add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by Fil+ Governance team to confirm validity and then the application will be triggered for notary review. Let us know if you have any questions.

Sunnyiscoming commented 1 year ago

Received your email. After reading the report, I have questions about whether the data can be fully disclosed. Best practice for storing large datasets ideally includes storing them in 3 or more regions, with 4 or more storage provider operators or owners. You should list the Miner ID, business entity, and location of the SPs you will cooperate with.
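The best-practice rule quoted above (3 or more regions, 4 or more operators or owners) can be checked mechanically once an SP list gives the Miner ID, business entity and location. The sketch below is hypothetical: `StorageProvider` and `distribution_check` are illustrative names, and the grouping of cities into broad regions is an assumption, not an official Fil+ taxonomy.

```python
# Hypothetical check of an SP list against the quoted best practice:
# 3+ regions and 4+ distinct storage provider operators/owners.
# Assumption: the region labels are an illustrative grouping, not a Fil+ taxonomy.
from typing import NamedTuple


class StorageProvider(NamedTuple):
    miner_id: str
    business_entity: str
    region: str


def distribution_check(sps, min_regions: int = 3, min_operators: int = 4):
    """Return (meets_best_practice, region_count, operator_count)."""
    regions = {sp.region for sp in sps}
    operators = {sp.business_entity for sp in sps}
    ok = len(regions) >= min_regions and len(operators) >= min_operators
    return ok, len(regions), len(operators)


if __name__ == "__main__":
    # SP entities submitted later in this thread, grouped into assumed broad regions.
    submitted = [
        StorageProvider("f02234424", "algorithdata", "North America"),
        StorageProvider("f02246008", "Calculus", "Greater China"),
        StorageProvider("f0222811", "HanTangcloud", "Greater China"),
        StorageProvider("f02274504", "Dotdata", "Greater China"),
        StorageProvider("f02220982", "interstellar cloud", "Greater China"),
    ]
    print(distribution_check(submitted))
```

Under this grouping, the five submitted entities cover enough operators but only two broad regions, so the region criterion would not be met.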

Sunnyiscoming commented 1 year ago

Any update here?

fylsan3 commented 1 year ago

@Sunnyiscoming We have filled out the form [Fil+ registration form]. We are ready, thanks.

ghost commented 1 year ago

Confirming the following SP entities were submitted:

- f02234424 algorithdata, USA
- f02246008 Calculus, Hong Kong
- f0222811 HanTangcloud, Hangzhou
- f02274504 Dotdata, Shenzhen
- f02220982 interstellar cloud, Guangzhou

We are contacting them to confirm locations.

fylsan3 commented 1 year ago

Hello, @Filplus-govteam We have verified that the addresses are real and do not use a VPN. What else do we need to do next? @Sunnyiscoming Can you approve us?

Sunnyiscoming commented 1 year ago

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2118#issuecomment-1667670456 Contacting to confirm locations.

fylsan3 commented 1 year ago

@Filplus-govteam @Sunnyiscoming Can you tell me the contact email or Slack channel used for confirmation? We have not received any message, although we applied a long time ago.

ghost commented 1 year ago

@fylsan3 no word from f02234424

ghost commented 1 year ago

Hello, @Filplus-govteam We verified that the address is real and not using a VPN, what else do I need to do next? @Sunnyiscoming Can you pass us?

@fylsan3 how did you verify that is not using VPN?

fylsan3 commented 1 year ago

Hello, https://www.filutils.com/zh/miner/f02234424 looks normal. These are the IPs of f02234424:

  1. The external active access IP is 38.32.189.82.
  2. The externally declared identity IP is 38.32.189.83:52233.

We are working hard to find SPs. If you think there is something wrong with f02234424, we will not cooperate with this SP.

We can cooperate with f02058333, which is also located in the United States, or with f02199203, which is in China. Please confirm!

If you think they are not suitable, please let me know as soon as possible and we will look for other SPs to cooperate with. Thank you for your hard work.

Finally, I have confirmed very carefully with the SPs we contacted that none of them use a VPN. Although this is only a verbal promise from the SPs, I choose to believe them. If we find that an SP is dishonest, we will immediately cancel our cooperation with them!

- f02234424 algorithdata, USA
- f02246008 Calculus, Hong Kong
- f0222811 HanTangcloud, Hangzhou
- f02274504 Dotdata, Shenzhen
- f02220982 interstellar cloud, Guangzhou

- f02058333 algorithdata, USA
- f02199203 individual investor

Finally, please approve us as soon as possible. Regarding the cooperating SPs, we will fully follow your arrangements. If you think any SP is not suitable, we will stop cooperating with it immediately. If you have a recommended SP, please let me know and we will work with them right away. Thanks again!

ghost commented 1 year ago

@fylsan3 you can work with any SPs you choose. We'll monitor the list above. Thank you. FYI @Sunnyiscoming

fylsan3 commented 1 year ago

Okay, we hope you will keep monitoring us. Can you approve our request now? @Sunnyiscoming @Filplus-govteam

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

10PiB

Expected weekly DataCap usage rate

1PiB

Client address

f16bxqdvcliy3x2o4q7w2q5dm4zj4tls6pfsjpmma

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f16bxqdvcliy3x2o4q7w2q5dm4zj4tls6pfsjpmma

DataCap allocation requested

512TiB

Id

9ed7f648-c2fa-4abd-8263-026b784904d3

Casey-PG commented 1 year ago

Based on the information provided above, I'm willing to support the first round. I hope you keep to the Fil+ rules.

Casey-PG commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceanmemmfdrm3luhauijkwcu6yss5uedqztzmn2anfeo4kbpzc6mpy

Address

f16bxqdvcliy3x2o4q7w2q5dm4zj4tls6pfsjpmma

Datacap Allocated

512.00TiB

Signer Address

f1d4yb3wags3mtddzesxoo63jv7dmlec3bq4yteni

Id

9ed7f648-c2fa-4abd-8263-026b784904d3

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceanmemmfdrm3luhauijkwcu6yss5uedqztzmn2anfeo4kbpzc6mpy

ipollo00 commented 1 year ago

LGTM. Will follow up on how the DataCap is distributed across SPs.

ipollo00 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea4ismyefuxncdpy27ejqqobaejh73hjy25oshd3h7aqrfkiumftg

Address

f16bxqdvcliy3x2o4q7w2q5dm4zj4tls6pfsjpmma

Datacap Allocated

512.00TiB

Signer Address

f1n5wlrrhoxpkgwij25xrtt7w7g2k3fhbthmdn6ri

Id

9ed7f648-c2fa-4abd-8263-026b784904d3

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea4ismyefuxncdpy27ejqqobaejh73hjy25oshd3h7aqrfkiumftg

cryptowhizzard commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed more than 90% of total DataCap - f02246008: 100.00%

⚠️ All storage providers are located in the same region.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 2 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.
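The two warnings in this report can be made concrete with the rough sketch below. It is not the checker bot's actual implementation. The concentration check flags any SP holding more than 90% of the sealed data. The replication check counts pieces rather than individual deals, so it only approximates the bot's "% of deals" figure. Function names and the piece labels in the example are hypothetical.

```python
# Rough sketch of two checks resembling the bot warnings above (not the bot's code).
# Assumptions: concentration uses sealed size per SP; replication counts pieces,
# not deals, so it only approximates the bot's "% of deals" metric.


def concentration_warnings(sealed_tib: dict, threshold: float = 0.90) -> list:
    """List SPs whose share of all sealed data exceeds `threshold`."""
    total = sum(sealed_tib.values())
    if total == 0:
        return []
    return [
        f"{sp}: {amount / total:.2%}"
        for sp, amount in sorted(sealed_tib.items())
        if amount / total > threshold
    ]


def under_replicated_share(piece_to_sps: dict, min_sps: int = 2) -> float:
    """Fraction of pieces stored by fewer than `min_sps` distinct SPs."""
    if not piece_to_sps:
        return 0.0
    low = sum(1 for sps in piece_to_sps.values() if len(set(sps)) < min_sps)
    return low / len(piece_to_sps)


if __name__ == "__main__":
    # The situation in this report: every deal sealed by a single SP.
    print(concentration_warnings({"f02246008": 100.0}))
    print(under_replicated_share({"piece-a": {"f02246008"}, "piece-b": {"f02246008"}}))
```

When every deal goes to f02246008, both checks trip: the SP holds 100.00% of the data and every piece has fewer than 2 SPs.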

cryptowhizzard commented 1 year ago

@fylsan3

What about distribution?

You have used all your DataCap so far and sent it to a single SP?

cryptowhizzard commented 1 year ago

Secondly:

I tried to retrieve something from SP f02246008 to do due diligence, but it seems they are not open for fast retrieval. To verify your data, could you please upload the following to http://send.datasetcreators.com:

baga6ea4seaqklofxon7hemedalc4injfkzgc6votrqkw6n35bfrnk5utg4gq4da

I want to unpack it and check your data.

fylsan3 commented 1 year ago

We have prepared 200T of data, and transferring it to the SPs takes time (we use both network transfer and mailed offline hard disks). For the time being, we are therefore only making deals with f02246008, which has already completed the data transfer. We have also noticed problems with retrieval from this node: the SP reported that building the retrieval index failed, and this is currently being repaired. We have therefore temporarily stopped making further deals. We will start making deals with other SPs once they receive the data, and we will ensure that the data distribution complies with community rules. We will upload the data that needs to be checked as soon as possible. Thank you for your attention.

fylsan3 commented 1 year ago

Hello, we have only sent 20% to this SP and will not send any more to it. Please be patient and wait for the bot report.

herrehesse commented 1 year ago

Flagging for abuse & self-dealing @simonkim0515 @kevzak

cryptowhizzard commented 1 year ago

Dear fylsan3,

As notary I am doing due diligence on your LDN. I could not get retrieval to work. Can you please upload the car file of CID baga6ea4seaqklofxon7hemedalc4injfkzgc6votrqkw6n35bfrnk5utg4gq4da ?

You can use our upload system at http://send.datasetcreators.com. Please select 7 days for the system to keep the file and post the link you received here so I (and other notaries) can download your content.

fylsan3 commented 1 year ago

Hi, baga6ea4seaqklofxon7hemedalc4injfkzgc6votrqkw6n35bfrnk5utg4gq4da corresponds to a 35G CAR file, but http://send.datasetcreators.com only supports files up to 32.7G, so we may not be able to upload it. Instead, we have prioritized having other SPs seal baga6ea4seaqklofxon7hemedalc4injfkzgc6votrqkw6n35bfrnk5utg4gq4da. In addition, f02246008 can now be retrieved from successfully. We tested retrieval from f02246008 and it is now very fast: "lotus client retrieve --provider f02246008 bafykbzaceb4gylfri63q6s4kjhonivijt4rd7iem3e4mh5mmt5eloow7dfmbw baga6ea4seaqklofxon7hemedalc4injfkzgc6votrqkw6n35bfrnk5utg4gq4da .car". So you can retrieve and download from f02246008 immediately, or wait about 24 hours to retrieve from f0222811, f02220982, f02199203 and other nodes. We hope for your support and fair treatment, thank you!

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f16bxqdvcliy3x2o4q7w2q5dm4zj4tls6pfsjpmma

DataCap allocation requested

512TiB

Id

4d77ff2c-b1e2-4aff-aeda-609645a14e3e

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f16bxqdvcliy3x2o4q7w2q5dm4zj4tls6pfsjpmma

Rule to calculate the allocation request amount

100% weekly > 0.5PiB, requesting 0.5PiB

DataCap allocation requested

512TiB

Total DataCap granted for client so far

512TiB

Datacap to be granted to reach the total amount requested by the client (10PiB)

9.5PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
|---|---|---|---|---|
| 5019 | 3 | 512TiB | 37.46 | 129.18TiB |

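The allocation amounts the bot reports follow from simple arithmetic. The sketch below assumes one plausible reading of the rule string "100% weekly > 0.5PiB, requesting 0.5PiB": take 100% of the weekly rate, capped at 0.5 PiB. The cap value and the function names are illustrative assumptions, not the bot's actual code. It also shows why 512TiB equals 0.5PiB and why 9.5PiB remains after one 512TiB grant (1 PiB = 1024 TiB).

```python
# Sketch of the tranche arithmetic shown by the bot.
# Assumption: "100% weekly > 0.5PiB, requesting 0.5PiB" is read as
# "take 100% of the weekly rate, capped at 0.5 PiB". The cap and the
# function names are illustrative, not the bot's actual implementation.

TIB_PER_PIB = 1024


def next_tranche_tib(weekly_tib: int, weekly_fraction: float, cap_tib: int) -> int:
    """Requested amount: a fraction of the weekly rate, never above the cap."""
    return int(min(weekly_tib * weekly_fraction, cap_tib))


def remaining_tib(total_requested_tib: int, granted_tib: int) -> int:
    """DataCap still needed to reach the client's total request."""
    return total_requested_tib - granted_tib


if __name__ == "__main__":
    tranche = next_tranche_tib(weekly_tib=1 * TIB_PER_PIB, weekly_fraction=1.0, cap_tib=TIB_PER_PIB // 2)
    left = remaining_tib(total_requested_tib=10 * TIB_PER_PIB, granted_tib=512)
    print(tranche, left / TIB_PER_PIB)  # 512 (TiB requested) and 9.5 (PiB still to grant)
```

Because the 1 PiB weekly rate is above the assumed cap, the request is cut to 512 TiB. Subtracting that from 10 PiB (10240 TiB) leaves 9728 TiB, or 9.5 PiB.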
cryptowhizzard commented 1 year ago

At the moment I have no other way to check your data. I hope to do due diligence for you in the near future.

laurarenpanda commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

laurarenpanda commented 1 year ago

The CID report looks good to me. Hope more SPs from Asia-GCN can join and store data for this program. Willing to support this round.

laurarenpanda commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceaywtv7bcc4l7gayqsgtcegwjeiwb2gfl6odbcffynmyaff3cegms

Address

f16bxqdvcliy3x2o4q7w2q5dm4zj4tls6pfsjpmma

Datacap Allocated

512.00TiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

4d77ff2c-b1e2-4aff-aeda-609645a14e3e

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceaywtv7bcc4l7gayqsgtcegwjeiwb2gfl6odbcffynmyaff3cegms

DaYouGroup commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

DaYouGroup commented 1 year ago

This project looks very healthy, and there is no problem with retrieval. I am willing to support it and will continue to pay attention to it in the future.

DaYouGroup commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecv7mnwldrrr64ocmrsgki67xtkrrf3osa42ld4qhpf7tthh2q5x4

Address

f16bxqdvcliy3x2o4q7w2q5dm4zj4tls6pfsjpmma

Datacap Allocated

512.00TiB

Signer Address

f1nwjsd2mc6hu4qrwnmd6ukrfkuu4h5fhs7u3exii

Id

4d77ff2c-b1e2-4aff-aeda-609645a14e3e

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecv7mnwldrrr64ocmrsgki67xtkrrf3osa42ld4qhpf7tthh2q5x4

kevzak commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

kevzak commented 1 year ago

SP list provided:

- f02234424 algorithdata, USA
- f02058333 algorithdata, USA
- f02246008 Calculus, Hong Kong
- f0222811 HanTangcloud, Hangzhou
- f02274504 Dotdata, Shenzhen
- f02220982 interstellar cloud, Guangzhou
- f02199203 individual investor

SPs taking deals:

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
|---|---|---|---|---|---|
| f02220982 | Guangzhou, Guangdong, CN (China Mobile communications corporation) | 42.75 TiB | 8.71% | 42.75 TiB | 0.00% |
| f02199203 | Hohhot, Inner Mongolia, CN (CHINA UNICOM China169 Backbone) | 113.63 TiB | 23.16% | 113.63 TiB | 0.00% |
| f02221110 | Shenzhen, Guangdong, CN (CHINANET-BACKBONE) | 51.00 TiB | 10.39% | 51.00 TiB | 0.00% |
| f02200472 | Chengdu, Sichuan, CN (CHINANET-BACKBONE) | 704.00 GiB | 0.14% | 704.00 GiB | 0.00% |
| f0222811 | Hangzhou, Zhejiang, CN (CT-HangZhou-IDC) | 9.06 TiB | 1.85% | 9.06 TiB | 0.00% |
| f02274504 | Shenzhen, Guangdong, CN (CTGNet) | 116.75 TiB | 23.79% | 116.69 TiB | 0.05% |
| f02246008 | Hong Kong, Central and Western, HK (HKBN Enterprise Solutions HK Limited) | 153.00 TiB | 31.18% | 151.75 TiB | 0.82% |
| f01920887 | Melbourne, Victoria, AU (Vocus Connect International Backbone) | 3.78 TiB | 0.77% | 3.78 TiB | 0.00% |

@fylsan3 I'm seeing three SP IDs matching the list you provided. Please explain the discrepancy
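Comparing the declared SP list with the SPs actually taking deals is a set operation. The sketch below copies both lists from kevzak's comment (704 GiB converted to 0.6875 TiB). It reports which declared SPs are taking deals, which SPs taking deals were never declared, and which declared SPs have no deals. It also recomputes the top provider's share, which comes out at about 31.18%, consistent with the table. `compare_sp_lists` and `top_share` are hypothetical names used only for illustration.

```python
# Sketch: compare the declared SP list with SPs actually taking deals.
# Data copied from the comment above; 704 GiB is converted to 0.6875 TiB.

declared = {
    "f02234424", "f02058333", "f02246008", "f0222811",
    "f02274504", "f02220982", "f02199203",
}

# Total deals sealed per SP, in TiB, from the SP table above.
sealed_tib = {
    "f02220982": 42.75,
    "f02199203": 113.63,
    "f02221110": 51.00,
    "f02200472": 704 / 1024,
    "f0222811": 9.06,
    "f02274504": 116.75,
    "f02246008": 153.00,
    "f01920887": 3.78,
}


def compare_sp_lists(declared_ids: set, sealed: dict):
    """Return (declared and taking deals, undeclared but taking deals, declared but idle)."""
    taking = set(sealed)
    matched = sorted(declared_ids & taking)
    undeclared = sorted(taking - declared_ids)
    idle = sorted(declared_ids - taking)
    return matched, undeclared, idle


def top_share(sealed: dict) -> float:
    """Share of all sealed data held by the largest SP."""
    return max(sealed.values()) / sum(sealed.values())


if __name__ == "__main__":
    matched, undeclared, idle = compare_sp_lists(declared, sealed_tib)
    print("declared and taking deals:", matched)
    print("taking deals but not declared:", undeclared)
    print("declared but not taking deals:", idle)
    print(f"top provider share: {top_share(sealed_tib):.2%}")
```

With this data, three SPs taking deals (f02221110, f02200472, f01920887) are not on the declared list. The two US SPs on the list have no deals.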

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f16bxqdvcliy3x2o4q7w2q5dm4zj4tls6pfsjpmma

DataCap allocation requested

512TiB

Id

3c2c4b5f-cb29-4e9a-93c3-881c34e04b43

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f16bxqdvcliy3x2o4q7w2q5dm4zj4tls6pfsjpmma

Rule to calculate the allocation request amount

100% weekly > 0.5PiB, requesting 0.5PiB

DataCap allocation requested

512TiB

Total DataCap granted for client so far

512TiB

Datacap to be granted to reach the total amount requested by the client (10PiB)

9.5PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
|---|---|---|---|---|
| 6112 | 6 | 512TiB | 30.76 | 143.68TiB |

fylsan3 commented 1 year ago

Dear all, thank you for your attention and support. As you can see, we initially listed 5 SPs, and 3 of them matched. The 2 unmatched SPs are in the US. Because of the long distance, by the time our hard drives had been mailed to the United States, the SP told us that for market reasons it would not seal the data for now, so we are actively contacting other nodes. Sorry for the trouble. Since we specified 10 replicas in the application form, we are working hard to increase the number of cooperating SPs to 10. As requested, we have listed the SPs we are working with again below. If you have any questions, feel free to contact me anytime. Thank you for your attention and support, and I wish you all the best!

- f02246008: Hong Kong
- f02199203: Inner Mongolia
- f01920887: Australia
- f02122388: Guangdong
- f02221110: Guangzhou
- f02221111: Guangzhou
- f0222811: Hangzhou
- f02519843: Tianjin
- f02370792: Guangdong

Tom-OriginStorage commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.