filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] <Ji Xia Technology Hong Kong Limited> - <CoinSummer Razzil> #1127

Closed maxvint closed 1 year ago

maxvint commented 1 year ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

CoinSummer Razzil is a blockchain data analysis and display platform focusing on EVM-compatible chains.

We have collected the full on-chain data of Ethereum, Binance Smart Chain, Avalanche, Polygon, Arbitrum, Fantom, and other Layer 1 and Layer 2 blockchains through full-node RPC APIs. From this full dataset we index price, market cap, trading volume, TVL (total value locked), and other indicators, as well as the changes in these indicators, to provide strong data support for investment decisions.

At present, the total on-chain data collected on the Razzil platform exceeds 12 PB, of which 8 PB are archived compressed packages of the full data, currently stored on AWS S3. The platform supports analysis and display of more than 50 public chains and more than 800 protocols across various dimensions.

What is the primary source of funding for this project?

Own funds and revenue of the company.

What other projects/ecosystem stakeholders is this project associated with?

None

Use-case details

Describe the data being stored onto Filecoin

The data stored on Filecoin is the full blockchain archive data in .csv and .tar formats.

Where was the data in this dataset sourced from?

From the full nodes we maintain and the data indexed by our data platform.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://drive.google.com/drive/folders/1wZFOGzCkBSzYE4sU8A56x01zMYu6jWkk?usp=sharing

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes

What is the expected retrieval frequency for this data?

3 to 5 times every 540 days.

For how long do you plan to keep this dataset stored on Filecoin?

Store for at least 540 days.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Asia and North America.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

We will send CAR files to storage providers offline for deal making.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We will check their overall capacity and working hours; we need storage providers that we can work with over the long term.

How will you be distributing deals across storage providers?

We will allocate DataCap according to each miner's preference, such as online or offline transfer.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes, we have enough funding and resources to start making deals.
large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

200TiB

Client address

f1k7o76avapwoap5tce5ywbktwlgnroqktfg2sjii

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1k7o76avapwoap5tce5ywbktwlgnroqktfg2sjii

DataCap allocation requested

100TiB

Id

59b426ef-d2a9-4c68-bccd-3a8a3c31c8d5

psh0691 commented 1 year ago

I have a few questions about DC allocation.

  1. Can you prove the amount of data you currently have? For example, a screenshot of data capacity.
  2. Are there any related SPs?
  3. Did you select the Filecoin node you want to save?
maxvint commented 1 year ago

@psh0691

  1. We have about 9 PiB of archived full blockchain data stored on AWS S3, covering most EVM-compatible chains. Below is a screenshot of the AWS S3 buckets.

    (Screenshot: WechatIMG423)
  2. We have contacted 4 SPs to store this data, and we hope to find more.

  3. We plan to store Filecoin chain data in the future, which is still under development.

psh0691 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedujheoa2evgpaohrnrgzqbsdi52jjrmwz5uot2rhw5kfrqdgliyq

Address

f1k7o76avapwoap5tce5ywbktwlgnroqktfg2sjii

Datacap Allocated

100.00TiB

Signer Address

f1qdko4jg25vo35qmyvcrw4ak4fmuu3f5rif2kc7i

Id

59b426ef-d2a9-4c68-bccd-3a8a3c31c8d5

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedujheoa2evgpaohrnrgzqbsdi52jjrmwz5uot2rhw5kfrqdgliyq

Fenbushi-Filecoin commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceaxj6moz2qab7oq25jgdpsvu6ystzya3z5kdcxk4sha2f3jda3zsk

Address

f1k7o76avapwoap5tce5ywbktwlgnroqktfg2sjii

Datacap Allocated

100.00TiB

Signer Address

f1yqydpmqb5en262jpottko2kd65msajax7fi4rmq

Id

59b426ef-d2a9-4c68-bccd-3a8a3c31c8d5

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceaxj6moz2qab7oq25jgdpsvu6ystzya3z5kdcxk4sha2f3jda3zsk

maxvint commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

cryptowhizzard commented 1 year ago

Hello @yuwenhui

I see deals on chain; however, all deals come back as slashed. Can you tell me what has been going on here?

client-f01942130.csv

lotus state get-deal 22262474 ERROR: deal 22262474 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed

For example.

maxvint commented 1 year ago

Hello @cryptowhizzard. First, lotus state get-deal 22262474 returned "ERROR: deal 22262474 not found" because deal 22262474 had not yet completed sealing. If you run lotus state get-deal 22262474 again, it will display the deal state correctly:

{
  "Proposal": {
    "PieceCID": {
      "/": "baga6ea4seaqco33ozurglwm3if4flyfckj4pebpbagz3djkaim7r2ejfroq5kly"
    },
    "PieceSize": 34359738368,
    "VerifiedDeal": true,
    "Client": "f01942130",
    "Provider": "f02012032",
    "Label": "uAXASIKu5te5vPh-PQEHEjLlXyaRRsg-Rq-LYzeWPFOXKjP4_",
    "StartEpoch": 2543804,
    "EndEpoch": 3582997,
    "StoragePricePerEpoch": "0",
    "ProviderCollateral": "9675929767997153",
    "ClientCollateral": "0"
  },
  "State": {
    "SectorStartEpoch": 2517743,
    "LastUpdatedEpoch": -1,
    "SlashEpoch": -1,
    "VerifiedClaim": 4744821
  }
}

However, after checking with the SP f02012032, we found another problem: the deal state is StorageDealError, with the following error:

error awaiting deal pre-commit: failed to set up called handler: on head changed error: called check error (h: 2516862): failed to check deal activity: failed to look up deal on chain: looking for publish deal message bafy2bzacebi44wtyxethxffuojvwfhq4fhl5cn65p6nsw5tffmoxljvxgycxw: not found

This may be an unresolved bug in Lotus markets; our SP will use Boost to import storage deals later.
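For readers checking deal state themselves, the JSON printed by lotus state get-deal above can be summarized mechanically. The sketch below is illustrative only: it assumes the field names shown in that output, assumes the usual Lotus convention that -1 in SectorStartEpoch or SlashEpoch means "not yet activated" or "not slashed", and assumes 30-second mainnet epochs (2880 per day). The function name summarize_deal is hypothetical.

```python
import json

# Assumption: Filecoin mainnet epochs are 30 seconds, i.e. 2880 epochs per day.
EPOCHS_PER_DAY = 2880


def summarize_deal(raw: str) -> dict:
    """Summarize the JSON printed by `lotus state get-deal <id>` (hypothetical helper)."""
    deal = json.loads(raw)
    proposal, state = deal["Proposal"], deal["State"]
    # SectorStartEpoch == -1: the deal has not been activated in a sector yet.
    activated = state["SectorStartEpoch"] != -1
    # SlashEpoch == -1: the deal has not been slashed.
    slashed = state["SlashEpoch"] != -1
    duration_epochs = proposal["EndEpoch"] - proposal["StartEpoch"]
    return {
        "provider": proposal["Provider"],
        "verified": proposal["VerifiedDeal"],
        "piece_size_gib": proposal["PieceSize"] / 2**30,
        "activated": activated,
        "slashed": slashed,
        "duration_days": duration_epochs / EPOCHS_PER_DAY,
    }
```

Applied to deal 22262474 as printed above, this reports an activated, unslashed 32 GiB verified deal with provider f02012032 and a term of roughly 361 days (1,039,193 epochs).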

cryptowhizzard commented 1 year ago

Hello @yuwenhui

Boost has been around for a long time now and works perfectly. "Later" is not the right word. If you want to continue with this LDN, the data needs to be retrievable, and you need to find SPs that are currently on Boost, or that are not but have good retrieval in place (I know there are multiple).

maxvint commented 1 year ago

@cryptowhizzard Okay, by "later" we mean that we will stop sending deals to these SPs until their data is retrievable. If it still cannot be retrieved, I will ask you for help finding more SPs that have good retrieval in place.

Sunnyiscoming commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 2nd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01991416 | Hong Kong, Central and Western, HK (China Unicom Global) | 24.00 TiB | 31.67% | 24.00 TiB | 0.00% |
| f02006691 | Hong Kong, Central and Western, HK (China Unicom Global) | 24.00 TiB | 31.67% | 24.00 TiB | 0.00% |
| f02012032 (new) | Hong Kong, Central and Western, HK (China Unicom Global) | 23.78 TiB | 31.38% | 23.78 TiB | 0.00% |
| f01824405 | Hangzhou, Zhejiang, CN (Sichuan Chuanxn IDC) | 4.00 TiB | 5.28% | 4.00 TiB | 0.00% |
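The Percentage column in the provider table appears to be each provider's share of the total sealed data (75.78 TiB). As a quick sanity check, a minimal sketch that recomputes it from the Total Deals Sealed values is shown here; the helper name provider_shares is hypothetical and this is not the checker bot's own code.

```python
# Sealed TiB per provider, copied from the provider distribution table.
sealed_tib = {
    "f01991416": 24.00,
    "f02006691": 24.00,
    "f02012032": 23.78,
    "f01824405": 4.00,
}


def provider_shares(sealed: dict[str, float]) -> dict[str, float]:
    """Return each provider's share of total sealed data, in percent, rounded to 2 decimals."""
    total = sum(sealed.values())
    return {provider: round(100 * tib / total, 2) for provider, tib in sealed.items()}


print(provider_shares(sealed_tib))
```

The recomputed shares (31.67%, 31.67%, 31.38%, 5.28%) match the table, which supports reading the column as a share of total sealed data.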

(Chart: Provider Distribution)

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

Since this is the 2nd allocation, the following restrictions have been relaxed:

⚠️ 100.00% of deals are for data replicated across less than 2 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 75.78 TiB | 75.78 TiB | 1 | 100.00% |
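The replication warning counts unique data held by fewer than two storage providers. A minimal sketch of how such a figure could be computed is shown here, assuming deals are available as (piece CID, provider, size in TiB) records. This is not the checker bot's actual implementation; the function name and the sample records in the usage note are hypothetical.

```python
from collections import defaultdict


def under_replicated_pct(deals, min_providers=2):
    """Percentage of unique data (by size) stored by fewer than `min_providers` providers.

    deals: iterable of (piece_cid, provider_id, size_tib) tuples (hypothetical format).
    """
    providers = defaultdict(set)  # piece CID -> set of providers storing it
    size = {}                     # piece CID -> piece size in TiB
    for cid, provider, tib in deals:
        providers[cid].add(provider)
        size[cid] = tib
    total = sum(size.values())
    if total == 0:
        return 0.0
    under = sum(size[cid] for cid, ps in providers.items() if len(ps) < min_providers)
    return 100 * under / total
```

For example, with hypothetical records [("pieceA", "sp-1", 1.0), ("pieceB", "sp-2", 1.0), ("pieceB", "sp-3", 1.0)], pieceA has one copy and pieceB has two, so the function returns 50.0. When every piece sits with a single provider, as in the table above, it returns 100.0.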

(Chart: Replication Distribution)

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data and should not resolve to the same CID.

However, this is possible if all the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.[^3]

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Approvers |
| --- | --- | --- | --- | --- |
| f1gwuigefejd5jkbtg4uznx45toaie5olh2cgm3dq | FeigeData | 224.00 GiB | 7 | 1 Fenbushi-Filecoin; 1 kernelogic; 1 liyunzhi-666; 1 psh0691 |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

herrehesse commented 1 year ago

@yuwenhui Can you give me more information about the current SPs you are storing with, including their business name, region, and city?

Sunnyiscoming commented 1 year ago

Hi, please explain the anomaly in the report: CID sharing has been observed.

maxvint commented 1 year ago

Hello @Sunnyiscoming, thanks for checking our LDN application.

We have contacted our SP and worked on the CID sharing issue you mentioned. The feedback we received is that they made a technical mistake when importing the deals for these two datasets. We have communicated with the SP many times to review their deal storage process, and they have promised that this will not happen again.

We also hope the community will continue to provide supervision and suggestions to help us store data on the Filecoin network more efficiently and smoothly.

cryptowhizzard commented 1 year ago

Feb 15 15:25:30 proposals dealscanner-f01942130-f02012032-22203246: Error: retrieval query for miner f02012032 failed: failed to dial 12D3KooWLBMSggvqs69icS3srwKpZTtaZAS484aQ2XsDchcAUwV2:
Feb 15 15:25:30 proposals dealscanner-f01942130-f02012032-22203246: [/ip4/162.219.38.94/tcp/18762] failed to negotiate stream multiplexer: EOF
Feb 15 15:25:30 proposals dealscanner-f01942130-f02012032-22289736: Error: retrieval query for miner f02012032 failed: failed to dial 12D3KooWLBMSggvqs69icS3srwKpZTtaZAS484aQ2XsDchcAUwV2:
Feb 15 15:25:30 proposals dealscanner-f01942130-f02012032-22289736: [/ip4/162.219.38.94/tcp/18762] failed to negotiate stream multiplexer: read tcp4 212.6.53.183:34775->162.219.38.94:18762: read: connection reset by peer
Feb 15 15:25:30 proposals dealscanner-f01942130-f02012032-22201556: Error: retrieval query for miner f02012032 failed: failed to dial 12D3KooWLBMSggvqs69icS3srwKpZTtaZAS484aQ2XsDchcAUwV2:
Feb 15 15:25:30 proposals dealscanner-f01942130-f02012032-22201556: [/ip4/162.219.38.94/tcp/18762] failed to negotiate stream multiplexer: read tcp4 212.6.53.183:44425->162.219.38.94:18762: read: connection reset by peer
Feb 15 15:25:30 proposals dealscanner-f01942130-f02012032: Error: retrieval query for miner f02012032 failed: failed to dial 12D3KooWLBMSggvqs69icS3srwKpZTtaZAS484aQ2XsDchcAUwV2:
Feb 15 15:25:30 proposals dealscanner-f01942130-f02012032: [/ip4/162.219.38.94/tcp/18762] failed to negotiate stream multiplexer: read tcp4 212.6.53.183:33321->162.219.38.94:18762: read: connection reset by peer
Feb 15 15:25:30 proposals dealscanner-f01942130-f02012032: Error: retrieval query for miner f02012032 failed: failed to dial 12D3KooWLBMSggvqs69icS3srwKpZTtaZAS484aQ2XsDchcAUwV2:
Feb 15 15:25:30 proposals dealscanner-f01942130-f02012032: * [/ip4/162.219.38.94/tcp/18762] failed to negotiate stream multiplexer: read tcp4 212.6.53.183:41663->162.219.38.94:18762: read: connection reset by peer

The retrieval function is still not working.

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1127#issuecomment-1429294202

maxvint commented 1 year ago

Hello @herrehesse, thanks for reviewing my LDN application. Here is the information on the current SPs:

f01824405 Chengdu, CN - RY
f01991416 Hong Kong, CN - RIKIMARU
f02012032 Hong Kong, CN - RIKIMARU
f02006691 Hong Kong, CN - NIHICHE
f02014107 Seoul, KR - HMT

cryptowhizzard commented 1 year ago

Good morning. Retrieval is still not working. Can you have this fixed please?

maxvint commented 1 year ago

@cryptowhizzard Hello, the retrieval problem has been fixed and it now works well. Please take a look.

maxvint commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 2 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

data-programs commented 1 year ago
KYC

This user’s identity has been verified through filplus.storage