filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] <Arrows> - <Energy Database> #1225

Closed fand-ee closed 1 year ago

fand-ee commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

ARROWS was established in July 2006, and its main business is as follows:
・ Housing business (sales, construction, and maintenance work, including sales outsourcing, solar power generation systems, storage battery systems, HEMS, all-electric, EV chargers, and remodeling).
・ Material procurement (wholesale of solar power generation systems, storage battery systems, HEMS, all-electric, EV chargers, etc.).
・ Business development (biomass business, comprehensive sales of housing materials).
![image](https://user-images.githubusercontent.com/93598242/200982944-a635c983-231d-48cf-9bbb-c391b809ff1c.png)

What is the primary source of funding for this project?

Company's own assets.

What other projects/ecosystem stakeholders is this project associated with?

There are no others.

Use-case details

Describe the data being stored onto Filecoin

National Solar Radiation Database (NSRDB) is a serially complete collection of hourly and half-hourly values of the three most common measurements of solar radiation – global horizontal, direct normal, and diffuse horizontal irradiance — and meteorological data. These data have been collected at a sufficient number of locations and temporal and spatial scales to accurately represent regional solar radiation climates.

Where was the data in this dataset sourced from?

AWS (https://registry.opendata.aws/nrel-pds-nsrdb/)

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://nsrdb.nrel.gov/

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

License: Creative Commons Attribution 3.0 United States License.

What is the expected retrieval frequency for this data?

Anytime.

For how long do you plan to keep this dataset stored on Filecoin?

Over 540 days.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Mainly in Japan; other areas in Asia are also acceptable.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Data will mainly be transferred online. Offline transfer can be considered if the storage provider is in Japan.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We will need reliable storage providers to ensure that our data can be stored for a long time and also can be retrieved at any time.

How will you be distributing deals across storage providers?

The amount of data allocated will be determined by the qualifications of the storage provider, such as the stability of the equipment they provide, the amount pledged, etc.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Company assets can cover all expenses.

PS: We submitted this application before, but we lost our previous GitHub account due to some problems and the application process was interrupted, so we had to resubmit it. We would appreciate it if you would reconsider our application. The screenshot is a record of the previous application, and this is the link to it (no longer available due to the account issues): https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/101

[Images: arrows, arrows2]

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

simonkim0515 commented 1 year ago

@fand-ee It seems that this ID has already been allocated 50TiB. Can you change the total amount of DataCap requested to 4.5PiB? If you would instead like the additional 5PiB on top of the 50TiB already allocated, please explain why you need the extra 5PiB.

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

fand-ee commented 1 year ago

> @fand-ee It seems that this ID has already been allocated 50TiB. Can you change the total amount of DataCap requested to 4.5PiB? If you would instead like the additional 5PiB on top of the 50TiB already allocated, please explain why you need the extra 5PiB.

@simonkim0515 I've changed it to 4.95PiB. 5PiB-50TiB=4.95PiB, right?
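
The subtraction above can be checked in binary units, assuming 1 PiB = 1024 TiB. This is only an illustrative sketch; `remaining_pib` is a hypothetical helper, not part of any Fil+ tooling.

```python
TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def remaining_pib(total_pib: float, allocated_tib: float) -> float:
    """Return the total minus the already-allocated DataCap, in PiB."""
    return (total_pib * TIB_PER_PIB - allocated_tib) / TIB_PER_PIB


# 5 PiB requested overall, 50 TiB already allocated to this ID.
print(f"{remaining_pib(5, 50):.2f} PiB")  # prints "4.95 PiB"
```

The exact value is 5070 TiB, or about 4.951 PiB, so 4.95PiB is the two-decimal figure.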

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

4.95PiB

Expected weekly DataCap usage rate

500TiB

Client address

f1qlw5qik62kvrzvpa7bsst65uobtt3jmkrh3ajsq

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1qlw5qik62kvrzvpa7bsst65uobtt3jmkrh3ajsq

DataCap allocation requested

250TiB

Id

77941bbd-8543-4947-8558-44616c83d9a1

kernelogic commented 1 year ago

I have worked with the same dataset, and I will verify it after the first allocation.

kernelogic commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaced5dnrwrzt57jxuxi3l2rwpozp5a2gpnsynr5dfvbzc3esxpzqtec

Address

f1qlw5qik62kvrzvpa7bsst65uobtt3jmkrh3ajsq

Datacap Allocated

100.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

77941bbd-8543-4947-8558-44616c83d9a1

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced5dnrwrzt57jxuxi3l2rwpozp5a2gpnsynr5dfvbzc3esxpzqtec

fand-ee commented 1 year ago

@kernelogic Thanks for your support. @simonkim0515 In addition, I would like to ask why the allocation requested is 250TiB, but the Datacap allocated is 100TiB? Is there any bug in the system?

Joss-Hua commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebespf3x22vinmz6ffe272qcytgwclcdid554gvcvt6pgs3v57lsa

Address

f1qlw5qik62kvrzvpa7bsst65uobtt3jmkrh3ajsq

Datacap Allocated

250.00TiB

Signer Address

f1tfg54zzscugttejv336vivknmsnzzmyudp3t7wi

Id

77941bbd-8543-4947-8558-44616c83d9a1

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebespf3x22vinmz6ffe272qcytgwclcdid554gvcvt6pgs3v57lsa

Joss-Hua commented 1 year ago

I preliminarily investigated this client in #101. I agree to this allocation and will review it further before the next allocation.

Joss-Hua commented 1 year ago

image

Anomalous: "Datacap Allocated" is inconsistent with the actual allocated quantity. @galen-mcandrew @dkkapur

psh0691 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea7hlgj2fruotbrlcquofjhkdgrzmrphyy5vzmmcbdqj2lujtu4ig

Address

f1qlw5qik62kvrzvpa7bsst65uobtt3jmkrh3ajsq

Datacap Allocated

250.00TiB

Signer Address

f1qdko4jg25vo35qmyvcrw4ak4fmuu3f5rif2kc7i

Id

77941bbd-8543-4947-8558-44616c83d9a1

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea7hlgj2fruotbrlcquofjhkdgrzmrphyy5vzmmcbdqj2lujtu4ig

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ f01228089 has sealed 30.00% of total datacap.

⚠️ 54.58% of total deal sealed by f01228089 are duplicate data.

⚠️ f01228100 has sealed 30.00% of total datacap.

⚠️ 54.58% of total deal sealed by f01228100 are duplicate data.

⚠️ 31.87% of total deal sealed by f01228087 are duplicate data.

⚠️ 31.87% of total deal sealed by f01228105 are duplicate data.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01228089 | Frankfurt am Main, Hesse, DE | 15.00 TiB | 30.00% | 6.81 TiB | 54.58% |
| f01228100 | San Jose, California, US | 15.00 TiB | 30.00% | 6.81 TiB | 54.58% |
| f01228087 | London, England, GB | 10.00 TiB | 20.00% | 6.81 TiB | 31.87% |
| f01228105 | Hong Kong, Central and Western, HK | 10.00 TiB | 20.00% | 6.81 TiB | 31.87% |
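
The "Duplicate Deals" column appears to be the share of a provider's sealed data that is not unique. The sketch below assumes the formula (total - unique) / total, which is inferred from the figures in the table rather than taken from the checker's source; `duplicate_ratio` is a hypothetical name.

```python
def duplicate_ratio(total_tib: float, unique_tib: float) -> float:
    """Fraction of sealed data that duplicates data the provider already holds.

    Assumed formula: (total - unique) / total.
    """
    if total_tib <= 0:
        raise ValueError("total_tib must be positive")
    return (total_tib - unique_tib) / total_tib


# (Total Deals Sealed, Unique Data) in TiB, copied from the report table.
rows = {"f01228089": (15.00, 6.81), "f01228087": (10.00, 6.81)}
for provider, (total, unique) in rows.items():
    print(provider, f"{duplicate_ratio(total, unique):.1%}")
```

With the rounded inputs shown in the table, this gives roughly 54.6% and 31.9%, which agrees with the reported 54.58% and 31.87% to within rounding.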

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 6.81 TiB | 50.00 TiB | 4 | 100.00% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Verifier |
| --- | --- | --- | --- | --- |
| f3qaipvxrz2gxexc7mcvjsxifmscfiw7c7zfhrmq76j5ee4hcbvg3gbtpea7wgz72kkjcjmwzhm5uo2onxyocq | Guazi Dynamic | 27.25 TiB | 218 | LDN v3 multisig |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1qlw5qik62kvrzvpa7bsst65uobtt3jmkrh3ajsq

DataCap allocation requested

500TiB

Id

1852d20c-f650-4645-99c6-66ee24fc3d04

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1qlw5qik62kvrzvpa7bsst65uobtt3jmkrh3ajsq

Last two approvers

psh0691 & Joss-Hua

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

500TiB

Total DataCap granted for client so far

400TiB

Datacap to be granted to reach the total amount requested by the client (4.95PiB)

4.55PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 13573 | 9 | 250TiB | 14.35 | 58.60TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

⚠️ f01984580 has unknown IP location.

⚠️ f01993388 has unknown IP location.

⚠️ 22.31% of total deal sealed by f01228100 are duplicate data.

⚠️ 22.31% of total deal sealed by f01228089 are duplicate data.

⚠️ f01993339 has unknown IP location.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01984580 | Unknown; Unknown | 39.99 TiB | 12.52% | 39.99 TiB | 0.00% |
| f0522948 | Singapore, Singapore, SG; Alibaba (US) Technology Co., Ltd. | 39.96 TiB | 12.51% | 39.96 TiB | 0.00% |
| f01993388 (new) | Unknown; Unknown | 38.18 TiB | 11.95% | 38.18 TiB | 0.00% |
| f0867300 | Tokyo, Tokyo, JP; Alibaba (US) Technology Co., Ltd. | 37.96 TiB | 11.88% | 37.96 TiB | 0.00% |
| f01228087 | London, England, GB; Alibaba (US) Technology Co., Ltd. | 37.39 TiB | 11.70% | 34.20 TiB | 8.52% |
| f01228100 | San Jose, California, US; Alibaba (US) Technology Co., Ltd. | 36.70 TiB | 11.49% | 28.52 TiB | 22.31% |
| f01228089 | Frankfurt am Main, Hesse, DE; Alibaba (US) Technology Co., Ltd. | 36.70 TiB | 11.49% | 28.52 TiB | 22.31% |
| f01228105 | Hong Kong, Central and Western, HK; Alibaba (US) Technology Co., Ltd. | 31.70 TiB | 9.92% | 28.52 TiB | 10.05% |
| f01993339 | Unknown; Unknown | 20.87 TiB | 6.53% | 20.87 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 9.19 TiB | 9.19 TiB | 1 | 2.88% |
| 37.04 TiB | 74.07 TiB | 2 | 23.19% |
| 2.20 TiB | 6.61 TiB | 3 | 2.07% |
| 29.20 TiB | 139.56 TiB | 4 | 43.69% |
| 18.01 TiB | 90.04 TiB | 5 | 28.18% |
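
The "Deal Percentage" column looks like each row's share of all deals made. The sketch below assumes that reading (row total divided by the sum of the "Total Deals Made" column); the formula is inferred from the figures, not from the checker's source, and `deal_percentages` is a hypothetical name.

```python
# "Total Deals Made" in TiB, keyed by "Number of Providers", from the table.
total_deals_tib = {1: 9.19, 2: 74.07, 3: 6.61, 4: 139.56, 5: 90.04}


def deal_percentages(deals: dict) -> dict:
    """Return each row's share of the grand total, in percent."""
    grand_total = sum(deals.values())
    return {key: 100 * value / grand_total for key, value in deals.items()}


for providers, pct in deal_percentages(total_deals_tib).items():
    print(f"{providers} provider(s): {pct:.2f}%")
```

Because the table values are already rounded, the recomputed shares match the reported column only to within about 0.01 percentage points.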

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this is possible if all of the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Approvers |
| --- | --- | --- | --- | --- |
| f3qaipvxrz2gxexc7mcvjsxifmscfiw7c7zfhrmq76j5ee4hcbvg3gbtpea7wgz72kkjcjmwzhm5uo2onxyocq | Guazi Dynamic | 27.25 TiB | 218 | 1 cryptowhizzard, 6 kernelogic, 1 liyunzhi-666, 2 newwebgroup, 1 psh0691 |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

cryptowhizzard commented 1 year ago

Hi @fand-ee

I tried to retrieve some data but got stuck after this:

    root@proposals:~# lotus client retrieve --provider f01228087 QmXhz91P5rJD9JBiuWp5bVwcn5m6h7vvTGigB2ARyaUmLk test.car
    Recv 0 B, Paid 0 FIL, Open (New), 0s [1676445845934082940|0]
    Recv 0 B, Paid 0 FIL, DealProposed (WaitForAcceptance), 3ms [1676445845934082940|0]

Is f01228087 open for retrieval?

fand-ee commented 1 year ago

Hi @cryptowhizzard f01228087 is open for retrieval. It may be slow due to network problems, please try again.

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedapdts7da5hj4p3mplg6ramnjwe3swb2gysdeh6adzrqiow7426m

Address

f1qlw5qik62kvrzvpa7bsst65uobtt3jmkrh3ajsq

Datacap Allocated

500.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

1852d20c-f650-4645-99c6-66ee24fc3d04

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedapdts7da5hj4p3mplg6ramnjwe3swb2gysdeh6adzrqiow7426m

cryptowhizzard commented 1 year ago

@fand-ee

Thanks, retrieval worked.

Joss-Hua commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01228089: 22.31%, f01228100: 22.31%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

nj-steve commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01228089: 22.31%, f01228100: 22.31%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

nj-steve commented 1 year ago

@fand-ee Hello, please explain why some of your data CIDs are shared with another client.

fand-ee commented 1 year ago

Hi @nj-steve It's possible that we used the same AWS public dataset and the same data splitting tool, which may have led to this issue occurring with a small probability. This is inevitable, but we will try to find ways to reduce the occurrence of such situations in the future.
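
The mechanism described above can be illustrated with a toy model: content addressing is deterministic, so identical bytes split in an identical way produce identical identifiers. In this sketch SHA-256 hex digests stand in for CIDs (real Filecoin piece CIDs are computed differently), and the data, chunk sizes, and `chunk_digests` function are all hypothetical.

```python
import hashlib


def chunk_digests(data: bytes, chunk_size: int) -> list[str]:
    """Split data into fixed-size chunks and hash each chunk.

    SHA-256 digests are only a stand-in for CIDs in this illustration.
    """
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


# Stand-in for a file from a public dataset (20,000 deterministic bytes).
dataset = bytes(i % 251 for i in range(20000))

client_a = chunk_digests(dataset, chunk_size=4096)
client_b = chunk_digests(dataset, chunk_size=4096)  # same tool, same settings
client_c = chunk_digests(dataset, chunk_size=1000)  # different splitting

print(client_a == client_b)                 # same data, same splitting
print(len(set(client_a) & set(client_c)))   # overlap with different splitting
```

Under these assumptions, two clients preparing the same public data with the same tool and settings end up with the same identifiers, while a different splitting yields disjoint ones.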

nj-steve commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceb5jk7vyl2odcwzgtjxzgc7kmwpaew6vcefh2mtv32otvl3np7frm

Address

f1qlw5qik62kvrzvpa7bsst65uobtt3jmkrh3ajsq

Datacap Allocated

500.00TiB

Signer Address

f1xx6555qijma7igpnjspyvdunc4vfxkawnpqy5ii

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb5jk7vyl2odcwzgtjxzgc7kmwpaew6vcefh2mtv32otvl3np7frm

herrehesse commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01228089: 22.31%, f01228100: 22.31%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

spaceT9 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01228089: 22.31%, f01228100: 22.31%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

spaceT9 commented 1 year ago

Your retrieval rate is too low and needs to be increased.

fand-ee commented 1 year ago

@spaceT9 I have asked the SPs to support retrieval. Because the retrieval rate is a cumulative value, it will gradually increase with the number of tests. We will terminate cooperation with SPs that remain unable to provide retrieval services.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

fand-ee commented 1 year ago

Waiting for signature

cryptowhizzard commented 1 year ago

> Waiting for signature

You should make sure your SPs enable retrieval before you get the next allocation of DataCap. Retrieval is mandatory according to the Fil+ rules and guidelines.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

fand-ee commented 1 year ago

......

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

-- Commented by Stale Bot.

clriesco commented 1 year ago

Removed stale label, reopened issue :)

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

fand-ee commented 1 year ago

Hi

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

-- Commented by Stale Bot.