filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Kernelogic - Open datasets onboarding initiative phase 1 (2/4) #1638

Closed: kernelogic closed this issue 9 months ago

kernelogic commented 1 year ago

Data Owner Name

Kernelogic

Data Owner Country/Region

Canada

Data Owner Industry

Life Science / Healthcare

Website

https://singularity-browser.kernelogic.ca

Social Media

N/A

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

1PiB

On-chain address for first allocation

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Custom multisig

Identifier

No response

Share a brief history of your project and organization

I have participated in every Slingshot phase and am probably the best-performing "small individual client".

Even though Slingshot v2 has ended, there is still strong demand from SPs to onboard useful data. This application is to onboard open datasets from AWS.

I have a web UI (https://singularity-browser.kernelogic.ca/) that indexes all onboarded files and provides ways to retrieve them.

I have successfully completed a few LDNs on other datasets, and I have a record showing that I have followed the decentralization rules and have zero self-dealing.

Some of the recent LDNs I completed:
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1108
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1107
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1106
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1104
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/983

Is this project associated with other projects/ecosystem stakeholders?

Yes

If answered yes, what are the other projects/ecosystem stakeholders

Storage working groups, BigD exchange, singularity deal making tool.

Describe the data being stored onto Filecoin

Because each LDN requires a separate client address for the bot to work properly, I am kicking off a series of open dataset onboarding LDNs so that I can onboard more data smoothly. They cover new AWS open datasets that I have not onboarded before, including but not limited to:

Allen Mouse Brain Atlas
Community Earth System Model Large Ensemble (CESM LENS)
Community Earth System Model v2 Large Ensemble (CESM2 LENS)
Epoch of Reionization Dataset
HIRLAM Weather Model
NIH NCBI Sequence Read Archive (SRA) on AWS
NOAA Global Ensemble Forecast System (GEFS)
NOAA Fundamental Climate Data Records (FCDR)
NOAA Joint Polar Satellite System (JPSS)

All these datasets will be indexed for easy lookup through my website https://singularity-browser.kernelogic.ca

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/allen-mouse-brain-atlas/
https://registry.opendata.aws/ncar-cesm-lens/
https://registry.opendata.aws/epoch-of-reionization/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1 to 1.5 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe

How will you be distributing your data to storage providers

HTTP or FTP server

How do you plan to choose storage providers

Slack, Big data exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

No response

How do you plan to make deals to your storage providers

Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

herrehesse commented 1 year ago

Dear Filecoin+ Github applicant,

We have noticed that some of you are submitting merged datacap requests for datasets that are already (partly) on the chain. While we appreciate your enthusiasm to contribute to the Filecoin network, we want to remind you that this behaviour may not be beneficial to the network in the long run. In fact, this behaviour has been questioned and discussed in issue #832 on the Filecoin notary-governance Github repository.

We encourage you to review the discussions in issue #832. It's important to ensure that your datacap requests are valid, necessary, and add value to the network. By doing so, you can help to maintain the integrity and sustainability of the Filecoin network.

You can find the link to issue #832 here: filecoin-project/notary-governance#832

Thank you for your understanding and cooperation.

kernelogic commented 1 year ago

In my defence, I provide a better browser for per-dataset data indexing than the fil-plus bots do. It can show in detail what is stored in each dataset.

With that being said, I am also willing to follow the decision on your proposal https://github.com/filecoin-project/notary-governance/issues/832 should it get accepted.

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

I still prefer one application per dataset, so I am following the discussion on this topic. I advise you to open them one by one.

kernelogic commented 1 year ago

@Sunnyiscoming could you check your slack DM please?

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

1TiB

Client address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

DataCap allocation requested

256TiB

Id

c4245414-79e1-4ba1-8584-ec03f51da00f

Sunnyiscoming commented 1 year ago

Related proposal: https://github.com/filecoin-project/notary-governance/issues/832. I hope more notaries will review this application and comment on the proposal.

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecrnk3cwmznemc7qjq7wqgdkl3dtgkeuskrtqanhhkbvk5pjb2vpi

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

256.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

c4245414-79e1-4ba1-8584-ec03f51da00f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecrnk3cwmznemc7qjq7wqgdkl3dtgkeuskrtqanhhkbvk5pjb2vpi

laurarenpanda commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebxgvbf52ft7mo4ssatskc54uysuh5dkalgy6kx3wicrngyvbr422

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

256.00TiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

c4245414-79e1-4ba1-8584-ec03f51da00f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebxgvbf52ft7mo4ssatskc54uysuh5dkalgy6kx3wicrngyvbr422

kernelogic commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ All storage providers are located in the same region.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

DataCap allocation requested

512TiB

Id

176125c2-c7c2-4ef1-8693-2c02aca43359

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Rule to calculate the allocation request amount

10% of total dc amount requested

DataCap allocation requested

512TiB

Total DataCap granted for client so far

256TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.75PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 5763 | 3 | 256TiB | 44.47 | 62.12TiB |

sxxfuture-official commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ All storage providers are located in the same region.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

sxxfuture-official commented 1 year ago

It seems that the geographical dispersion of SPs is insufficient. Since the project is at an early stage, I hope this will be resolved in the future. I will support this round and keep an eye on it.

sxxfuture-official commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceaxre3sn6s6lzs3fj76cz5bbbxy42ug7gexp274rpkgt6cef7mbic

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

512.00TiB

Signer Address

f1foiomqlmoshpuxm6aie4xysffqezkjnokgwcecq

Id

176125c2-c7c2-4ef1-8693-2c02aca43359

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceaxre3sn6s6lzs3fj76cz5bbbxy42ug7gexp274rpkgt6cef7mbic

kernelogic commented 1 year ago

Thank you for your support. It is a tough time to onboard data now. But I can guarantee the distribution will be satisfactory in the near future.

Also this is 1 out of the 4 LDNs in the same series. They should be taken into consideration as a whole.

SuperChaiChai commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecc77cqwajifuummsqijg2fbuvlxebvjdfnrbszmub66wzqyxl3i6

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

512.00TiB

Signer Address

f12mckci3omexgzoeosjvstcfxfe4vqw7owdia3da

Id

176125c2-c7c2-4ef1-8693-2c02aca43359

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecc77cqwajifuummsqijg2fbuvlxebvjdfnrbszmub66wzqyxl3i6

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

DataCap allocation requested

1PiB

Id

612ddcdb-4f18-4d3b-9483-4bf555d95966

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Rule to calculate the allocation request amount

200% weekly > 1PiB, requesting 1PiB

DataCap allocation requested

1PiB

Total DataCap granted for client so far

465661.3YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

465661.3YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 17869 | 4 | 512TiB | 64.18 | 130.68TiB |

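
The two `YiB` figures in the bot comment above do not match the rest of the thread and look like a unit-formatting glitch rather than real amounts. Below is a minimal sketch of the arithmetic those fields appear to represent. It assumes "granted so far" is the sum of the tranches approved earlier in this thread (256TiB and 512TiB) and "remaining" is the 5PiB total minus that sum. The function names are hypothetical and are not taken from the bot's code.

```python
TIB_PER_PIB = 1024


def format_tib(amount_tib: int) -> str:
    """Render a TiB amount as PiB once it reaches 1 PiB, otherwise as TiB."""
    if amount_tib >= TIB_PER_PIB:
        return f"{amount_tib / TIB_PER_PIB:g}PiB"
    return f"{amount_tib}TiB"


def allocation_progress(total_requested_tib: int, approved_tranches_tib: list[int]) -> tuple[int, int]:
    """Return (granted so far, still to be granted), both in TiB."""
    granted = sum(approved_tranches_tib)
    remaining = max(total_requested_tib - granted, 0)
    return granted, remaining


# Assumed inputs: 5PiB requested, tranches of 256TiB and 512TiB already approved.
granted, remaining = allocation_progress(5 * TIB_PER_PIB, [256, 512])
print(format_tib(granted), format_tib(remaining))
```

With only the first 256TiB tranche as input, the same sketch gives 4.75PiB remaining, which matches the figure the bot reported for request number 2.
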
liyunzhi-666 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 50% of total datacap - f02037841: 68.90%

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.
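
The two distribution warnings in this report rest on simple thresholds: one provider holding more than 50% of the sealed DataCap, and data replicated across fewer than 4 providers. The sketch below shows one way such checks could be computed. It is not the checker's actual implementation; the function name, the input tuple layout, and the choice to weight replication by deal size are all assumptions.

```python
from collections import defaultdict


def check_distribution(deals, max_share=0.50, min_replicas=4):
    """deals: list of (provider_id, piece_cid, size) tuples (hypothetical layout)."""
    total = sum(size for _, _, size in deals)
    if total == 0:
        return []

    per_provider = defaultdict(int)          # total size sealed by each provider
    providers_per_piece = defaultdict(set)   # distinct providers holding each piece
    for provider, piece, size in deals:
        per_provider[provider] += size
        providers_per_piece[piece].add(provider)

    warnings = []
    for provider, size in sorted(per_provider.items()):
        share = size / total
        if share > max_share:
            warnings.append(f"{provider} sealed {share:.2%} of total datacap")

    # Size of deals whose piece is held by fewer than min_replicas providers.
    under = sum(
        size for _, piece, size in deals
        if len(providers_per_piece[piece]) < min_replicas
    )
    if under:
        warnings.append(
            f"{under / total:.2%} of deals are for data replicated "
            f"across fewer than {min_replicas} storage providers"
        )
    return warnings


# Made-up example: two providers, each holding a different piece.
example = [("sp1", "pieceA", 70), ("sp2", "pieceB", 30)]
for line in check_distribution(example):
    print(line)
```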

liyunzhi-666 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebmjuubrvt4v35gajjdvn7w24mehtxrnuiozgorh6j4lkcazysf7o

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

1.00PiB

Signer Address

f1pszcrsciyixyuxxukkvtazcokexbn54amf7gvoq

Id

612ddcdb-4f18-4d3b-9483-4bf555d95966

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebmjuubrvt4v35gajjdvn7w24mehtxrnuiozgorh6j4lkcazysf7o

Tom-OriginStorage commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceam5mv3wzmqem7tf3tooj7lvsdj7e3dzh3mjif3rdjy7l7qxjhnbo

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

1.00PiB

Signer Address

f1q6bpjlqia6iemqbrdaxr2uehrhpvoju3qh4lpga

Id

612ddcdb-4f18-4d3b-9483-4bf555d95966

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceam5mv3wzmqem7tf3tooj7lvsdj7e3dzh3mjif3rdjy7l7qxjhnbo

herrehesse commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 50% of total datacap - f02037841: 66.85%

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

kernelogic commented 1 year ago

Need to keep this open. Still onboarding slowly.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

kernelogic commented 1 year ago

Need to keep this open. Still onboarding slowly.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 1 year ago

Actively onboarding deals - anticipate renewal this week.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f02049625

Client address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

DataCap allocation requested

2PiB

Id

c4f0d039-a0af-4cd9-836c-19d6f471091c

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Rule to calculate the allocation request amount

400% weekly > 2PiB, requesting 2PiB

DataCap allocation requested

2PiB

Total DataCap granted for client so far

931322574615478927360.0YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

931322574615478927360.0YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 44098 | 9 | 1PiB | 36.7 | 274.28TiB |

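
The four tranche sizes in this thread (256TiB, 512TiB, 1PiB, 2PiB) are consistent with a "smaller of a share of the total or a multiple of the weekly rate" rule. The sketch below reproduces them under assumptions. The bot comments only state "10% of total dc amount requested", "200% weekly > 1PiB" and "400% weekly > 2PiB"; the other percentages in `SCHEDULE` are inferred from the amounts and are not confirmed by the thread. The weekly rate is taken as the 1PiB given in the application form.

```python
TIB_PER_PIB = 1024

# (percent of total requested, percent of weekly rate) for tranches 1 to 4.
# Assumed schedule: only parts of it are stated in the bot comments.
SCHEDULE = [(5, 50), (10, 100), (20, 200), (40, 400)]


def tranche_size_tib(n: int, total_tib: int, weekly_tib: int) -> int:
    """Size of the n-th tranche (1-based) as the smaller of the two caps."""
    pct_total, pct_weekly = SCHEDULE[n - 1]
    cap_from_total = total_tib * pct_total // 100
    cap_from_weekly = weekly_tib * pct_weekly // 100
    return min(cap_from_total, cap_from_weekly)


# 5PiB total and 1PiB weekly, as stated in the application.
sizes = [tranche_size_tib(n, 5 * TIB_PER_PIB, 1 * TIB_PER_PIB) for n in range(1, 5)]
print(sizes)  # [256, 512, 1024, 2048]
```
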
cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedkowbnv2kyu5dfgsk3xcuxzae65iisbyykobmhnjvrh6t6z6nkt2

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

2.00PiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

c4f0d039-a0af-4cd9-836c-19d6f471091c

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedkowbnv2kyu5dfgsk3xcuxzae65iisbyykobmhnjvrh6t6z6nkt2

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebt777nlgagcxfslzuuzn5bm7u3h5olhab2ml5bo555knjqezr3ge

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

2.00PiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

c4f0d039-a0af-4cd9-836c-19d6f471091c

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebt777nlgagcxfslzuuzn5bm7u3h5olhab2ml5bo555knjqezr3ge

xinaxu commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacec6vl4kjmwearc6gkv6swb7iq535se7kzewacdnurwmlozm5psvnm

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

2.00PiB

Signer Address

f1k3ysofkrrmqcot6fkx4wnezpczlltpirmrpsgui

Id

c4f0d039-a0af-4cd9-836c-19d6f471091c

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec6vl4kjmwearc6gkv6swb7iq535se7kzewacdnurwmlozm5psvnm

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 1 year ago

Actively onboarding across 4 different LDNs in the same series.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

-- Commented by Stale Bot.

kernelogic commented 1 year ago

Hi @simonkim0515, could you please reopen this LDN? I accidentally missed the stale notification. My other two open LDNs in the same series, #1637 and #1639, are still actively onboarding (down to the last tranche), so this one is needed as well. Thank you.

kernelogic commented 1 year ago

Keep it open

Sunnyiscoming commented 1 year ago

Hello @kernelogic, per https://github.com/filecoin-project/notary-governance/issues/922, which applies to open, public dataset applicants, please complete the Fil+ registration form to identify yourself as the applicant, and add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by the Fil+ Governance team to confirm its validity, and then the application will be allowed to move forward for additional notary review.

kernelogic commented 1 year ago

Keep it open