filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Kernelogic - Open datasets onboarding initiative phase 1 (2/4) #1638

Closed. kernelogic closed this issue 9 months ago

kernelogic commented 1 year ago

Data Owner Name

Kernelogic

Data Owner Country/Region

Canada

Data Owner Industry

Life Science / Healthcare

Website

https://singularity-browser.kernelogic.ca

Social Media

N/A

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

1PiB

On-chain address for first allocation

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Custom multisig

Identifier

No response

Share a brief history of your project and organization

I have participated in every Slingshot phase and am probably the best-performing "small individual client".

Even though Slingshot v2 has ended, there is still strong demand from SPs to onboard useful data. This application is to onboard open datasets from AWS.

I have a web UI (https://singularity-browser.kernelogic.ca/) that indexes all onboarded files and provides ways to retrieve them.

I have successfully completed several LDNs for other datasets, and my record shows that I have followed the decentralization rules and have done no self-dealing.

Some of the recent LDNs I completed:
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1108
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1107
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1106
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1104
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/983

Is this project associated with other projects/ecosystem stakeholders?

Yes

If answered yes, what are the other projects/ecosystem stakeholders

Storage working groups, BigD exchange, singularity deal making tool.

Describe the data being stored onto Filecoin

Each LDN requires a separate client address for the bot to work properly. To onboard more data smoothly, I am kicking off a series of open dataset onboarding LDNs for AWS open datasets that I have not onboarded before, including but not limited to:

Allen Mouse Brain Atlas
Community Earth System Model Large Ensemble (CESM LENS)
Community Earth System Model v2 Large Ensemble (CESM2 LENS)
Epoch of Reionization Dataset
HIRLAM Weather Model
NIH NCBI Sequence Read Archive (SRA) on AWS
NOAA Global Ensemble Forecast System (GEFS)
NOAA Fundamental Climate Data Records (FCDR)
NOAA Joint Polar Satellite System (JPSS)

All these datasets will be indexed for easy lookup through my website https://singularity-browser.kernelogic.ca

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/allen-mouse-brain-atlas/
https://registry.opendata.aws/ncar-cesm-lens/
https://registry.opendata.aws/epoch-of-reionization/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1 to 1.5 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe

How will you be distributing your data to storage providers

HTTP or FTP server

How do you plan to choose storage providers

Slack, Big data exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

No response

How do you plan to make deals to your storage providers

Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.
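
The timing in the bot's message can be sketched as follows: an issue is marked stale after 10 days without a response and is closed 4 days later unless someone comments. This only illustrates the rule as stated in the message and is not the bot's actual code; the function name `stale_schedule` and the example date are hypothetical.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=10)  # days without a response before the Stale label
CLOSE_AFTER = timedelta(days=4)   # further days before the issue is closed


def stale_schedule(last_activity: date) -> tuple[date, date]:
    """Return (stale_on, close_on) for an issue that receives no new comments."""
    stale_on = last_activity + STALE_AFTER
    return stale_on, stale_on + CLOSE_AFTER


# Hypothetical last-activity date, chosen only for illustration.
stale_on, close_on = stale_schedule(date(2023, 1, 1))
print(stale_on, close_on)  # 2023-01-11 2023-01-15
```

A comment such as "keepalive" resets the last-activity date, which is why the applicant posts it repeatedly in this thread.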

kernelogic commented 1 year ago

Keep it open

github-actions[bot] commented 12 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

Sunnyiscoming commented 11 months ago

Please provide ID, City, Country, Organization of each SP here.

kernelogic commented 11 months ago

keepalive

large-datacap-requests[bot] commented 11 months ago

DataCap Allocation requested

Request number 5

Multisig Notary address

f02049625

Client address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

DataCap allocation requested

1.25PiB

Id

97a1c1ef-fc0b-48ba-a86d-8cd3e0dc14cf
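
The sizes in this request use binary units (1 PiB = 2^50 bytes). As a rough sketch, the snippet below converts the figures quoted in this thread to bytes and shows what share of the 5PiB total one 1.25PiB allocation represents. The helper `parse_datacap` is a hypothetical name used for illustration and is not part of any Filecoin tooling.

```python
from fractions import Fraction

# Binary unit multipliers (1 PiB = 2**50 bytes).
UNITS = {"TiB": 2**40, "PiB": 2**50}


def parse_datacap(text: str) -> int:
    """Convert a size string such as '1.25PiB' to a whole number of bytes."""
    for unit, factor in UNITS.items():
        if text.endswith(unit):
            value = Fraction(text[: -len(unit)].strip())
            return int(value * factor)
    raise ValueError(f"unrecognized size: {text!r}")


total = parse_datacap("5PiB")          # total requested in the application
allocation = parse_datacap("1.25PiB")  # size of this allocation request
print(allocation)                      # 1407374883553280
print(Fraction(allocation, total))     # 1/4
```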

kernelogic commented 11 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 11 months ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 75.64% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
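
The replication warning in this report can be read as: of all deals, what share carries data held by fewer than four distinct storage providers. Below is a minimal sketch of that calculation. It is not the checker's actual implementation: the deal records, piece identifiers, and the function name `low_replication_share` are made up for illustration, and the real report may weight deals by size rather than by count.

```python
from collections import defaultdict


def low_replication_share(deals, min_providers=4):
    """Return the percentage of deals whose piece is stored with fewer
    than `min_providers` distinct storage providers.

    `deals` is an iterable of (piece_id, provider_id) pairs.
    """
    deals = list(deals)
    if not deals:
        return 0.0
    # Collect the distinct providers that hold each piece.
    providers_by_piece = defaultdict(set)
    for piece_id, provider_id in deals:
        providers_by_piece[piece_id].add(provider_id)
    # Count deals whose piece falls below the replication threshold.
    under = sum(
        1
        for piece_id, _ in deals
        if len(providers_by_piece[piece_id]) < min_providers
    )
    return 100.0 * under / len(deals)


# Hypothetical data: piece "a" is on 4 providers, piece "b" on only 2.
sample = [
    ("a", "sp1"), ("a", "sp2"), ("a", "sp3"), ("a", "sp4"),
    ("b", "sp1"), ("b", "sp2"),
]
print(round(low_replication_share(sample), 2))  # 33.33
```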


kernelogic commented 11 months ago

| ID | City | Country | SP |
|---|---|---|---|
| f02191897 | Hong Kong | China | Sageone |
| f01969779 | Boardman | US | Nick |
| f02201190 | San Jose | US | James |
| f02131881 | Hong Kong | China | Chris |
| f02131801 | Hong Kong | China | Chris |
| f02131855 | Hong Kong | China | Chris |
| f0240185 | Clifton | US | TopBlocks |
| f0143858 | Clifton | US | TopBlocks |
| f02824134 | Dallas | US | GreaterHeat |
| f02037841 | Dallas | US | Hofe Ding |
| f02216186 | Dallas | US | Hofe Ding |
| f02383642 | Dallas | US | Alex Li |
| f02368751 | Dallas | US | Alex Li |
| f02383649 | Dallas | US | SJX |
| f01518369 | Sunnyvale | US | Chris |

a1991car commented 11 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebudttu7ehb5bdjqunrcv3xbvdl3aeyexi2av2xpy3fpzizbqcii4

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

1.25PiB

Signer Address

f1qnumecdypgrbaebtkdfjnwt5ndacadcuas3deiq

Id

97a1c1ef-fc0b-48ba-a86d-8cd3e0dc14cf

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebudttu7ehb5bdjqunrcv3xbvdl3aeyexi2av2xpy3fpzizbqcii4

nj-steve commented 11 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceafzyhyzyf3suenk5lnft73a3xtq4k752mhkcycd5osyso4sjcov2

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

1.25PiB

Signer Address

f1xx6555qijma7igpnjspyvdunc4vfxkawnpqy5ii

Id

97a1c1ef-fc0b-48ba-a86d-8cd3e0dc14cf

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceafzyhyzyf3suenk5lnft73a3xtq4k752mhkcycd5osyso4sjcov2

cryptowhizzard commented 10 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebkmcxb4vzrdynn6kyfcjzi6fwpraoksl7vfmc4h2xceckwx6uuyq

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

1.25PiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

97a1c1ef-fc0b-48ba-a86d-8cd3e0dc14cf

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebkmcxb4vzrdynn6kyfcjzi6fwpraoksl7vfmc4h2xceckwx6uuyq

github-actions[bot] commented 10 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 10 months ago

keepalive

kernelogic commented 10 months ago

keepalive

github-actions[bot] commented 10 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 10 months ago

keepalive

github-actions[bot] commented 9 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 9 months ago

I need to keep this open.

NiwanDao commented 9 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedfqdl2tj23mw62rj6laewt5prt6s5wzwql4fugfe4kpwq6rmbecq

Address

f1rylwniokpxpziavwvtvf7qgbj6p23iqgfu26iea

Datacap Allocated

1.25PiB

Signer Address

f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

Id

97a1c1ef-fc0b-48ba-a86d-8cd3e0dc14cf

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedfqdl2tj23mw62rj6laewt5prt6s5wzwql4fugfe4kpwq6rmbecq

large-datacap-requests[bot] commented 9 months ago

The issue reached the total datacap requested. This should be closed

github-actions[bot] commented 9 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.