filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Allocation] - DataCap for Landsat 9 #995

Closed TaylorOshan closed 1 year ago

TaylorOshan commented 2 years ago

name: Large Dataset Notary application
about: Clients should use this application form to request a DataCap allocation via a LDN for a dataset
title: "DataCap for Landsat 9"
labels: 'application, Phase: Diligence'
assignees: ''


Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

The EASIER Data initiative kicked off late this summer and is a two year project in collaboration with the Filecoin Foundation for the Decentralized Web to build pipelines for storing and extracting geospatial data on Filecoin and IPFS. These pipelines will be prototyped and demonstrated using one year of Landsat 9 satellite data, which is estimated at about 500TB.

What is the primary source of funding for this project?

Filecoin Foundation for the Decentralized Web

What other projects/ecosystem stakeholders is this project associated with?

University of Maryland, Filecoin Foundation for the Decentralized Web, Atlas, Filecoin Green

Use-case details

Describe the data being stored onto Filecoin

One year of Landsat 9 satellite images.

Where was the data in this dataset sourced from?

Landsat 9 is a joint mission between NASA and USGS

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://www.usgs.gov/landsat-missions/landsat-9

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

The data are supported by open missions and are meant to be public

What is the expected retrieval frequency for this data?

Initially, the data will likely be retrieved only infrequently by the project team, but the hope is that, as the pipelines are developed over the course of the project, the platform will attract more users and the retrieval frequency will increase.

For how long do you plan to keep this dataset stored on Filecoin?

3-5 years

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

n/a

How will you be distributing your data to storage providers? Is there an offline data transfer process?

In small increments to accommodate the batch processing of data.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

TBD - we are collaborating with Atlas to develop a strategy for the most robust storage plan

How will you be distributing deals across storage providers?

TBD - we are collaborating with Atlas to develop a strategy for the most robust storage plan

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

We anticipate being ready to start making storage deals in the next 4 to 6 weeks.

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

datahug commented 2 years ago

I think it has already been stored by lots of projects.

TaylorOshan commented 2 years ago

I am aware of several projects that have stored some of the Landsat data, but Landsat 9 is a new satellite that recently came online, and there is not yet one year of data available. I am also aware of other projects that aim to store Landsat 9 data (Atlas), and we have been communicating with Eshan Chordia at PL to do this collaboratively. The goal of our FFDW partner project with Brynn O'Donnell is to process and store the data strategically so that it is more accessible for retrieval by downstream users.

dannyob commented 2 years ago

@Datadaos I think you're thinking of Landsat 8, which was part of Slingshot. This is a new slab of data.

@TaylorOshan Do you have any idea yet of how many SPs you were thinking of storing the data with?

TaylorOshan commented 2 years ago

@dannyob we would probably first look to identify one or two larger storage providers and we are currently working on making those connections, but then we were thinking to try to work with several smaller providers for additional replications.

raghavrmadya commented 2 years ago

Datacap Request Trigger

Total DataCap requested

500PiB

Expected weekly DataCap usage rate

50TiB

Client address

f1uwzfw6hghqf6js4773p62onzvqnupcqdxbkhhvq

large-datacap-requests[bot] commented 2 years ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1uwzfw6hghqf6js4773p62onzvqnupcqdxbkhhvq

DataCap allocation requested

25TiB

TaylorOshan commented 2 years ago

@raghavrmadya @dannyob I just saw these latest comments, and I have been meaning to update the issue based on project conversations. I believe I originally misunderstood the amount I should be requesting. I originally provided the anticipated size of the dataset, but have since realized that we plan to store several replications in two different file formats. We also plan to explore different strategies for packing CAR files, but following conversations with @dannyob and others, we have thought of some ways to first explore this on a smaller scale. Since we are using one year of data that we expect to be about 500 TB, and we plan to store two versions (TIFF and COG), each with 3 replications, the total request should actually be closer to 3 PB. Does this matter up front, or can we continue working and expand the request as we go?
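
The estimate in this comment can be checked with a few lines of arithmetic. This is a minimal sketch: the figures come from the comment itself, the variable names are illustrative, and decimal units (1 PB = 1000 TB) are assumed.

```python
# Figures taken from the comment above; variable names are illustrative.
dataset_size_tb = 500      # one year of Landsat 9 data, approximate
formats = 2                # TIFF and COG versions
replicas_per_format = 3    # replications of each version

total_tb = dataset_size_tb * formats * replicas_per_format
total_pb = total_tb / 1000  # assumes decimal units: 1 PB = 1000 TB

print(f"{total_tb} TB = {total_pb:g} PB")
```

The result is 3000 TB, which matches the "closer to 3 PB" figure quoted in the comment.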

raghavrmadya commented 2 years ago

Hi, thanks for the update. Please feel free to open a new application for the remaining request.

TaylorOshan commented 2 years ago

OK, will do. Thanks for the clarification. One additional question: what was the total amount approved for this request? I am trying to understand whether I applied incorrectly, and whether we received enough for replications or for just a single copy.

TaylorOshan commented 2 years ago

Hi @raghavrmadya, just checking in. When I check to see if we have received the first tranche using https://verify.glif.io/ I don't yet see the 25 TiB.

kernelogic commented 2 years ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebqgdeeodugy3sja5qbqkv5ahlfx6tyf7aq2a6xarw3ndrr5fguh6

Address

f1uwzfw6hghqf6js4773p62onzvqnupcqdxbkhhvq

Datacap Allocated

25.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebqgdeeodugy3sja5qbqkv5ahlfx6tyf7aq2a6xarw3ndrr5fguh6

xinaxu commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebnoe2n54wkigypskm2loayzkalojahwtyy4qxy5mm3q3p3nylot2

Address

f1uwzfw6hghqf6js4773p62onzvqnupcqdxbkhhvq

Datacap Allocated

25.00TiB

Signer Address

f1k3ysofkrrmqcot6fkx4wnezpczlltpirmrpsgui

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebnoe2n54wkigypskm2loayzkalojahwtyy4qxy5mm3q3p3nylot2

Sunnyiscoming commented 1 year ago

Is there any problem with using datacap?

TaylorOshan commented 1 year ago

We are working with Piknik storage provider to finish packing the data for storage. I will check in with them to understand the timeline and when we will begin making storage deals.

On Sun, Jan 29, 2023 at 5:55 AM Sunnyiscoming @.***> wrote:

Is there any problem with using datacap?


large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1uwzfw6hghqf6js4773p62onzvqnupcqdxbkhhvq

DataCap allocation requested

50TiB

Id

7389867f-91cc-4953-aceb-85d9562e66c9

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1uwzfw6hghqf6js4773p62onzvqnupcqdxbkhhvq

Last two approvers

xinaxu & kernelogic

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

50TiB

Total DataCap granted for client so far

127.39TiB

Datacap to be granted to reach the total amount requested by the client (500TiB)

372.60TiB
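
As a rough check of the figures above, the amount still to be granted is the total requested minus the total granted so far. This is a minimal sketch with illustrative variable names. The bot displays 372.60TiB, and the 0.01 difference from the result below is presumably due to rounding of the displayed values.

```python
# Values copied from the bot's stats above; names are illustrative.
total_requested_tib = 500.0
granted_so_far_tib = 127.39

remaining_tib = total_requested_tib - granted_so_far_tib
print(f"{remaining_tib:.2f} TiB still to be granted")
```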

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 3358 | 1 | 25TiB | 100 | 4.48TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 70% of total datacap - f01851060: 100.00%

⚠️ All storage providers are located in the same region.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.
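
The two warnings in this summary (one provider holding more than 70% of the DataCap, and deals replicated across fewer than 3 providers) can be illustrated with a small calculation. The sketch below is not the checker's actual implementation. The function name, the tuple layout, and the choice to measure provider share by size and replication by deal count are all assumptions; only the 70% and 3-provider thresholds come from the report text.

```python
from collections import defaultdict

def summarize_distribution(deals, top_share_limit=0.70, min_providers=3):
    """Summarize provider concentration and replication for a client's deals.

    deals: non-empty list of (piece_cid, provider_id, size_tib) tuples.
    This shape is hypothetical, chosen only for illustration.
    """
    size_by_provider = defaultdict(float)
    providers_by_piece = defaultdict(set)
    for piece_cid, provider_id, size_tib in deals:
        size_by_provider[provider_id] += size_tib
        providers_by_piece[piece_cid].add(provider_id)

    total = sum(size_by_provider.values())
    top_provider = max(size_by_provider, key=size_by_provider.get)
    top_share = size_by_provider[top_provider] / total

    # Count deals whose piece is held by fewer than `min_providers` providers.
    under_replicated = sum(
        1 for piece_cid, _, _ in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return {
        "top_provider": top_provider,
        "top_share": top_share,
        "top_share_exceeded": top_share > top_share_limit,
        "under_replicated_share": under_replicated / len(deals),
    }

# Example: every deal sits with a single provider, as in the report above.
report = summarize_distribution([
    ("piece-a", "f01851060", 32.0),
    ("piece-b", "f01851060", 32.0),
])
print(report)
```

With all deals on one provider, both warnings fire: the top provider's share is 100% and every deal is under-replicated.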

jamerduhgamer commented 1 year ago

Hello all, providing this comment for full transparency. PiKNiK started sealing first to get past the smaller 50 and 100 TiB DataCap allocations.

Another replica will be going to Australia, but the SP is still setting up to take on the data.

Two other copies have been posted to the Big Data Exchange, one for an SP in Asia (excluding GCN) and one for an SP in Europe, so more SPs will be onboarded once those auctions are won, which will also improve geographic distribution.

TaylorOshan commented 1 year ago

Just wanted to loop in @dannyob for situational awareness as this moves forward. Would the next status updates/tranche of datacap be granted once we continue dealmaking or do we need to wait for that to happen before we can start dealmaking again with additional SPs and finish the first copy with Piknik?

NiwanDao commented 1 year ago

Hope to see more diverse distribution in the next tranche.

NiwanDao commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecrebpkiral2pf5bwjd45acokgbah7wus2h6scykswxwvusqr353u

Address

f1uwzfw6hghqf6js4773p62onzvqnupcqdxbkhhvq

Datacap Allocated

50.00TiB

Signer Address

f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

Id

7389867f-91cc-4953-aceb-85d9562e66c9

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecrebpkiral2pf5bwjd45acokgbah7wus2h6scykswxwvusqr353u

dannyob commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacec5p6kx4rhab4up3d6j7hb3gmkaucr4qxzkaaum2kjwjocwirkwcy

Address

f1uwzfw6hghqf6js4773p62onzvqnupcqdxbkhhvq

Datacap Allocated

50.00TiB

Signer Address

f1k6wwevxvp466ybil7y2scqlhtnrz5atjkkyvm4a

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec5p6kx4rhab4up3d6j7hb3gmkaucr4qxzkaaum2kjwjocwirkwcy

cryptowhizzard commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage providers sealed more than 70% of total datacap - f01851060: 78.50%

⚠️ 1 storage providers sealed too much duplicate data - f01392893: 37.77%

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.
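
The "duplicate data" warning in this summary can be illustrated in the same spirit. The sketch below assumes that "duplicate data" means the same piece CID sealed more than once by the same provider. That reading is an assumption, not something the report states, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict

def duplicate_share(deals):
    """Return {provider_id: fraction of sealed size that repeats a piece CID}.

    deals: list of (provider_id, piece_cid, size_tib) tuples (hypothetical shape).
    The first copy of a piece at a provider is not counted as a duplicate.
    """
    sealed = defaultdict(float)
    duplicated = defaultdict(float)
    seen = set()
    for provider_id, piece_cid, size_tib in deals:
        sealed[provider_id] += size_tib
        if (provider_id, piece_cid) in seen:
            duplicated[provider_id] += size_tib
        else:
            seen.add((provider_id, piece_cid))
    return {p: duplicated[p] / sealed[p] for p in sealed}

# Illustrative data only: provider "f0A" seals piece "x" twice.
shares = duplicate_share([
    ("f0A", "x", 1.0),
    ("f0A", "x", 1.0),
    ("f0A", "y", 2.0),
    ("f0B", "x", 1.0),
])
print(shares)
```

Under this definition, the same piece stored once at each of several providers counts as replication rather than duplication.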

cryptowhizzard commented 1 year ago

James has requested my attention to review this LDN. This collaborative project with FFDW is currently in its initial phase of distribution. We will ask @TaylorOshan for distribution plans regarding the upcoming phase. I am willing to provide support, but I will closely monitor the progress during this next round.

jamerduhgamer commented 1 year ago

Thank you @cryptowhizzard. Messaged @raghavrmadya on why the next datacap tranche is not firing off like previously. I believe this issue has already been flagged by Raghav though.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

TaylorOshan commented 1 year ago

Could we please keep this application open? @jamerduhgamer was working with partner SPs to get additional copies stored.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

jamerduhgamer commented 1 year ago

Hi @TaylorOshan, that is correct and there is currently an issue with getting more datacap approved because the bot is not triggering. Confirming with @simonkim0515 or @panges2 if we need to open another application to resolve this issue.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

large-datacap-requests[bot] commented 11 months ago

Thanks for your request! :exclamation: We have found some problems in the information provided.
We could not find Website / Social Media field in the information provided
We could not find Total amount of DataCap being requested (between 500 TiB and 5 PiB) field in the information provided
We could not find Weekly allocation of DataCap requested (usually between 1-100TiB) field in the information provided
We could not find On-chain address for first allocation field in the information provided
We could not find Data Type of Application field in the information provided

Please, take a look at the request and edit the body of the issue providing all the required information.

aggregation-and-compliance-bot[bot] commented 11 months ago

Client f01905142 does not follow the datacap usage rules. More info here. This application has been failing the requirements for 7 days. Please take appropriate action to fix the following DataCap usage problems.

| Criteria | Threshold | Reason |
| --- | --- | --- |
| Percent of used DataCap stored with top provider | < 75 | The percent of Data from the client that is stored with their top provider is 100%. This should be less than 75% |