filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] National Herbarium of NSW #2067

Closed Hugh-Top closed 11 months ago

Hugh-Top commented 1 year ago

Data Owner Name

Royal Botanic Gardens and Domain Trust

What is your role related to the dataset

Storage provider filling out application on behalf of the data owner

Data Owner Country/Region

United States

Data Owner Industry

Not-for-Profit

Website

https://www.rbgsyd.nsw.gov.au/science/national-herbarium-of-new-south-wales

Social Media

N/A

Total amount of DataCap being requested

1PiB

Expected size of single dataset (one copy)

160TiB

Number of replicas to store

10

Weekly allocation of DataCap requested

100TiB

On-chain address for first allocation

f1nrnyrttz53iau77m7sbk56pxij5g7mmi4afk6pq

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

Identifier

No response

Share a brief history of your project and organization

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1974

We are a small Filecoin team.
We operate one node, f01969779.
We will continue to participate in Filecoin and seal more data.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

The National Herbarium of New South Wales is one of the most significant scientific, cultural and historical botanical resources in the Southern Hemisphere. Its 1.43 million preserved plant specimens have been captured as high-resolution images, and the biodiversity metadata associated with each image has been captured in digital form. The botanical specimens date from 1770 to today and form voucher collections that document the distribution and diversity of the world's flora through time, particularly that of NSW, Australia and the Pacific. The data is used in biodiversity assessment, systematic botanical research, ecosystem conservation and policy development. The data is used by scientists, students and the public.

aws s3 ls --no-sign-request --recursive --human-readable --summarize s3://herbariumnsw-pds/ | grep "Total Size:"
   Total Size: 103.9 TiB
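The figures in this form can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original application: it assumes binary units (1 PiB = 1024 TiB) and ignores any CAR packing or padding overhead, and all variable names are illustrative.

```python
# Sanity check of the sizes stated in the application (illustrative names only).
TIB_PER_PIB = 1024  # assumption: binary units, 1 PiB = 1024 TiB

requested_tib = 1 * TIB_PER_PIB   # "Total amount of DataCap being requested": 1PiB
declared_copy_tib = 160           # "Expected size of single dataset (one copy)"
measured_copy_tib = 103.9         # "Total Size" reported by `aws s3 ls`
replicas = 10                     # "Number of replicas to store"
weekly_tib = 100                  # "Weekly allocation of DataCap requested"


def footprint_tib(copy_tib: float, n_replicas: int) -> float:
    """Raw storage needed for n_replicas full copies, overhead not included."""
    return copy_tib * n_replicas


declared_total = footprint_tib(declared_copy_tib, replicas)
measured_total = footprint_tib(measured_copy_tib, replicas)
weeks_to_consume = requested_tib / weekly_tib

print(f"Requested DataCap:            {requested_tib} TiB")
print(f"10 copies at declared size:   {declared_total} TiB")
print(f"10 copies at measured size:   {measured_total:.1f} TiB")
print(f"Weeks at requested weekly rate: {weeks_to_consume:.2f}")
```

Under these assumptions, ten copies at the declared 160 TiB would exceed the 1 PiB requested, while ten copies at the measured 103.9 TiB come out close to it.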

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

aws s3 ls --no-sign-request s3://herbariumnsw-pds/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

Less than 1 year

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe

How will you be distributing your data to storage providers

HTTP or FTP server

How do you plan to choose storage providers

Slack, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

| sp | region |
| --- | --- |
| f02145020 | CN |
| f02301 | US |
| f03223 | US |
| f01969779 (ours) | US |
| f020522 | DE |
| f02093396 | Singapore |

How do you plan to make deals to your storage providers

Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

What percentage of the DataCap will you store on your own nodes?

Sunnyiscoming commented 1 year ago

What is your role at the company that is behind this project? How are you connected to the dataset? The website isn't yours. Do you work for the listed organization? How are you finding SPs? Please list a detailed plan.

Hugh-Top commented 1 year ago

> What percentage of the DataCap will you store on your own nodes?

@Sunnyiscoming I only have one node, f01969779, which will store one copy of the dataset (less than 20%). I promise that no SP will store more than 30%. I am looking for other SPs.

Thank you.

Hugh-Top commented 1 year ago

> What is your role at the company that is behind this project? How are you connected to the dataset? The website isn't yours. Do you work for the listed organization? How are you finding SPs? Please list a detailed plan.

@Sunnyiscoming Hello. This dataset is sourced from https://registry.opendata.aws/nsw-herbarium/. I am not affiliated with this organization in any way. However, this is a CC-licensed public open dataset, so anyone can store it. I believe storing this dataset on the Filecoin network has value.

We already have some cooperating SPs, and we will continue to look for new SPs through Slack. I promise that no SP will store more than 30%.

Thank you.

Sunnyiscoming commented 1 year ago

Is this storage node operated by your company? What's the name of your company? What is your role in the company?

Hugh-Top commented 1 year ago

> Is this storage node operated by your company? What's the name of your company? What is your role in the company?

@Sunnyiscoming We have not set up a company. We are a few friends who operate a node and participate in Filecoin. I am mainly responsible for operations and maintenance.

Thank you.

herrehesse commented 1 year ago

@Hugh-Top can you show me retrievability on your selected SPs?

| sp | region |
| --- | --- |
| f02145020 | CN |
| f02301 | US |
| f03223 | US |
| f01969779 (ours) | US |
| f020522 | DE |
| f02093396 | Singapore |

Hugh-Top commented 1 year ago

https://github.com/data-preservation-programs/filplus-checker-assets/blob/main/filecoin-project/filecoin-plus-large-datasets/issues/1974/1688089477013.md

cryptowhizzard commented 1 year ago

Hi @Hugh-Top

I see SPs from topblocks in your application. Although as the applicant you are allowed to store one copy of your data, I wonder who the 3 other applicants are who will store the other 3 replicas of this dataset?

Hugh-Top commented 1 year ago

> Hi @Hugh-Top
>
> I see SPs from topblocks in your application. Although as the applicant you are allowed to store one copy of your data, I wonder who the 3 other applicants are who will store the other 3 replicas of this dataset?

@cryptowhizzard topblocks is our partner. Is it necessary to provide the names of the SPs?

| sp | region | org |
| --- | --- | --- |
| f02145020 | CN | harry |
| f02301 | US | topblocks |
| f03223 | US | topblocks |
| f01969779 | US | ours |
| f020522 | DE | phantom |
| f02093396 | Singapore | STRAITDEER PTE. LTD. |

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

1 PiB

Expected weekly DataCap usage rate

100 TiB

Client address

f1nrnyrttz53iau77m7sbk56pxij5g7mmi4afk6pq

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1nrnyrttz53iau77m7sbk56pxij5g7mmi4afk6pq

DataCap allocation requested

50TiB

Id

f079365e-cf87-437e-af0c-c763204f1f66

ipollo00 commented 1 year ago

@Hugh-Top Have your cooperating SPs received your data?

Hugh-Top commented 1 year ago

> @Hugh-Top Have your cooperating SPs received your data?

@ipollo00 Yes.

Fatman13 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceckmghbj64w6by55o7owvoi4rvsso5z5hfpaii74w2lpqd6j6e3g6

Address

f1nrnyrttz53iau77m7sbk56pxij5g7mmi4afk6pq

Datacap Allocated

50.00TiB

Signer Address

f1j3u7crhjzwb2cj5mq7vodlt4o66yoyci7lhcauy

Id

f079365e-cf87-437e-af0c-c763204f1f66

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceckmghbj64w6by55o7owvoi4rvsso5z5hfpaii74w2lpqd6j6e3g6

Fatman13 commented 1 year ago

The client reached out on Slack. The application and allocation plan look okay. I will support the first round.

ipollo00 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebbemmyzubbbbvsevyg52d25foopzq7kaexokinofzmnvr7blunvq

Address

f1nrnyrttz53iau77m7sbk56pxij5g7mmi4afk6pq

Datacap Allocated

50.00TiB

Signer Address

f1n5wlrrhoxpkgwij25xrtt7w7g2k3fhbthmdn6ri

Id

f079365e-cf87-437e-af0c-c763204f1f66

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebbemmyzubbbbvsevyg52d25foopzq7kaexokinofzmnvr7blunvq

cryptowhizzard commented 1 year ago

Hi,

I would advise notaries not to sign on this LDN. We will follow up with information on short notice about the where / what / why. If urgent please contact us in the T&T group.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

Hugh-Top commented 1 year ago

continue

ghost commented 1 year ago

checker:manualTrigger

ghost commented 1 year ago

> The client reached out on Slack. The application and allocation plan look okay. I will support the first round.

@Fatman13 please keep all communication public if possible. Thank you

ghost commented 1 year ago

Hello @Hugh-Top, per the new guidelines https://github.com/filecoin-project/notary-governance/issues/922 for Open Dataset applicants, we are asking you to complete the following Fil+ registration form to identify yourself as the applicant. Please also add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by Fil+ Governance team to confirm validity toward the Fil+ guideline of a distributed storage plan and then the application will be approved for notary review. Let us know if you have any questions.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

Hugh-Top commented 1 year ago

continue

ghost commented 1 year ago

checker:manualTrigger

Hugh-Top commented 1 year ago

> Hello @Hugh-Top, per the new guidelines filecoin-project/notary-governance#922 for Open Dataset applicants, we are asking you to complete the following Fil+ registration form to identify yourself as the applicant. Please also add the contact information of the SP entities you are working with to store copies of the data.
>
> This information will be reviewed by Fil+ Governance team to confirm validity toward the Fil+ guideline of a distributed storage plan and then the application will be approved for notary review. Let us know if you have any questions.

The form has been submitted. Please check.

ghost commented 1 year ago

Confirming SPs received as Entities storing:

- f02032191 Zhejiang (CN)
- f02230375 Hong Kong (CN)
- f02230941 Hong Kong (CN)
- f02230939 Hong Kong (CN)
- f02230935 Hong Kong (CN)
- f01836766 Guangzhou (CN)
- f02093396 STRAITDEER PTE. LTD Singapore
- f02301 top blocks Santa Clara (US)
- f03223 top blocks Santa Clara (US)
- f0143858 top blocks Santa Clara (US)
- f0240185 top blocks Santa Clara (US)

cryptowhizzard commented 1 year ago

I am missing the names of these SPs: f02032191 Zhejiang (CN), f02230375 Hong Kong (CN), f02230941 Hong Kong (CN), f02230939 Hong Kong (CN), f02230935 Hong Kong (CN), f01836766 Guangzhou (CN).

As you will be storing 10 replicas per your LDN description, what will the distribution look like, and how will you adhere to one replica per organisation this time?

cryptowhizzard commented 1 year ago

Btw,

f02230375 HongKong(CN) f02230941 HongKong(CN) f02230939 HongKong(CN) f02230935 HongKong(CN)

These are all running on one single IP address space and probably belong to / are controlled by one entity?

Hugh-Top commented 1 year ago

I will ensure that the proportion of dc allocated to each sp does not exceed 30%, and I am actively seeking more sps to collaborate with. Additionally, I have not come across any regulations specifying that each storage provider can only maintain a single copy.

f02230375, f02230941, f02230939, and f02230935 belong to the same storage provider.

Thanks.

kevzak commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

kevzak commented 1 year ago

Original list from above:

- f02032191 Zhejiang (CN)
- f02230375 Hong Kong (CN)
- f02230941 Hong Kong (CN)
- f02230939 Hong Kong (CN)
- f02230935 Hong Kong (CN)
- f01836766 Guangzhou (CN)
- f02093396 STRAITDEER PTE. LTD Singapore
- f02301 top blocks Santa Clara (US)
- f03223 top blocks Santa Clara (US)
- f0143858 top blocks Santa Clara (US)
- f0240185 top blocks Santa Clara (US)

CID REPORT:

- f02230941 | Sham Shui Po, Sham Shui Po, HK (BIH-Global Internet Harbor) | 3.13 TiB | 9.94% | 3.13 TiB | 0.00%
- f02230375 | Sham Shui Po, Sham Shui Po, HK (BIH-Global Internet Harbor) | 3.06 TiB | 9.74% | 3.06 TiB | 0.00%
- f02230935 | Sham Shui Po, Sham Shui Po, HK (BIH-Global Internet Harbor) | 32.00 GiB | 0.10% | 32.00 GiB | 0.00%
- f02032191 | Jiaxing, Zhejiang, CN (CHINA UNICOM China169 Backbone) | 13.38 TiB | 42.54% | 13.38 TiB | 0.00%
- f02093396 | San Francisco, California, US (Cloudflare, Inc.) | 11.84 TiB | 37.67% | 11.84 TiB

@Hugh-Top, where is f02093396 located? You said it was in Singapore, but the report shows San Francisco.
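The percentages in the CID report can be compared against the applicant's earlier promise in this thread that each SP would store no more than 30%. The following is a minimal sketch under stated assumptions: the shares are copied from the report rows as posted, the 30% cap is the applicant's own figure rather than a rule quoted from Fil+ documentation, and the grouping of the three Hong Kong nodes follows the applicant's statement that they belong to the same storage provider. Variable names are illustrative.

```python
# Percentage of sealed deals per provider, as reported in the CID report above.
reported_share_pct = {
    "f02230941": 9.94,
    "f02230375": 9.74,
    "f02230935": 0.10,
    "f02032191": 42.54,
    "f02093396": 37.67,
}

# Assumption: cap taken from the applicant's promise in this thread.
PROMISED_CAP_PCT = 30.0

# Nodes the applicant said belong to the same storage provider
# (f02230939 is also in that group but has no row in the report).
same_operator = ["f02230941", "f02230375", "f02230935"]

# Providers whose individual share exceeds the promised cap.
over_cap = sorted(sp for sp, pct in reported_share_pct.items() if pct > PROMISED_CAP_PCT)

# Combined share of the nodes run by one operator.
combined_operator_pct = round(sum(reported_share_pct[sp] for sp in same_operator), 2)

print("Providers above the promised cap:", over_cap)
print("Combined share of the shared-operator nodes:", combined_operator_pct)
```

With the figures as posted, two providers sit above the promised cap, while the Hong Kong nodes together stay below it.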

kevzak commented 1 year ago

Also confirming that your node f01969779 is no longer involved here? You had included it before: LINK

What is your role with Dataset then? Did you prepare it?

Hugh-Top commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

KizenYang commented 1 year ago

> Original list from above:
>
> - f02032191 Zhejiang (CN)
> - f02230375 Hong Kong (CN)
> - f02230941 Hong Kong (CN)
> - f02230939 Hong Kong (CN)
> - f02230935 Hong Kong (CN)
> - f01836766 Guangzhou (CN)
> - f02093396 STRAITDEER PTE. LTD Singapore
> - f02301 top blocks Santa Clara (US)
> - f03223 top blocks Santa Clara (US)
> - f0143858 top blocks Santa Clara (US)
> - f0240185 top blocks Santa Clara (US)
>
> CID REPORT:
>
> - f02230941 | Sham Shui Po, Sham Shui Po, HK (BIH-Global Internet Harbor) | 3.13 TiB | 9.94% | 3.13 TiB | 0.00%
> - f02230375 | Sham Shui Po, Sham Shui Po, HK (BIH-Global Internet Harbor) | 3.06 TiB | 9.74% | 3.06 TiB | 0.00%
> - f02230935 | Sham Shui Po, Sham Shui Po, HK (BIH-Global Internet Harbor) | 32.00 GiB | 0.10% | 32.00 GiB | 0.00%
> - f02032191 | Jiaxing, Zhejiang, CN (CHINA UNICOM China169 Backbone) | 13.38 TiB | 42.54% | 13.38 TiB | 0.00%
> - f02093396 | San Francisco, California, US (Cloudflare, Inc.) | 11.84 TiB | 37.67% | 11.84 TiB
>
> @Hugh-Top, where is f02093396 located? You said it was in Singapore, but the report shows San Francisco.

Hello @kevzak. I am a team member from the storage provider with the ID f02093396, and I'm here to address this question.

Previously, with the aim of enhancing security, we introduced Cloudflare, which allowed users to be automatically directed to the optimal edge node when accessing the platform. However, due to this setup, the actual geographic location of users might have been displayed as the US region, which does not match the physical address.

To better maintain consistency between the actual location and the physical address, we are currently undergoing security policy adjustments. As a result, in the near future, the retrieved user location will accurately be displayed as Singapore. We anticipate completing these adjustments by August 15th, 2023, ensuring that users receive geographical information that is more closely in line with reality.

Hugh-Top commented 1 year ago

> Also confirming that your node f01969779 is no longer involved here? You had included it before: LINK
>
> What is your role with Dataset then? Did you prepare it?

My node is temporarily not sealing this dataset; it will resume sealing later.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1nrnyrttz53iau77m7sbk56pxij5g7mmi4afk6pq

DataCap allocation requested

100TiB

Id

e2d9685e-f24b-44d1-ace7-96ce2e1aab60

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1nrnyrttz53iau77m7sbk56pxij5g7mmi4afk6pq

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

100TiB

Total DataCap granted for client so far

50TiB

Datacap to be granted to reach the total amount requested by the client (1PiB)

974TiB
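The figures in this bot comment follow from simple arithmetic. The following is a minimal sketch, assuming binary units (1 PiB = 1024 TiB) and reading the stated rule literally as 100% of the weekly amount; the variable names are illustrative and this is not the bot's actual implementation.

```python
# Reproduce the allocation figures shown in the bot comment (illustrative names).
TIB_PER_PIB = 1024  # assumption: binary units

total_requested_tib = 1 * TIB_PER_PIB  # total requested by the client (1PiB)
granted_so_far_tib = 50                # "Total DataCap granted for client so far"
weekly_requested_tib = 100             # weekly DataCap amount requested
rule_fraction = 1.00                   # "100% of weekly dc amount requested"

# Size of this allocation round under the stated rule.
next_allocation_tib = weekly_requested_tib * rule_fraction

# DataCap still to be granted to reach the total requested amount.
remaining_tib = total_requested_tib - granted_so_far_tib

# What would remain if this round is approved in full.
remaining_after_round_tib = remaining_tib - next_allocation_tib

print(f"This round:            {next_allocation_tib:.0f} TiB")
print(f"Remaining before it:   {remaining_tib} TiB")
print(f"Remaining after it:    {remaining_after_round_tib:.0f} TiB")
```

The 974 TiB shown by the bot matches 1024 TiB minus the 50 TiB granted in the first round.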

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 1552 | 7 | 50TiB | 48.78 | 640GiB |

Hugh-Top commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

NiwanDao commented 1 year ago

LGTM. I will support this time.

NiwanDao commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacednvcnpv2o6zpauowmw5eb77m4jjiliuvp5ugbzn4failfcddxct2

Address

f1nrnyrttz53iau77m7sbk56pxij5g7mmi4afk6pq

Datacap Allocated

100.00TiB

Signer Address

f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

Id

e2d9685e-f24b-44d1-ace7-96ce2e1aab60

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacednvcnpv2o6zpauowmw5eb77m4jjiliuvp5ugbzn4failfcddxct2

laurarenpanda commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

laurarenpanda commented 1 year ago

I have checked previous comments and the Checker report. Willing to support this round.

laurarenpanda commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaced6st4erm2w2flhnfuggvrxjdvbdsnzcnxbqfkzkeeivzxw6zhcoe

Address

f1nrnyrttz53iau77m7sbk56pxij5g7mmi4afk6pq

Datacap Allocated

100.00TiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

e2d9685e-f24b-44d1-ace7-96ce2e1aab60

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced6st4erm2w2flhnfuggvrxjdvbdsnzcnxbqfkzkeeivzxw6zhcoe