filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

Victor Chang Cardiac Research Institute #425

Closed: DSS-AL closed this issue 1 year ago

DSS-AL commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

The Victor Chang Cardiac Research Institute (VCCRI) is renowned for the quality of its [scientific discoveries](https://www.victorchang.edu.au/heart-research/major-discoveries) and is dedicated to finding cures for cardiovascular disease through world-class and cutting-edge [medical research](https://www.victorchang.edu.au/heart-research).
DSS have worked with VCCRI to develop a PoC demonstrating the operational and economic benefits of the Filecoin Network, and are now making this application on their behalf to address a long-term data storage requirement arising from their research.
VCCRI are seeking to store five copies of a 1 PiB dataset as an archive on the Filecoin Network.
DSS is a leading Sydney-based decentralised cloud storage provider dedicated to the Filecoin network. DSS operates enterprise-scale compute and storage infrastructure in Tier 3 data centres throughout Australia, with clients spanning the globe.

What is the primary source of funding for this project?

DSS is funding the project.

What other projects/ecosystem stakeholders is this project associated with?

Client Allocation Request for: Victor Chang Cardiac Research Institute #1937

Use-case details

Describe the data being stored onto Filecoin

The data sets are the original outputs of scientific cardiac research.

Where was the data in this dataset sourced from?

The data sets have been created by large-scale scientific cardiac research.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

DSS do not currently have permission from the client to share the data publicly, although we have the client's full cooperation in verifying the data directly with notaries.

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

The existing dataset is restricted by patient consent, and although the data is de-identified, internal policies and permissions do not currently allow public use. Once DSS and the broader ecosystem have established a high degree of trust with VCCRI and its governance committees, we will seek to work with them to publish datasets that may be of value to the medical research community.

What is the expected retrieval frequency for this data?

The principal use case for the client is archival, so retrieval is likely to be limited to about twice a year.

For how long do you plan to keep this dataset stored on Filecoin?

Indefinitely

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

The storage deals will be distributed among at least four unique geographies. Certain elements of the data have sovereignty requirements, so these will be distributed only within Australian territory. DSS's objective is to distribute the datasets across the USA and Europe to the extent the client permits.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Online deals using Singularity.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

DSS intend to distribute the data among enterprise-scale SPs with similar sealing capacity that operate Tier 3 data centres.

How will you be distributing deals across storage providers?

Data that has a sovereignty requirement is intended to be distributed among DSS, Digital Income Fund, Holon and Vigilant IT. Datasets without a sovereignty requirement may be distributed among peers in other geographies; as the client identifies these discrete datasets, we will engage other SPs in the US and EU.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes, we have the resources/funding to begin making deals once we receive DataCap. 

We currently have the support we need thanks to the help of the Foundation, PL and other members of the community over the last few months.
large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Kakkouii commented 2 years ago

@DSS-AL Hey, do you have an email address on your company domain? I can't find one on your website. Would you please email filplus@fil.org from your company-domain address to verify your identity? Also, I understand your data is not currently available to the public, but how long will it take to make it public? A rough figure is enough for us.

DSS-AL commented 2 years ago

Hi @EGGRICE02, no problem, email sent. We aim to begin making permissible datasets public within 6 months.

MegTei commented 2 years ago

Hi Notaries, I have performed due diligence (DD) under NDA with Andrew @DSS, who is acting as proxy for the client VCCRI, which has an encrypted (private) dataset. I have sighted email communications about the PoC, have been CC'd on correspondence with the IT sponsor, and am satisfied this is authentic.

Hi Andrew (@DSS-AL), could you please confirm the following details:

  1. SP distribution partners and locations
  2. What type of data it is and why it needs to be encrypted
DSS-AL commented 2 years ago

Thank you Meg,

  1. The SP distribution is intended as follows (NB: The data has sovereignty requirements that do not allow it to be distributed internationally):

    • 1PiB Digital Income Fund (Sydney)
    • 1PiB Vigilant IT (Sydney)
    • 1PiB Holon (Sydney)
    • 2PiB DSS (Sydney) - Following Seal Storage precedent.
  2. The dataset is initially private as it contains patient data that has not been fully anonymised. We are working with the client to make some, if not all, of the data public over time.

Best Andrew
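
As a quick consistency check, the per-provider plan above can be tallied against the stated goal of five copies of a 1 PiB dataset. This is a minimal Python sketch; the dictionary only restates the figures from the list, and the constant names are illustrative rather than anything used by DSS or the Filecoin Plus tooling.

```python
# Planned replica layout from the list above (PiB stored per storage provider).
plan_pib = {
    "Digital Income Fund (Sydney)": 1,
    "Vigilant IT (Sydney)": 1,
    "Holon (Sydney)": 1,
    "DSS (Sydney)": 2,
}

DATASET_PIB = 1  # size of one copy of the dataset
COPIES = 5       # number of copies VCCRI is seeking to store

total_pib = sum(plan_pib.values())
# The plan should account for exactly five copies of the dataset.
assert total_pib == DATASET_PIB * COPIES

# Share of all stored replicas held by each provider.
for sp, pib in plan_pib.items():
    share = pib / total_pib
    print(f"{sp}: {pib} PiB ({share:.0%} of replicas)")
```

The tally confirms the plan covers 5 PiB, and it makes visible that DSS would hold 40% of all replicas, which is the concentration the later question about a second data centre is concerned with.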

DSS-AL commented 2 years ago

Hi Meg, @galen-mcandrew and notaries,

Great news: following a call with the client this morning, they have agreed to share all data from published papers as public data, leaving only the non-anonymised patient data encrypted.

I hope this helps the application process. Please let me know if there are any questions.

Andrew

Destore2023 commented 2 years ago

OK, it's time for Filecoin to welcome private datasets. Please count ByteBase in if needed. @MegTei

cryptowhizzard commented 2 years ago

Yes, I guess it is time, and this would be a perfect candidate for FIL-E because of the sovereignty requirements.

As long as Holon keeps oversight of this project (i.e., building the dataset for distribution, etc.), you can count us in.

Kevin-PiKNiK commented 2 years ago

This is super cool. Congrats to the DSS team on this fantastic enterprise opportunity with sovereignty requirements. We're hoping to do something similar in the United States (we have HIPAA issues to overcome) with life sciences, academics, and health systems in the pipeline.

kernelogic commented 2 years ago

I'd like to support this LDN as well. I'm seeing more and more FIL-E-style applications nowadays, and I want to participate early to gain experience overseeing this type of LDN lifecycle @MegTei.

DSS-AL commented 2 years ago

Thank you @kernelogic, this is great news and we appreciate your support. More like this one to come.

DSS-AL commented 2 years ago

Hi @dkkapur please see below a list of notaries that have expressed their intent to support the LDN application.

  • Fei Yan / Kernelogic / @kernelogic
  • Wijnand Schouten / Speedium / @cryptowhizzard
  • Eric / ByteBase / @swatchliu
  • Meg Dennis / Holon / @MegTei
  • Cabrina Huang / @xingjitansuo

dkkapur commented 2 years ago

@jamerduhgamer are you looking to support this one as well?

@DSS-AL happy to proceed here, though I would strongly suggest having at least 1-2 more notaries in case folks have issues with signing or are taking time off. This at least gives you some buffer.

NiwanDao commented 2 years ago

I had an offline meeting with @DSS-AL to discuss this application. I am excited to welcome this type of scientific dataset onboard to Filecoin.

  1. @dkkapur Since this dataset can only be distributed in Australia, can notaries from other regions approve it?
  2. What would be the best way to verify that the encrypted deals are made against the scientific cardiac research dataset?
DSS-AL commented 2 years ago

Thank you for your support @xingjitansuo, I can help answer part 2.

We have an NDA with @MegTei, who has had direct communication with the customer and will be able to verify the data. I could also arrange an NDA with you or other notaries if required. I hope this helps.

kernelogic commented 2 years ago

@DSS-AL I would prefer that all notaries sign an NDA and be able to verify at least some of the data.

DSS-AL commented 2 years ago

Hi @kernelogic, in principle I have no problem with that and I'll be happy to circulate NDAs to all relevant notaries. In practice, we'll have to get samples from the client, as the private components of the data are sent to us already encrypted by the client. Regardless, we'll find a way to demonstrate this with a fully transparent process.

DSS-AL commented 2 years ago

Hi All, I have circulated an NDA to each of the supporting notaries in Slack. Thanks

dkkapur commented 2 years ago

@xingjitansuo yes. IMO, when notaries from other regions can verify and support a client in the case of private/encrypted data, that should help with building confidence in the client/project. Regardless of where the data is stored, the hope with the LDN process is to enable active notaries and community members to participate in the verification process and minimize the likelihood of DataCap abuse.

dkkapur commented 2 years ago

List of notaries for the multisig that needs to be stood up for this particular project:

Can you give a thumbs up to this comment to confirm this is the correct address and that you are interested in supporting this client/use case?

dkkapur commented 2 years ago

@DSS-AL - 2 things:

> 2PiB DSS (Sydney) - Following Seal Storage precedent.

The two replicas only make sense if they are in different locations. Is there a second datacenter where you will host the second 1 PiB replica? There's no upside to using the same SP operator/datacenter from a data loss mitigation standpoint.

> Great news following a call with the client this morning, they have agreed to share all data from published papers as public data only leaving the non-anonymised patient data as encrypted.

What's the approx ratio of encrypted vs. public access data in the 1 PiB dataset?

galen-mcandrew commented 2 years ago

Additionally, it would be helpful to use a net new client address for the LDN applications, since this address (f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa) has already received DataCap. It may lead to complications with some of the subsequent allocation bot calculation triggers, since the address would be starting with 50TiB plus the first tranche of LDN DataCap.

DSS-AL commented 2 years ago

Hi @dkkapur,

  1. Yes the two replicas are to be sealed and stored in two separate data centres.
  2. The customer has not yet confirmed the breakdown between public and private data, but has indicated it could be roughly 50:50.
DSS-AL commented 2 years ago

@galen-mcandrew this is a good idea. I'll get Nathanial to create a new client address; however, he is currently volunteering with flood relief here in Sydney, so it will be tomorrow at the earliest. Thanks

marshyonline commented 2 years ago

@galen-mcandrew - Here is the new address f3qsxziv24i4s7gxc72r53qjnbhmce27uh2wo4h73kchvvb2upygwy4yzikxomqhhiuwcs6fk77y77gmhzs36q

Sunnyiscoming commented 2 years ago

Maybe you should get approval from 2 more notaries on this issue. Three notaries have confirmed so far:

  • Cabrina Huang / @xingjitansuo: f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q
  • Meg Dennis / Holon / @MegTei: f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i
  • Eric / ByteBase / @swatchliu: f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi

DSS-AL commented 2 years ago

Hi @dkkapur can you please advise what is needed from our end to progress this application. The client is ready to begin transferring data next week. Thank you

cryptowhizzard commented 2 years ago

@Sunnyiscoming I am behind this one.

Sunnyiscoming commented 2 years ago

Thanks @cryptowhizzard. Is there one more notary who supports this issue?

DSS-AL commented 2 years ago

@Sunnyiscoming please see below the list of supporting notaries provided two weeks ago.

  • Fei Yan / Kernelogic / @kernelogic: f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa
  • Wijnand Schouten / Speedium / @cryptowhizzard: f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa
  • Eric / ByteBase / @swatchliu: f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi
  • Meg Dennis / Holon / @MegTei: f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i
  • Cabrina Huang / @xingjitansuo: f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

Sunnyiscoming commented 2 years ago

OK. Hope to make progress quickly. https://github.com/filecoin-project/notary-governance/issues/573

galen-mcandrew commented 2 years ago

DataCap Allocation requested

Multisig Notary address

f01885534

Client address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

DataCap allocation requested

50TiB

galen-mcandrew commented 2 years ago

Pinging the 5 notaries working on this application, it is ready for the first allocation.

https://github.com/filecoin-project/notary-governance/issues/577#issuecomment-1190758538

  • Fei Yan / Kernelogic / @kernelogic: f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa
  • Wijnand Schouten / Speedium / @cryptowhizzard: f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa
  • Eric / ByteBase / @swatchliu: f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi
  • Meg Dennis / Holon / @MegTei: f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i
  • Cabrina Huang / @xingjitansuo: f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

MegTei commented 2 years ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebxsa5v5l6j6sdci3d6qdmnlwhymrkwnobtrrus5ajsfsfyhnj3os

Address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

Datacap Allocated

50.00TiB

Signer Address

f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebxsa5v5l6j6sdci3d6qdmnlwhymrkwnobtrrus5ajsfsfyhnj3os

kernelogic commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebw6xy72vmxtqawuq76is7oikuuw3s45thyyrbrlp2wtigz67cotc

Address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

Datacap Allocated

50.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebw6xy72vmxtqawuq76is7oikuuw3s45thyyrbrlp2wtigz67cotc

marshyonline commented 2 years ago

Heya all, just wondering how low the DC issued to this wallet needs to get before the next DC tranche is issued? We have a 40T section of the dataset that needs to go out and we have ~25T of DC left.

I note that the DC for this LDN ended up in the same wallet as the PoC DC (50T): f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa. I'm happy for it to continue going to the address ending in phypa. https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/425#issuecomment-1176949609

dkkapur commented 2 years ago

@marshyonline - the automated bot only triggers when you are down to about 25% of your last allocation. If you'd like to initiate a request sooner, can you share a little bit more context on why it's not OK to push the next tranche at 10T remaining? Is there offline coordination happening that requires 40T of deals happen all at once?
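
To make the refill rule concrete, here is a minimal Python sketch of the behaviour described above, assuming the bot simply compares the remaining DataCap with 25% of the most recent allocation. The function name `should_request_next_tranche` and its `threshold` parameter are hypothetical; the real bot logic may differ in details such as the exact comparison or which allocation it uses as the baseline.

```python
def should_request_next_tranche(remaining_tib: float,
                                last_allocation_tib: float,
                                threshold: float = 0.25) -> bool:
    """Hypothetical model of the refill rule: request the next tranche once
    remaining DataCap has fallen to about `threshold` (25% by default)
    of the most recent allocation."""
    return remaining_tib <= threshold * last_allocation_tib


last_allocation = 50.0  # TiB, the first LDN tranche in this thread
print(should_request_next_tranche(25.0, last_allocation))  # ~25T left: False
print(should_request_next_tranche(10.0, last_allocation))  # 10T left: True
```

With a 50 TiB allocation the cut-off works out to 12.5 TiB, which is why ~25 TiB remaining does not yet trigger a request while 10 TiB does.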

marshyonline commented 2 years ago

Hi @dkkapur Thanks for the info.

At this stage, we have ~61 TiB (1950 CARs × 32 GiB = 62,400 GiB ≈ 60.9 TiB) set to go to the 5 miners for this project. All of the miners could seal this set in ~3-4 days if all of the DataCap were available. As it stands, I will need to micromanage the deals and only send some every few days as DC rolls in. I feel this is going to turn the process into a multi-week exercise with a lot of extra operational overhead.

Is it possible to extend the next allocation to 200 or 300T to save on the extra back and forth so we can get the data on the chain sooner?
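
The CAR-count arithmetic in this comment can be checked directly. The sketch below takes the comment's figures at face value (1950 CAR files of 32 GiB each) and converts the total into binary (TiB) and decimal (TB) units, since "T" is used loosely in the thread. The constant names are illustrative.

```python
GIB_PER_TIB = 1024
BYTES_PER_GIB = 1024 ** 3

car_count = 1950   # CAR files staged, per the comment
car_size_gib = 32  # size of each CAR, as stated in the comment

total_gib = car_count * car_size_gib          # total in GiB
total_tib = total_gib / GIB_PER_TIB           # binary terabytes (TiB)
total_tb = total_gib * BYTES_PER_GIB / 1e12   # decimal terabytes (TB)

print(f"{total_gib:,} GiB = {total_tib:.2f} TiB = {total_tb:.2f} TB")
# 62,400 GiB = 60.94 TiB = 67.00 TB
```

Keeping GiB/TiB and TB separate matters when comparing staged data against DataCap, which the bot reports in binary units (TiB, PiB).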

marshyonline commented 2 years ago

Good afternoon @dkkapur

We are basically out of DC, but I don't see where or how the next tranche has been started. Can you please advise on this process so we can continue getting this dataset on chain? Thanks

large-datacap-requests[bot] commented 2 years ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f01885534

Client address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

DataCap allocation requested

100TiB

large-datacap-requests[bot] commented 2 years ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01885534

Client address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

Last two approvers

kernelogic & megtei

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

100TiB

Total DataCap granted for client so far

32GiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

4.99PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
|---|---|---|---|---|
| 348 | 3 | 50TiB | 46.32 | 896GiB |

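
The bot's "Rule to calculate the allocation request amount" line expresses each tranche as a percentage of a weekly DataCap amount. A minimal Python sketch of that reading follows, assuming a weekly amount of 100 TiB; that figure is inferred from "100% of weekly" producing a 100TiB request and is not stated in the thread. The function `tranche_size_tib` is hypothetical.

```python
WEEKLY_DC_TIB = 100  # assumption: inferred from "100% of weekly" -> 100TiB


def tranche_size_tib(percent_of_weekly: int,
                     weekly_tib: float = WEEKLY_DC_TIB) -> float:
    """Hypothetical: tranche = given percentage of the weekly DataCap amount."""
    return weekly_tib * percent_of_weekly / 100


# Rules reported by the bot for requests 2 and 3 in this thread.
for request, pct in [(2, 100), (3, 200)]:
    print(f"request {request}: {pct}% of weekly -> {tranche_size_tib(pct):.0f}TiB")
```

Under this assumption the sketch reproduces the 100TiB and 200TiB requests the bot posted; it does not model any caps or other conditions the real bot may apply.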
kernelogic commented 2 years ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecihyhobqpddz5xri2inqclmzt3u4uyavniupkp2rg2y3gaydkhvg

Address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

Datacap Allocated

100.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecihyhobqpddz5xri2inqclmzt3u4uyavniupkp2rg2y3gaydkhvg

NiwanDao commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecyxreo5cljoxddqk2st4vwkeqpfmxez6vjincia6hbqgwpaf2pnq

Address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

Datacap Allocated

100.00TiB

Signer Address

f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecyxreo5cljoxddqk2st4vwkeqpfmxez6vjincia6hbqgwpaf2pnq

marshyonline commented 2 years ago

Hey team, is it possible to have the next tranche of DC at 200T? We have 40T of staged data to send out as one set to our 5 SPs, and I would like to avoid the overhead of splitting up the dealmaking process.

kernelogic commented 2 years ago

From my experience, I don't think that's possible. It's all bot-controlled.

large-datacap-requests[bot] commented 2 years ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f01885534

Client address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

DataCap allocation requested

200TiB

Id

c563fb4c-8ce6-480f-9f9b-e2c4ce4905eb

large-datacap-requests[bot] commented 2 years ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01885534

Client address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

Last two approvers

xingjitansuo & kernelogic

Rule to calculate the allocation request amount

200% of weekly dc amount requested

DataCap allocation requested

200TiB

Total DataCap granted for client so far

150TiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

4.85PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
|---|---|---|---|---|
| 2902 | 5 | 100TiB | 34.01 | 24.93TiB |
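
The "Datacap to be granted to reach the total amount requested" figures can be reproduced from the 5 PiB total and the amount granted so far. In this minimal Python sketch, truncating to two decimals (rather than rounding) is an assumption that happens to match both displayed values: 5 PiB minus 32 GiB is about 4.99997 PiB, which would round to 5.00 but is shown as 4.99. The helper names are hypothetical.

```python
import math

GIB_PER_TIB = 1024
TIB_PER_PIB = 1024
TOTAL_REQUESTED_TIB = 5 * TIB_PER_PIB  # 5 PiB requested by the client


def remaining_pib(granted_tib: float) -> float:
    """DataCap still to be granted, in PiB."""
    return (TOTAL_REQUESTED_TIB - granted_tib) / TIB_PER_PIB


def truncate2(x: float) -> float:
    """Truncate (not round) to two decimals; assumed to match the bot's display."""
    return math.floor(x * 100) / 100


# Request 2: the bot reported 32GiB granted so far.
print(truncate2(remaining_pib(32 / GIB_PER_TIB)))  # 4.99
# Request 3: 50TiB + 100TiB = 150TiB granted so far.
print(truncate2(remaining_pib(50 + 100)))          # 4.85
```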