filecoin-project / notary-governance

Proposal: Project Antarctic - 10 PiB Data Set / 50 PiB DataCap #489

Closed scharfstein closed 1 year ago

scharfstein commented 2 years ago

For the purposes of community transparency, Seal Storage Technology would like to present a collaborative plan for onboarding a total of 50 PiB to the Filecoin Network. This will represent 5 full replicas of a 10 PiB data set. We would like to present details here about the project and plan to attend the Tuesday, April 5, 2022 Governance Call to discuss with the community.

Project Description

The project is a 10 PiB effort to prove out the value propositions of decentralized data storage. A set of 10 DataCap Applications has been submitted by Seal Storage on behalf of our Customer, who wishes to remain confidential for the duration of this project. Our Customer is a world-class scientific research organization and the data sets are outputs of scientific experiments.

The Customer has been working with large data sets (PiBs) for decades and is interested in pursuing the Filecoin Network as a solution to some of their exabyte-scale storage problems. This is why starting with a 10 PiB project makes sense to them as it represents a small portion of their complete archive. Due to the perceived risk associated with cryptocurrency projects, our Customer feels it is best to delay public announcement of our collaboration until we have successfully completed the project.

Data Set

The data set contains outputs of scientific experiments. The data itself is not of use to anyone besides our Customer due to the post-processing required to create a useful result. However, our Customer would like the data to remain private, as viewing it could reveal the Customer's identity. Therefore, the data will be encrypted as per Customer requirements.

It should be noted that scope for the project includes creating a publicly available data set.

Transparency in KYC

Our Customer is a world-class scientific organization and we have completed a KYC process to verify this customer including meeting the Customer Lead face-to-face, numerous meetings with the broader technical team, data transfer tests with the Customer Team and their collaborators, and verification of sample data.

We understand that a confidential customer and encrypted data complicate the ability of the Filecoin Community to verify this project.

Seal is committed to transparency and we have completed NDAs with notaries and storage providers, including Filecoin Foundation, and disclosed the name of the Customer along with evidence of the project such as email communications, a statement of work document, data transfer details and sample data.

Transparency in Filecoin Plus Guidelines

We would like to show our appreciation to the community for allowing this project to move forward. In working closely with Protocol Labs and The Filecoin Foundation, Seal is following these recommendations as a path forward to make the project a success for the Customer and the Network.

1) Submit a Proposal Issue in the Notary Governance repo for your entire project
2) Submit ten 5-PiB datacap applications, each referencing the project
3) Each datacap application will be assigned to one SP
4) Four notaries have agreed to support these LDNs

Data Storage Plan

Five full replicas, total of 50 PiB of Datacap

Primary SP Partner: DLTX, receiving 10 PiB of DataCap for one full replica
Supporting Seal with compute to meet Customer milestones
Location: Omaha, Nebraska

SP Cohort, each receiving 5 PiB, for a total of two full replicas:
Holon, location Sydney, Australia
ElioVP, location Antwerp, Belgium
W3b Cloud, location Washington State, USA
PikNik, location San Diego, USA

Seal Storage: receives 20 PiB of DataCap for two full replicas
Locations: Las Vegas, USA and Montreal, Canada
For the Customer project, Seal must also keep a full unsealed replica (10 PiB)
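
The totals in the plan above can be cross-checked with a few lines of arithmetic. This is only a sketch: the per-provider figures are copied from the plan, while the dictionary layout and variable names are illustrative assumptions and not part of the proposal.

```python
# Allocation figures copied from the Data Storage Plan (PiB of DataCap per provider).
# The dictionary layout and names are illustrative assumptions.
DATASET_PIB = 10   # size of one full replica of the data set
LDN_SIZE_PIB = 5   # each LDN application requests 5 PiB

allocations_pib = {
    "DLTX": 10,
    "Holon": 5,
    "ElioVP": 5,
    "W3b Cloud": 5,
    "PikNik": 5,
    "Seal Storage": 20,
}

total_pib = sum(allocations_pib.values())
replicas = total_pib / DATASET_PIB
ldn_applications = total_pib // LDN_SIZE_PIB

print(f"total DataCap: {total_pib} PiB")
print(f"full replicas: {replicas:g}")
print(f"5-PiB LDN applications: {ldn_applications}")
```

The sums agree with the figures stated in the proposal: 50 PiB of DataCap, five full replicas, and ten 5-PiB applications.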

Notaries that Support the Project [person / org / region / Github app]

1) Danny O'Brien / Filecoin Foundation / EU / https://github.com/filecoin-project/notary-governance/issues/187
2) Kobby Chen / Fenbushi / China / https://github.com/filecoin-project/notary-governance/issues/138
3) Neo Ge / IPFS Main / China / https://github.com/filecoin-project/notary-governance/issues/168
4) Meg Dennis / Holon / Oceania / https://github.com/filecoin-project/notary-governance/issues/130
5) Eric / ByteBase / China / https://github.com/filecoin-project/notary-governance/issues/169

DataCap Applications

LDN-01-DLTX https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/274

LDN-02-DLTX https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/313

LDN-03-Holon https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/314

LDN-04-W3b https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/315

LDN-05-PikNik https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/316

LDN-06-ElioVP https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/317

LDN-07-Seal https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/318

LDN-08-Seal https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/319

LDN-09-Seal https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/320

LDN-10-Seal https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/321

cryptowhizzard commented 2 years ago

Thank you for this proposal.

The rules of Fil+ are fair and clear at the moment. -> https://github.com/filecoin-project/filecoin-plus-large-datasets/

A snippet from the rules:

"the dataset should be public, open, and mission aligned with Filecoin and Filecoin Plus. This also means that the data should be accessible to anyone in the network, without requiring any special permissions or access requirement
stored data should be readily retrievable on the network and this should be regularly verified (though the use of manual or automated verification that includes retrieving data from various miners over the course of the DataCap allocation timeframe)"

Given this, I think it would make more sense not to submit an LDN application, but instead to apply for regular DataCap combined with a grant proposal to PL. Storing encrypted data on an LDN is not a path forward, in my opinion.

dannyob commented 2 years ago

I'd note that we're in a strange place with this, because there's no explicit prohibition on encryption in the rules. In this case, we can seek ways to ascertain that the dataset (separate from the encryption) is public, open and mission-aligned. The data itself would be accessible.

I feel like I'm lawyering a bit over data and dataset here, however, but I think that's because the rules weren't devised with this kind of scenario in mind, and given that our scoping rules are prefaced with the proviso that this is "still an evolving conversation, so the scope is subject to change", I think we can entertain whether this proposal fits the spirit of what we're trying to achieve, versus the methods we use to achieve them. I think it does.

Destore2023 commented 2 years ago

If we look at the storage terms and conditions of Google, AWS, and Microsoft, all of them offer solutions for both individuals and enterprises.

In addition to public data, a considerable part of what they store is non-public data belonging to individuals and enterprises.

I hope that this case can break through the limitation that the Filecoin network only stores the public data of enterprises, opening a new path for the Filecoin network.

Tom-OriginStorage commented 2 years ago

From what I understand, there isn't a need for companies/enterprises/organizations to apply for DataCap to store their data. Nobody is holding a gun to their head and saying that they must apply for DataCap. They can simply use Filecoin as it is and still store their data without DataCap.

MegTei commented 2 years ago

As discussed in the Notary governance call today, there is a need for an enterprise/private program, and this proposal has some elements of one. 80% of the world's data is not public, and we are seeing demand from businesses with encrypted/non-public data that would like to onboard and that are KYC'd and trusted. The question is: can due diligence to establish trust be performed on the client rather than on the data (as it is in this case)?
I'm working with PL to design an enterprise/private program to present at Fil+ day in June. We'll explore this, among other rules, requirements and dimensions, to ensure there is a viable and fair proposition that could be used to accelerate commercialisation of the network. Stand by.

flyworker commented 2 years ago

I still don't understand why we need a 10 PB onboarding POC. Also, some of the storage providers do not even have experience onboarding 1 PB of deals. Shouldn't we let the SPs with more experience do the test first?

As I mentioned before in the governance call and at SXSW, I am still not convinced by this big encrypted data case.

I would also like to know what the timeline and process are if some notaries want to push it through anyway.

MegTei commented 2 years ago

Yep, those are reasonable questions, Charles, for @scharfstein

xinaxu commented 2 years ago

Really looking forward to your proposal, @MegTei. It will be great to see an audit framework or trust framework established to bring the enterprise/private program to FIL+. It is critical to FIL's success. We will need to make sure the data brought to the Filecoin network is verified and verifiable, to ensure its integrity and authenticity. This is difficult, especially for private/encrypted data.

galen-mcandrew commented 2 years ago

Trying to make sure the 5 notaries are all signed off on these, since we'll be going outside the normal tooling process to generate these LDNs with only those signers.

Can I get a thumbs-up emoji from the five notaries confirming you are ready to support these and that the following addresses are accurate for putting on these 10 LDNs?

  1. Danny O'Brien / Filecoin Foundation / @dannyob / f1k6wwevxvp466ybil7y2scqlhtnrz5atjkkyvm4a
  2. Kobby Chen / Fenbushi / @Fenbushi-filecoin / f1yqydpmqb5en262jpottko2kd65msajax7fi4rmq
  3. Neo Ge / IPFS Main / @neogeweb3 / f13k5zr6ovc2gjmg3lvd43ladbydhovpylcvbflpa
  4. Meg Dennis / Holon / @MegTei / f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i
  5. Eric / ByteBase / @swatchliu / f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi

dannyob commented 2 years ago

I'd note though that I'd like to go forward with a bit of KYC with Seal, @scharfstein and the customer just to confirm the identity and plan to my satisfaction before I sign off.

dannyob commented 2 years ago

Just as an update to this -- I've spoken with @scharfstein and obtained more details about the customer, as well as the (confidential) scope of work, which matches this public request. I'm waiting on the answer to a couple of questions for the customer, which I anticipate receiving at the beginning of next week.

MegTei commented 2 years ago

Confirming that, under NDA, I have discussed the project and client with Alex and Gregory at Seal Storage, sighted the statement of work between the client and Seal and the comms trail around the deal, and verified the project to the best of my ability. I have requested a checkpoint once the client is happy with the proof of concept later in 2022.

dannyob commented 2 years ago

Hey folks, so my questions were answered, and I'm ready to go ahead on this.

dannyob commented 2 years ago

Hey @dkkapur , @galen-mcandrew is there an explanation anywhere as to why we're doing this in 10 separate dataset applications? I'd like to understand this better.

psh0691 commented 2 years ago

Is it necessary to store non-public data using Fil+ DataCap? I'd recommend doing a direct deal with an SP instead.

Even with LDNs that were applied for with the goal of storing public data, we have found cases of them being filled with meaningless dummy data to collect the 10x reward.

If non-public data is stored in LDNs, there will be more cases of SPs and customers colluding to store meaningless dummy data.

I think it is fair to pay the full price to store non-public data. That way, nobody will fill it with dummy data.

galen-mcandrew commented 2 years ago

@scharfstein wondering if the team could do a check-in report at the next governance call?

The DataCap allocations went out to clients around the end of April, approximately 10 weeks ago, right? As more and more enterprise and encrypted datasets are getting proposed, it seems important to keep reporting on the status and success of these proof-of-concept projects.

scharfstein commented 2 years ago

We would be happy to report out. I will attend and give an update on July 12 at the 4pm Pacific call.

Destore2023 commented 2 years ago

Hi Gregory, @scharfstein

How is your project going? It's been a while since I've seen Seal ask for multisig.

As the supporting notary for Project Antarctic, I did not verify the data samples after the NDA was signed. Now I am asking for a review of the data sample. I don't know when is convenient for you, but you can contact me on Slack to set up an appointment.

Thanks!

scharfstein commented 2 years ago

Eric - thanks for checking in.

We presented an update on the July 12 Governance call ... you can find the recording here: https://youtu.be/yqPc-0Wd75M?t=5023

scharfstein commented 2 years ago

In discussions with @dkkapur, it was recommended that we increase the weekly allocation.

100 TiB was the original request, we have increased this to 1 PiB.

This was done to all ten LDN applications.
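
As a rough illustration of what this change means for each 5 PiB application, the sketch below estimates how many weeks it would take to use up one LDN at each weekly rate. It assumes DataCap is consumed at exactly the weekly allocation, which is a simplification; the function and constant names are hypothetical.

```python
TIB_PER_PIB = 1024
LDN_SIZE_TIB = 5 * TIB_PER_PIB  # each application requests 5 PiB


def weeks_to_fill(weekly_tib: float) -> float:
    """Weeks needed to consume one LDN at a constant weekly rate (simplifying assumption)."""
    return LDN_SIZE_TIB / weekly_tib


original_weeks = weeks_to_fill(100)               # original request: 100 TiB per week
increased_weeks = weeks_to_fill(1 * TIB_PER_PIB)  # increased request: 1 PiB per week

print(f"at 100 TiB/week: {original_weeks:.1f} weeks per LDN")
print(f"at 1 PiB/week:   {increased_weeks:.1f} weeks per LDN")
```

Under that assumption, the original rate implies roughly a year per application, while the increased rate implies about five weeks.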

claydrone commented 2 years ago

curious @swatchliu Did you get the data?

Destore2023 commented 2 years ago

Frankly, I haven't seen the data, only the correspondence from the client and the explanation of why Seal has this client.