filecoin-project / devgrants

👟 Apply for a Filecoin devgrant. Help build the Filecoin ecosystem!

Filecoin Storage Bridge #1548

Closed mwittie closed 1 year ago

mwittie commented 1 year ago

Open Grant Proposal: Filecoin General Data Oracle

Project Name: Filecoin General Data Oracle

Proposal Category: Developer and data tooling

Individual or Entity Name: BLOCKY, Inc.

Proposer: mwittie

(Optional) Filecoin ecosystem affiliations: David Aronchick, Bacalhau

Do you agree to open source all work you do on behalf of this RFP under the MIT/Apache-2 dual-license?: Yes

Project Summary

Computation over Data (CoD) is gaining momentum as a paradigm to reliably transform the internet's datasets. The general data flow is to ingest data from a URL, save it to immutable storage like Filecoin under a CID, then apply a data transformation function, and finally store the result under a new CID. The key innovation of CoD frameworks like Bacalhau or Kamu is to transform data via provable, or verifiable, computation, so that users can trust the result as having come from a particular data set and having gone through a particular transformation.

The challenge with current CoD systems is that the data ingestion process cannot be trusted. Bacalhau and Kamu provide documentation (https://docs.bacalhau.org/data-ingestion/from-url/, https://docs.kamu.dev/odf/spec/#data-ingestion) on how to ingest data from URLs. These methods download the data and store it under a CID, which creates a fingerprint of the data content, but not of its source. The lack of source information creates several problems:

We propose to address these data ingestion challenges by implementing a general data oracle. The data oracle will run in a Nitro Enclave Trusted Execution Environment (TEE). At its core, a Nitro Enclave executes a containerized application in an isolated environment and provides a mechanism for the application to generate cryptographic attestations over its state, signed by Amazon. The data oracle will expose an API for users to specify a URL from which they want to fetch the data. The oracle will then download the URL content and create a source_attestation over the content, the certificate of the server hosting the URL, and a timestamp. The attestation will be signed by Amazon's private key, and so can be verified by anyone, including the transformation functions of CoD frameworks like Bacalhau or Kamu.

The source_attestation would allow CoD frameworks and their users to trust where the data came from and when it was obtained. As a result, the output of these CoD tools could also be trusted as being based on accepted data sources.

Impact

Outcomes

  1. An enclave application that supports fetching data from an HTTPS endpoint and creates Nitro Enclave attestations over the source and contents of the data
  2. Scripts to build and deploy a Nitro-enabled EC2 instance running the enclave application, allowing anyone to deploy their own API data bridge to Filecoin
  3. A CLI client to request and validate Nitro Enclave attestations over the source and contents of the data

Adoption, Reach, and Growth Strategies

We propose a mechanism to create source_attestations over the source and content of URL data. The proposed mechanism would create a general data oracle to provably ingest data into frameworks such as Bacalhau or Kamu. The source_attestations could also be used by other CoD frameworks in the CoD Working Group. They can also make it easier to write zero-knowledge proofs in projects such as Lurk, which can verify the source of content-addressable data without the need to interactively ingest from URLs.

Development Roadmap

We propose to extend the existing Nitro Enclave tooling to create an application to read and attest data from arbitrary internet sources. As a starting point, we will use BLOCKY's fork of the Brave nitriding framework, which allows users to build and verify Nitro Enclave containers. Nitriding already permits TLS connections to the enclave container, which we will extend to allow enclave applications to reach out to URLs and attest the source as well as the content of the data.

To allow Filecoin clients to attest the source of the data they store on Filecoin, we propose to implement the following protocol:

sequenceDiagram
    autonumber
    participant client as Data Client
    participant enclave as Enclave
    participant url as Web Server
    participant ipfs as IPFS
    participant network as Filecoin

    opt Get enclave attestation
        client ->> enclave: GET /attestEnclave
        activate client
        activate enclave
        enclave --) client: enclave_attestation
        deactivate client
        deactivate enclave

        client ->> ipfs: store enclave_attestation
    end

    client ->> enclave: GET /attestSource?URL
    activate client
    activate enclave
    enclave ->> url: GET URL
    url --) enclave: OK (content)
    note right of enclave: Create source_attestation
    enclave --) client: OK (source_attestation)
    deactivate enclave

    client ->> network: store source_attestation
    activate network
    deactivate network
    deactivate client
  1. A new nitriding-based enclave application creates a unique key pair K_pub/K_pri on startup. The key K_pri is only available inside the enclave, and so any information signed by K_pri is known to come from the enclave. For a client to know that K_pub is an enclave key generated by a specific enclave application, the client requests a Nitro Enclave enclave_attestation from the /attestEnclave endpoint. The client in this case is a user of a CoD framework.

  2. The enclave generates an enclave_attestation containing:

    • the enclave application's public key K_pub
    • PCR0, the measurement (hash) of the enclave image

    both signed by the well-known Nitro Enclave private key. Any client can thereafter verify the attestation using the well-known Nitro Enclave public key, check that the PCR0 matches an expected value, and use K_pub to authenticate messages from the enclave.

  3. The client stores the enclave_attestation on IPFS, so that it becomes available to other users.

Note that steps 1 through 3 need to be performed only once. With the enclave_attestation available under its CID, the client may start requesting attestations over URL data.
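The key handling described in step 1 can be sketched in Go as follows. This is a minimal illustration, not the proposed implementation: the proposal does not name a signature scheme, so Ed25519 is an assumption, and the names `enclaveKeys`, `newEnclaveKeys`, `sign`, and `verifyFromEnclave` are hypothetical. Requesting the Nitro-signed enclave_attestation that binds K_pub to the enclave image is not shown.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// enclaveKeys holds the key pair generated once at enclave startup.
// In the proposal K_pri never leaves the enclave; only K_pub is published
// inside the enclave_attestation. Ed25519 is an assumption of this sketch.
type enclaveKeys struct {
	pub ed25519.PublicKey
	pri ed25519.PrivateKey
}

// newEnclaveKeys generates a fresh K_pub/K_pri pair.
func newEnclaveKeys() (enclaveKeys, error) {
	pub, pri, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return enclaveKeys{}, err
	}
	return enclaveKeys{pub: pub, pri: pri}, nil
}

// sign produces a signature over msg with K_pri.
func (k enclaveKeys) sign(msg []byte) []byte {
	return ed25519.Sign(k.pri, msg)
}

// verifyFromEnclave reports whether sig over msg was made by the holder of
// the private key matching pub. A client would take pub from a verified
// enclave_attestation.
func verifyFromEnclave(pub ed25519.PublicKey, msg, sig []byte) bool {
	return ed25519.Verify(pub, msg, sig)
}

func main() {
	keys, err := newEnclaveKeys()
	if err != nil {
		panic(err)
	}
	msg := []byte("hello from the enclave")
	sig := keys.sign(msg)

	fmt.Println("valid:", verifyFromEnclave(keys.pub, msg, sig))
	fmt.Println("tampered:", verifyFromEnclave(keys.pub, []byte("tampered"), sig))
}
```

Because the key pair is created at startup and kept in memory only, a restarted enclave would hold a new K_pub and would need a new enclave_attestation.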

  4. A Filecoin data client requests a URL attestation from the enclave by calling the /attestSource endpoint with a URL as a parameter.

  5. The enclave makes a GET request to the Web server hosting the URL.

  6. The enclave downloads the content of the URL and creates a source_attestation containing:

    • the URL
    • the content of the URL[^1]
    • the TLS certificate of the Web Server
    • a timestamp
    • the CID of the enclave_attestation (computed on the enclave from the previously issued enclave_attestation)

    all signed with the enclave application's private key K_pri.

[^1]: The HTTP response to a GET request could contain content requiring no further action. Alternatively, the response could require further processing of redirects, links, or scripts. In the first version of the data oracle we propose to simply attest the content of the HTTP response as-is, if the response status code is 200.

  7. The enclave replies to the client with the source_attestation.

  8. The client stores the source_attestation on Filecoin.
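The enclave side of the /attestSource exchange could look roughly like the Go sketch below. It rests on several assumptions that are not part of the proposal: Ed25519 as the signature scheme, JSON as the payload encoding, all type, field, and function names (`SourceAttestation`, `attestSource`, `runDemo`, and so on), and a placeholder string in place of the real enclave_attestation CID. A local TLS test server stands in for the Web Server. Following the footnote, the sketch attests the response as-is and only when the status code is 200.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// SourceAttestation mirrors the fields listed in the protocol steps.
// Field names and the JSON encoding are assumptions made for this sketch.
type SourceAttestation struct {
	URL                   string    `json:"url"`
	Content               []byte    `json:"content"`
	ServerCertDER         []byte    `json:"server_cert_der"`
	Timestamp             time.Time `json:"timestamp"`
	EnclaveAttestationCID string    `json:"enclave_attestation_cid"`
}

// SignedSourceAttestation pairs the serialized attestation with a signature
// made by the enclave private key K_pri.
type SignedSourceAttestation struct {
	Payload   []byte
	Signature []byte
}

// VerifiedBy reports whether the payload was signed by the holder of kPub.
func (s *SignedSourceAttestation) VerifiedBy(kPub ed25519.PublicKey) bool {
	return ed25519.Verify(kPub, s.Payload, s.Signature)
}

// attestSource fetches url and signs what it saw.
func attestSource(client *http.Client, url, enclaveCID string, kPri ed25519.PrivateKey) (*SignedSourceAttestation, error) {
	c := *client // shallow copy so the caller's client is not modified
	c.CheckRedirect = func(*http.Request, []*http.Request) error {
		return http.ErrUseLastResponse // attest the response as-is, follow no redirects
	}
	resp, err := c.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("refusing to attest status %d", resp.StatusCode)
	}
	if resp.TLS == nil || len(resp.TLS.PeerCertificates) == 0 {
		return nil, errors.New("no TLS certificate: only HTTPS sources can be attested")
	}
	content, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	payload, err := json.Marshal(SourceAttestation{
		URL:                   url,
		Content:               content,
		ServerCertDER:         resp.TLS.PeerCertificates[0].Raw, // leaf certificate
		Timestamp:             time.Now().UTC(),
		EnclaveAttestationCID: enclaveCID,
	})
	if err != nil {
		return nil, err
	}
	return &SignedSourceAttestation{Payload: payload, Signature: ed25519.Sign(kPri, payload)}, nil
}

// runDemo attests path on a local TLS test server that stands in for the
// Web Server; it serves "hello" at /data and 404 elsewhere.
func runDemo(path string) (*SignedSourceAttestation, ed25519.PublicKey, error) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/data" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, "hello")
	}))
	defer srv.Close()
	kPub, kPri, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	att, err := attestSource(srv.Client(), srv.URL+path, "placeholder-cid", kPri)
	return att, kPub, err
}

func main() {
	att, kPub, err := runDemo("/data")
	if err != nil {
		panic(err)
	}
	fmt.Println("signature valid:", att.VerifiedBy(kPub))
}
```

Signing the serialized payload as a single byte string keeps verification simple: a verifier checks the signature over exactly the bytes it received before decoding any field.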

At the end of this process anyone reading the source_attestation from Filecoin will be able to:

In the context of a CoD framework, the CIDs for the enclave_attestation and for the source_attestation can serve as input to computation, as in Bacalhau (https://docs.bacalhau.org/data-ingestion/from-url/#use-the-cid-in-a-new-bacalhau-job). The CoD framework can perform the verification process as part of the job to create output that is verifiable based on the content of a URL.
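A verification step that a CoD job might run before using the data can be sketched in Go as follows. The sketch assumes the enclave_attestation has already been verified against the Nitro Enclave public key and that K_pub was taken from it; that part needs COSE and CBOR parsing and is not shown. Ed25519, the JSON encoding, all names (`attestedSource`, `verifySource`, `refOf`, `makeExample`, `exampleURL`), and the hex SHA-256 digest used in place of a real CID are assumptions of this sketch, not part of the proposal.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

// attestedSource holds the source_attestation fields this check needs.
// Field names and the JSON encoding are assumptions made for this sketch.
type attestedSource struct {
	URL                   string `json:"url"`
	Content               []byte `json:"content"`
	EnclaveAttestationRef string `json:"enclave_attestation_ref"`
}

// refOf is a stand-in for a real CID: a hex-encoded SHA-256 digest.
func refOf(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// verifySource accepts the payload only if it was signed by kPub, it points
// at the enclave_attestation that vouches for kPub, and it names wantURL.
func verifySource(payload, sig []byte, kPub ed25519.PublicKey, enclaveAtt []byte, wantURL string) (*attestedSource, error) {
	if !ed25519.Verify(kPub, payload, sig) {
		return nil, errors.New("payload was not signed by K_pub")
	}
	var src attestedSource
	if err := json.Unmarshal(payload, &src); err != nil {
		return nil, err
	}
	if src.EnclaveAttestationRef != refOf(enclaveAtt) {
		return nil, errors.New("payload references a different enclave_attestation")
	}
	if src.URL != wantURL {
		return nil, fmt.Errorf("unexpected source URL %q", src.URL)
	}
	return &src, nil
}

// exampleURL is a hypothetical data source used only for the demo.
const exampleURL = "https://example.com/data.csv"

// makeExample plays the enclave: it signs a payload for exampleURL and
// returns the payload, signature, K_pub, and placeholder attestation bytes.
func makeExample() ([]byte, []byte, ed25519.PublicKey, []byte) {
	kPub, kPri, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	enclaveAtt := []byte("placeholder enclave_attestation bytes")
	payload, err := json.Marshal(attestedSource{
		URL:                   exampleURL,
		Content:               []byte("hello"),
		EnclaveAttestationRef: refOf(enclaveAtt),
	})
	if err != nil {
		panic(err)
	}
	return payload, ed25519.Sign(kPri, payload), kPub, enclaveAtt
}

func main() {
	payload, sig, kPub, enclaveAtt := makeExample()
	src, err := verifySource(payload, sig, kPub, enclaveAtt, exampleURL)
	if err != nil {
		panic(err)
	}
	fmt.Printf("verified content from %s: %s\n", src.URL, src.Content)
}
```

A job that treats a failed check as a hard error would never emit output derived from unattested input, which is what makes the output traceable to the URL.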

Milestones

  1. An enclave application written in Go that supports fetching data from an HTTPS endpoint and creates a source_attestation over the data
  2. Scripts to build and deploy a Nitro-enabled EC2 instance running the data oracle
  3. A CLI client to request and validate source_attestations

Total Budget Requested

gantt
    dateFormat  YYYY-MM-DD
    excludes    weekends
    Milestone 1 - Enclave application: m1, 2023-08-15, 30d
    Milestone 2 - Deployment scripts: m2, after m1, 20d
    Milestone 3 - CLI client: m3, after m1, 20d    

We request $60K to complete these milestones:

Maintenance and Upgrade Plans

We plan to release the project as open source to allow developers to update the data oracle enclave application and CLI to meet their needs. On our end we will maintain the nitriding framework for enclave application development and the Nitro-enabled EC2 deployment scripts.

Team

Team Members

Team Member LinkedIn Profiles

Team Website

https://blocky.rocks

Relevant Experience

Mike Wittie, David Millman, and Taylor Hardin hold PhDs in Computer Science with joint expertise in distributed systems, software engineering, and secure computing.

Particularly relevant to this proposal is our collaboration with the Brave team on nitriding, a framework for deployment and verification of containerized applications running on Nitro Enclaves. We have dogfooded nitriding to develop several internal applications.

Team code repositories

https://github.com/blocky/nitriding

Additional Information

To discuss grant agreement and general next steps please contact Taylor Heinecke.

ErinOCon commented 1 year ago

Hi @mwittie, thank you for your patience with our review. Unfortunately, we will not be proceeding with a grant at this time. Wishing you all the best as you continue to build!