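Open Grant Proposal: Filecoin General Data Oracle
Project Name: Filecoin General Data Oracle
Proposal Category: Developer and data tooling
Individual or Entity Name: BLOCKY, Inc.
Proposer: mwittie
(Optional) Filecoin ecosystem affiliations: David Aronchick, Bacalhau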
Do you agree to open source all work you do on behalf of this RFP under the MIT/Apache-2 dual-license?: Yes
Project Summary
Computation over Data (CoD) is gaining momentum as a paradigm for reliably transforming the internet's datasets. The general data flow is to ingest data from a URL, save it to immutable storage like Filecoin under a CID, apply a data transformation function, and finally store the result under a new CID. The key innovation of CoD frameworks like Bacalhau or Kamu is to transform data via provable, or verifiable, compute, so that users can trust the result as having come from a particular data set and having gone through a particular transformation.
The challenge with current CoD systems is that the data ingestion process cannot be trusted. Bacalhau and Kamu provide documentation (https://docs.bacalhau.org/data-ingestion/from-url/, https://docs.kamu.dev/odf/spec/#data-ingestion) on how to ingest data from URLs. These methods download the data and store it under a CID, which creates a fingerprint of the data content, but not of its source. The lack of source information creates two problems:
The content of the stored data sets cannot be trusted.
While it is possible that the data under a CID comes from the specified URL, an attacker could upload arbitrary content and claim that it comes from that URL. The Kamu documentation states: "When interacting with external data we cannot make any assumptions about the guarantees that the external source provides" (https://docs.kamu.dev/odf/spec/#data-ingestion).
The content at a URL may change over time.
Users could fetch the URL content at the start of every computation to confirm for themselves that it comes from the correct source, but that content can change over time, making it difficult for users to agree on the correct result of the computation. The Kamu documentation states: "we cannot rely on the external data being immutable, on being highly-available, we can’t even be sure the domain that hosts it will still be there the next day" (https://docs.kamu.dev/odf/spec/#data-ingestion).
We propose to address these data ingestion challenges by implementing a general data oracle. The data oracle will run in a Nitro Enclave Trusted Execution Environment (TEE). At its core, a Nitro Enclave executes a containerized application in an isolated environment and provides a mechanism for the application to generate cryptographic attestations over its state, signed by Amazon. The data oracle will expose an API for users to specify a URL from which they want to fetch data. The oracle will then download the URL content and create a source_attestation over the content, the certificate of the server hosting the URL, and a timestamp. The source_attestation will be signed by a key that an Amazon-signed Nitro attestation binds to the enclave application, and so could be verified by anyone, including the transformation functions of CoD frameworks like Bacalhau or Kamu.
The source_attestation would allow CoD frameworks and their users to trust where the data came from and when it was obtained. As a result, the output of these CoD tools could also be trusted as being based on accepted data sources.
Impact
- source_attestations will preserve the veracity and verifiability of data, allowing CoD frameworks to trust where input data comes from and when it was obtained.
- source_attestations, in combination with CoD frameworks, will produce deterministic results over URL data, which can serve as input to other transformations, including on-chain smart contracts.
Outcomes
- An enclave application that supports fetching data from an HTTPS endpoint and creates Nitro enclave attestations over the source and contents of the data
- Scripts to build and deploy a Nitro-enabled EC2 instance running the enclave application, allowing anyone to deploy their own API data bridge to Filecoin
- A CLI client to request and validate Nitro enclave attestations over the source and contents of the data
Adoption, Reach, and Growth Strategies
We propose a mechanism to create source_attestations over the source and content of URL data. The proposed mechanism would create a general data oracle to provably ingest data in frameworks such as Bacalhau or Kamu. The source_attestations could also be used by other CoD frameworks in the CoD Working Group. source_attestations can also make it easier to write zero-knowledge proofs in projects such as Lurk, which can verify the source of content-addressable data without the need to interactively ingest from URLs.
Development Roadmap
We propose to extend the existing Nitro Enclave tooling to create an application to read and attest data from arbitrary internet sources. As a starting point, we will use BLOCKY's fork of the Brave nitriding framework, which allows users to build and verify Nitro Enclave containers. Nitriding already permits TLS connections to the enclave container, which we will extend to allow enclave applications to reach out to URLs and attest the source as well as the content of the data.
To allow Filecoin clients to attest the source of the data they store on Filecoin, we propose to implement the following protocol:
```mermaid
sequenceDiagram
    autonumber
    participant client as Data Client
    participant enclave as Enclave
    participant url as Web Server
    participant ipfs as IPFS
    participant network as Filecoin

    opt Get enclave attestation
        client ->> enclave: GET /attestEnclave
        activate client
        activate enclave
        enclave --) client: enclave_attestation
        deactivate client
        deactivate enclave
        client ->> ipfs: store enclave_attestation
    end

    client ->> enclave: GET /attestSource?URL
    activate client
    activate enclave
    enclave ->> url: GET URL
    url --) enclave: OK (content)
    note right of enclave: Create source_attestation
    enclave --) client: OK (source_attestation)
    deactivate enclave
    client ->> network: store source_attestation
    activate network
    deactivate network
    deactivate client
```
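The client side of the protocol in the diagram can be sketched in Go, the language Milestone 1 names for the enclave application. This is a minimal sketch under stated assumptions: the enclave API is reachable at a base URL, both endpoints return the attestation as the raw response body, and /attestSource takes the target as a query parameter named `url` (the diagram writes `/attestSource?URL` without naming the parameter). The function names and the `newFakeEnclave` test double are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// fetchAttestation issues a GET against the enclave API and returns the raw
// response body, treating any status other than 200 as an error.
func fetchAttestation(client *http.Client, base, path string, query url.Values) ([]byte, error) {
	u, err := url.Parse(base)
	if err != nil {
		return nil, err
	}
	u.Path = path
	u.RawQuery = query.Encode() // empty string for a nil query

	resp, err := client.Get(u.String())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("%s: unexpected status %d", path, resp.StatusCode)
	}
	return io.ReadAll(resp.Body)
}

// RequestEnclaveAttestation covers steps 1-2 of the diagram: GET /attestEnclave.
func RequestEnclaveAttestation(client *http.Client, enclaveBase string) ([]byte, error) {
	return fetchAttestation(client, enclaveBase, "/attestEnclave", nil)
}

// RequestSourceAttestation covers steps 4 and 7: GET /attestSource with the
// target URL passed in an assumed "url" query parameter.
func RequestSourceAttestation(client *http.Client, enclaveBase, target string) ([]byte, error) {
	return fetchAttestation(client, enclaveBase, "/attestSource", url.Values{"url": {target}})
}

// newFakeEnclave is a hypothetical local stand-in for the enclave API that
// returns canned bodies; it exists only to exercise the client.
func newFakeEnclave() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/attestEnclave", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "enclave_attestation")
	})
	mux.HandleFunc("/attestSource", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "source_attestation for "+r.URL.Query().Get("url"))
	})
	return httptest.NewServer(mux)
}
```

Against `newFakeEnclave()`, `RequestSourceAttestation(srv.Client(), srv.URL, "https://example.com/data.csv")` returns the canned body `source_attestation for https://example.com/data.csv`. Encoding the target with `url.Values` keeps any query string inside the target URL from mixing with the enclave's own query.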
A new nitriding-based enclave application creates a unique key pair K_pub/K_pri on startup. The key K_pri is only available inside the enclave, and so any information signed by K_pri is known to come from the enclave. For a client to know that K_pub is an enclave key generated by a specific enclave application, the client requests a Nitro Enclave enclave_attestation from the /attestEnclave endpoint. The client in this case is a user of a CoD framework.
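The enclave generates an enclave_attestation containing:
- K_pub
- PCR0, the measurement of the application container running on the enclave

both signed by AWS Nitro with a key certified by the well-known AWS Nitro Enclaves root certificate. Any client can thereafter verify the attestation against that root certificate, check that PCR0 matches an expected value, and use K_pub to authenticate messages from the enclave.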
The client stores the enclave_attestation on IPFS, so that it becomes available to other users.
Note that steps 1 through 3 need to be performed only once. With the enclave_attestation available under its CID, the client may start requesting attestations over URL data.
A Filecoin data client requests a URL attestation from the enclave by calling the /attestSource endpoint with a URL as a parameter.
The enclave makes a GET request to the Web server hosting the URL.
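The fetch step can capture both the content and the Web Server's TLS certificate from a single request. This sketch assumes the certificate to attest is the DER-encoded leaf certificate the server presents, and it refuses redirects instead of following them, so a 3xx response is rejected like any other non-200 response under the first-version rule in the footnote. `FetchForAttestation`, `Fetched`, and the `newTestOrigin` test server are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Fetched holds what the enclave observed when downloading a URL.
type Fetched struct {
	Content     []byte // HTTP response body, attested as-is
	Certificate []byte // DER-encoded leaf certificate of the Web Server
}

// FetchForAttestation GETs an HTTPS URL and records the body and the server's
// leaf TLS certificate. Only a 200 response over TLS is accepted.
func FetchForAttestation(client *http.Client, target string) (*Fetched, error) {
	c := *client // shallow copy so the caller's client is left unchanged
	c.CheckRedirect = func(*http.Request, []*http.Request) error {
		return http.ErrUseLastResponse // surface 3xx instead of following it
	}
	resp, err := c.Get(target)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("refusing to attest status %d", resp.StatusCode)
	}
	if resp.TLS == nil || len(resp.TLS.PeerCertificates) == 0 {
		return nil, errors.New("refusing to attest a non-TLS response")
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return &Fetched{Content: body, Certificate: resp.TLS.PeerCertificates[0].Raw}, nil
}

// newTestOrigin starts a local HTTPS server standing in for the Web Server;
// it exists only to exercise FetchForAttestation.
func newTestOrigin() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/data.csv", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "a,b\n1,2\n")
	})
	mux.HandleFunc("/moved", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/data.csv", http.StatusFound)
	})
	return httptest.NewTLSServer(mux)
}
```

With `srv := newTestOrigin()`, fetching `srv.URL + "/data.csv"` through `srv.Client()` returns the body and the test server's certificate, while `/moved` and unknown paths return errors. Refusing redirects keeps the attested URL identical to the URL that actually served the bytes.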
The enclave downloads the content of the URL and creates a source_attestation containing:
- the URL
- the content of the URL[^1]
- the TLS certificate of the Web Server
- a timestamp
- the CID of the enclave_attestation (computed on the enclave from the previously issued enclave_attestation)

all signed with the enclave application's private key K_pri.
[^1]: The HTTP response to a GET request could contain content requiring no further action. Alternatively, the response could require further processing of redirects, links, or scripts. In the first version of the data oracle, we propose to simply attest the content of the HTTP response as-is, if the response status code is 200.
The enclave replies to the client with the source_attestation.
The client stores the source_attestation on Filecoin.
At the end of this process, anyone reading the source_attestation from Filecoin will be able to:
- find the corresponding enclave_attestation from its CID embedded in the source_attestation
- verify the enclave_attestation and its K_pub
- use K_pub to verify the source_attestation
- know that the content in the source_attestation came from the URL hosted at a web server with a specific certificate at a specific timestamp
In the context of a CoD framework, the CIDs for the enclave_attestation and for the source_attestation can serve as input to computation, as in Bacalhau (https://docs.bacalhau.org/data-ingestion/from-url/#use-the-cid-in-a-new-bacalhau-job). The CoD framework can perform the verification process as part of the job to create output that is verifiable based on the content of a URL.
Milestones
1. An enclave application written in Go that supports fetching data from an HTTPS endpoint and creates a source_attestation over the data
2. Scripts to build and deploy a Nitro-enabled EC2 instance running the data oracle
3. A CLI client to request and validate source_attestations
Total Budget Requested
We request $60K to complete these milestones:
- $15K on completing Milestone 2 and Milestone 3
Maintenance and Upgrade Plans
We plan to release the project as open source to allow developers to update the data oracle enclave application and CLI to meet their needs. On our end, we will maintain the nitriding framework for enclave application development and the Nitro-enabled EC2 deployment scripts.
Team
Team Website
https://blocky.rocks
Relevant Experience
Mike Wittie, David Millman, and Taylor Hardin hold PhDs in Computer Science with joint expertise in distributed systems, software engineering, and secure computing.
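Particularly relevant to this proposal is our collaboration with the Brave team on nitriding, a framework for deploying and verifying containerized applications running on Nitro Enclaves. We have dogfooded nitriding to develop several internal applications.
Team code repositories
https://github.com/blocky/nitriding
Additional Information
To discuss the grant agreement and general next steps, please contact Taylor Heinecke.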
Hi @mwittie, thank you for your patience with our review. Unfortunately, we will not be proceeding with a grant at this time. Wishing you all the best as you continue to build!