filecoin-project / boost

Boost is a tool for Filecoin storage providers to manage data storage and retrievals on Filecoin.

Retrieval Attestation #1597

Closed: bajtos closed this issue 1 year ago

bajtos commented 1 year ago


What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.

In Filecoin Station, we are building SPARK - a module that periodically checks retrievability of content from Filecoin Storage Providers. At the moment, we are adding FIL rewards for performing these checks. In order to combat fraud, we would like Boost to provide retrieval attestations that will allow 3rd parties to verify that a client performed a retrieval request from a particular provider.

You can learn more in SPARK Content retrieval attestation and Meridian Design Doc 03: Evaluation dissected.

In short:

Describe the solution you'd like

(1) The retrieval client performing a retrieval request includes a new field in the request metadata - retrieval_id containing a string value. We recommend clients send a SHA-384 hash of the actual identifier.

(2) The retrieval server returns an attestation signed with the server’s private key, the same key that is used for the libp2p peer identity. The attestation payload includes the following metadata:

This is a high-level proposal that intentionally excludes details. I’d like us to first agree whether this feature is feasible at the high level, before we dive deeper into details.

Describe alternatives you've considered

No response

Additional context

What kind of feedback I am looking for

How this helps SPARK

How this can help other retrieval clients

I feel the proposal is generic enough to support different usages. Since the retrieval id is a hash of arbitrary data, it’s possible to pack literally anything into the retrieval id, get the SP to sign that, and later verify that the SP signed the expected field values.
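A minimal sketch of the "pack anything into the retrieval id" idea, assuming the client commits to a set of fields by hashing a canonical JSON encoding with SHA-384. The field names (`round`, `cid`, `nonce`) and the helpers `pack_retrieval_id` / `rid_matches` are hypothetical; sorted-key JSON is a simple stand-in for a proper canonicalization scheme.

```python
import hashlib
import json

def pack_retrieval_id(fields: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) keeps the encoding deterministic,
    # so anyone holding the same fields recomputes the same SHA-384 digest.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha384(canonical.encode("utf-8")).hexdigest()

def rid_matches(signed_rid: str, expected_fields: dict) -> bool:
    # Assumes the attestation signature was already verified; this only checks
    # that the signed retrieval id commits to the expected field values.
    return signed_rid == pack_retrieval_id(expected_fields)

# Hypothetical fields a client commits to before sending the request.
fields = {
    "round": 42,
    "cid": "bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni",
    "nonce": "c0ffee",
}
rid = pack_retrieval_id(fields)
print(rid_matches(rid, fields))                   # True
print(rid_matches(rid, {**fields, "round": 43}))  # False
```

Because only the digest travels to the SP, the provider signs the commitment without needing to understand the fields; a verifier who later learns the fields can check them against the signed value.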

Ideally, we would like to have “Proof of Retrieval”. Unfortunately, such proof is still an open problem. We think that Retrieval Attestation can get us somewhat closer to that ideal.

For example, I can imagine a browser service worker that retrieves content from a dCDN like Saturn using the retrieval attestation to attribute credit to the specific SP that provided the content for the request, allowing content providers to reward SPs based on how many retrievals they helped serve.

The proposed format based on JWT can be extended to support signature chains, e.g. the outer attestation token created by an untrusted gateway can wrap an inner attestation token produced by the SP from which the gateway retrieved the content.
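A rough sketch of such a signature chain, assuming the outer payload carries the inner token in a hypothetical `retrv_inner` claim. To stay within the Python standard library, HMAC-SHA256 (HS256) stands in for the real key-pair signatures; with actual SP and gateway key-pairs, the verifier would need only their public keys. Keys and the gateway identity are placeholders.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWT segments are base64url-encoded without '=' padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def sign(payload: dict, key: bytes) -> str:
    # Stand-in signer: HMAC-SHA256 instead of the signer's private key.
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify(token: str, key: bytes) -> dict:
    header, body, sig = token.split(".")
    expected = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig)):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(body))

# Hypothetical keys, for illustration only.
sp_key, gateway_key = b"sp-key", b"gateway-key"

# Inner token: the SP attests to the retrieval the gateway performed.
inner = sign({"iss": "12D3KooWRH71QRJe5vrMp6zZXoH4K7z5MDSWwTXXPriG9dK8HQXk",
              "retrv_cid": "bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni"}, sp_key)

# Outer token: the untrusted gateway wraps the SP's token in its own attestation.
outer = sign({"iss": "example-gateway", "retrv_inner": inner}, gateway_key)

# A verifier unwraps the chain, checking each layer with the matching key.
outer_claims = verify(outer, gateway_key)
inner_claims = verify(outer_claims["retrv_inner"], sp_key)
print(inner_claims["iss"])
```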

Technical details: retrieval id

The implementation should support arbitrary formats of retrieval ids. However, we recommend all clients use a SHA-384 hash of the original retrieval identifier.
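A minimal sketch of the recommended client-side step. Encoding the identifier as UTF-8 and the digest as lowercase hex are assumptions; they are consistent with the 96-character hex `retrv_rid` in the example payload. The identifier string itself is hypothetical.

```python
import hashlib

def retrieval_id(original_id: str) -> str:
    # Hex-encoded SHA-384 digest: 48 bytes -> 96 hex characters.
    return hashlib.sha384(original_id.encode("utf-8")).hexdigest()

rid = retrieval_id("hypothetical-client-request-id")
print(len(rid))  # 96
```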

Technical details: attestation string

I propose using JWT for the attestation string. JWT is a widely used format with good support in many programming languages. It’s used by other projects in the Web3 space, too - most notably UCAN.

In its compact form, JSON Web Tokens consist of three parts separated by dots (.), which are: Header, Payload, and Signature.

Therefore, a JWT typically looks like this: Header.Payload.Signature
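To make the three-part layout concrete, here is a small sketch that splits a compact token and base64url-decodes its segments. It only decodes; it does not verify the signature. The demo token is built locally and its signature segment is placeholder bytes, not a real signature.

```python
import base64
import json
from typing import Tuple

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    # Restore the '=' padding that JWT strips before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def split_jwt(token: str) -> Tuple[dict, dict, bytes]:
    """Split a compact JWT into (header, payload, signature bytes). No verification."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected Header.Payload.Signature")
    header, payload, signature = parts
    return (json.loads(b64url_decode(header)),
            json.loads(b64url_decode(payload)),
            b64url_decode(signature))

# Demo token; the signature segment is placeholder bytes.
demo = ".".join([
    b64url(json.dumps({"alg": "EdDSA", "typ": "JWT", "rav": "0.1.0"}).encode()),
    b64url(json.dumps({"retrv_proto": "graphsync"}).encode()),
    b64url(b"placeholder-signature"),
])
header, payload, signature = split_jwt(demo)
print(header["rav"], payload["retrv_proto"])  # 0.1.0 graphsync
```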

JWT Header

{
  "alg": "EdDSA",
  "typ": "JWT",
  "rav": "0.1.0"
}

This is a standard JWT header, plus the extra rav field.

JWT Payload

{
  "iss": "12D3KooWRH71QRJe5vrMp6zZXoH4K7z5MDSWwTXXPriG9dK8HQXk",
  "retrv_rid": "38b060a751ac96384cd9327eb1b1e36a21fdb71114be07434c0cc7bf63f6e1da274edebfe76f65fbd51ad2f14898b95b",
  "retrv_cid": "bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni",
  "retrv_proto": "graphsync"
}

We expect more fields will be added in the future. For example, when a retrieval request specifies an IPLD selector, the attestation payload can include retrv_selector field describing what subset of the Merkle tree was requested.

For the initial version, we want to introduce only the fields needed by SPARK.
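A sketch of how a provider might assemble the header and payload above into the string it signs. The helper names are hypothetical, and representing `retrv_selector` as a plain string is an assumption, since the selector encoding is not specified yet. Optional fields are emitted only when present, so the initial version carries just the four fields shown.

```python
import base64
import json
from typing import Optional

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

# Header from the proposal: standard JWT fields plus the extra "rav" field.
HEADER = {"alg": "EdDSA", "typ": "JWT", "rav": "0.1.0"}

def attestation_payload(peer_id: str, rid: str, cid: str, proto: str,
                        selector: Optional[str] = None) -> dict:
    payload = {"iss": peer_id, "retrv_rid": rid, "retrv_cid": cid, "retrv_proto": proto}
    # Future fields such as retrv_selector are added only when the request used them.
    if selector is not None:
        payload["retrv_selector"] = selector
    return payload

def signing_input(header: dict, payload: dict) -> str:
    # The string the provider signs: base64url(header) + "." + base64url(payload).
    return b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())

payload = attestation_payload(
    "12D3KooWRH71QRJe5vrMp6zZXoH4K7z5MDSWwTXXPriG9dK8HQXk",
    "38b060a751ac96384cd9327eb1b1e36a21fdb71114be07434c0cc7bf63f6e1da274edebfe76f65fbd51ad2f14898b95b",
    "bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni",
    "graphsync",
)
print(signing_input(HEADER, payload))
```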

JWT Signature

Quoting from JWT Introduction

To create the signature part you have to take the encoded header, the encoded payload, a secret, the algorithm specified in the header, and sign that.

For example if you want to use the HMAC SHA256 algorithm, the signature will be created in the following way:

HMACSHA256(
  base64UrlEncode(header) + "." +
  base64UrlEncode(payload),
  secret)

The signature is used to verify the message wasn't changed along the way, and, in the case of tokens signed with a private key, it can also verify that the sender of the JWT is who it says it is.
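The quoted HMAC SHA256 pseudocode, written out as a runnable sketch using only the Python standard library. It signs `Header.Payload` with a shared secret and checks the result in constant time; the secret and payload are illustrative values.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def hs256_token(header: dict, payload: dict, secret: bytes) -> str:
    # HMACSHA256(base64UrlEncode(header) + "." + base64UrlEncode(payload), secret)
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

def hs256_valid(token: str, secret: bytes) -> bool:
    # Recompute the signature over Header.Payload and compare in constant time.
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)

token = hs256_token({"alg": "HS256", "typ": "JWT"}, {"retrv_proto": "graphsync"}, b"secret")
print(hs256_valid(token, b"secret"))        # True
print(hs256_valid(token, b"wrong-secret"))  # False
```

With a shared secret, anyone who can verify can also forge, which is why an attestation scheme needs an asymmetric algorithm instead.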

Of course, we will use a different algorithm than HMAC SHA256, perhaps Ed25519 (EdDSA). The algorithm will most likely depend on the key type of the libp2p identity key-pair.
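A hedged sketch of picking the JWT `alg` from the libp2p key type. The table maps libp2p key types to JOSE algorithm names (EdDSA from RFC 8037, ES256K from RFC 8812, ES256 and RS256 from RFC 7518); treating libp2p ECDSA keys as P-256 is an assumption. The prefix check is only a heuristic on the base58btc text form: Ed25519 and secp256k1 public keys are small enough to be inlined in the peer ID, while larger keys are hashed, so their type cannot be read from the ID. The helper names are hypothetical.

```python
from typing import Optional

# JOSE "alg" names per libp2p key type (assumption: ECDSA keys use P-256).
ALG_FOR_KEY_TYPE = {
    "Ed25519": "EdDSA",     # RFC 8037
    "Secp256k1": "ES256K",  # RFC 8812
    "ECDSA": "ES256",       # RFC 7518, P-256 assumed
    "RSA": "RS256",         # RFC 7518
}

def key_type_from_peer_id(peer_id: str) -> Optional[str]:
    # Heuristic: inlined (identity-multihash) keys produce recognizable prefixes.
    if peer_id.startswith("12D3KooW"):
        return "Ed25519"
    if peer_id.startswith("16Uiu2HAm"):
        return "Secp256k1"
    # Other IDs (e.g. "Qm...") are SHA-256 hashes of larger keys; type unknown.
    return None

def jwt_alg_for_peer(peer_id: str) -> Optional[str]:
    key_type = key_type_from_peer_id(peer_id)
    return ALG_FOR_KEY_TYPE.get(key_type) if key_type else None

print(jwt_alg_for_peer("12D3KooWRH71QRJe5vrMp6zZXoH4K7z5MDSWwTXXPriG9dK8HQXk"))  # EdDSA
```

For the example `iss` above, this yields `EdDSA`, which agrees with the `alg` in the proposed header. In practice the provider knows its own key type, so the lookup table alone would be enough on the signing side.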

Tagging @juliangruber, @patrickwoodhead and @willscott for visibility.

willscott commented 1 year ago
dirkmc commented 1 year ago

The proposal sounds good to me 👍 As Will points out, the easiest place to add retrieval attestations is probably the HTTP protocol. Another advantage of HTTP is that it is layered: you can build an HTTP server that provides retrieval attestations and sits in front of booster-http. That way your team won't be blocked by the Boost team's availability.

juliangruber commented 1 year ago

I wonder how far HTTP level attestation is going to get us. I agree that from a technical perspective this is the way to go. However, the main purpose of SPARK is to collect data on retrievability, and I have two concerns:

TLDR: If we start with HTTP, I think we will have a good platform to iterate on. A rough timeline for attestation over other protocols would be useful.

willscott commented 1 year ago

@juliangruber Several of the other efforts have decided to rally around HTTP so that does seem like the place to focus at the moment - see also https://www.notion.so/Project-HTTP-UP-7a3daf6633214ae6b31c5a67b2ac17f0 if you haven't yet.

juliangruber commented 1 year ago

This is great! Do you think non-FIL+ SPs will follow along, or do you think it's fine to target FIL+ SPs only?

willscott commented 1 year ago

I think the bulk of SPs offering any form of retrieval will prefer HTTP as the protocol, as it's easiest to manage / control from their end.

bajtos commented 1 year ago

Hi folks, thank you for the constructive feedback and discussion. We have had many discussions about this proposal in the last few days and need to change course slightly.

I'll post more updates as we get more clarity about what SPARK needs and what is feasible to implement.

One new feature we have already identified:

bajtos commented 1 year ago

Update: after more discussions, we have settled on an extra content-type parameter allowing clients to request an additional metadata block to be appended after the CAR stream response. I opened an IPIP to discuss the details: https://github.com/ipfs/specs/pull/431

bajtos commented 1 year ago

Let's continue the discussion in https://github.com/filecoin-project/boost/issues/1610

I am closing this issue as superseded.