Closed bajtos closed 1 year ago
The proposal sounds good to me 👍 As Will points out, probably the easiest place to add retrieval attestations is in the HTTP protocol. Another advantage of HTTP is that it is layered. You can build an http server that provides retrieval attestation, that sits in front of booster-http. That way your team won't get blocked by the Boost team's availability.
I wonder how far HTTP level attestation is going to get us. I agree that from a technical perspective this is the way to go. However, the main purpose of SPARK is to collect data on retrievability, and I have two concerns:
TLDR: If we start with HTTP, I think we will have a good iteration platform. A rough timeline for attestation of other protocols would be useful.
@juliangruber Several of the other efforts have decided to rally around HTTP so that does seem like the place to focus at the moment - see also https://www.notion.so/Project-HTTP-UP-7a3daf6633214ae6b31c5a67b2ac17f0 if you haven't yet.
This is great! Do you think non-plus SPs will follow along? Or do you think it's fine to target the level of FIL+?
I think the bulk of SPs offering any form of retrieval will prefer HTTP as the protocol, as it's easiest to manage / control from their end.
Hi folks, thank you for the constructive feedback and discussion. We had many discussions about this proposal in the last few days and need to change the course slightly.
Our plan is to support HTTP retrievals only, as that seems to be the direction for the future of Filecoin retrievals. That does not mean these attestations cannot be implemented for Graphsync and Bitswap, just that it's not something SPARK is interested in.
The JWT-based attestation tokens would consume too much bandwidth. With the sample payload I showed above, the attestation token is ~500 bytes. I'll explore different options with a more efficient representation.
Creating a new signature for each retrieval request adds a non-negligible CPU cost. We need to measure the impact of these signatures on booster-http performance and document the implications so that SPs know what to expect.
I'll post more updates as we get more clarity about what SPARK needs and what is feasible to implement.
One new feature we have already identified:
Update: after more discussions, we have settled on an extra content-type parameter allowing clients to request an additional metadata block to be appended after the CAR stream response. I opened an IPIP to discuss the details: https://github.com/ipfs/specs/pull/431
Let's continue the discussion in https://github.com/filecoin-project/boost/issues/1610
I am closing this issue as superseded.
Checklist
Ideas
Boost component
What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.
In Filecoin Station, we are building SPARK - a module that periodically checks retrievability of content from Filecoin Storage Providers. At the moment, we are adding FIL rewards for performing these checks. In order to combat fraud, we would like Boost to provide retrieval attestations that will allow 3rd parties to verify that a client performed a retrieval request from a particular provider.
You can learn more in SPARK Content retrieval attestation and Meridian Design Doc 03: Evaluation dissected.
In short:
Describe the solution you'd like
(1) The retrieval client performing a retrieval request includes a new field in the request metadata - retrieval_id - containing a string value. We recommend clients send a SHA-384 hash of the actual identifier.
(2) The retrieval server returns an attestation signed with the server’s private key - the same key as used for libp2p peer identity. The attestation payload includes the following metadata:
- retrieval_id supplied by the client,
- cid being requested, and
- protocol used (bitswap, graphsync, http).
This is a high-level proposal that intentionally excludes details. I’d like us to first agree whether this feature is feasible at the high level, before we dive deeper into details.
Describe alternatives you've considered
No response
Additional context
What kind of feedback I am looking for
How this helps SPARK
retrieval_id from this job id, perform the retrieval check and finally send the attestation returned by SP alongside other retrieval statistics.
How this can help other retrieval clients
I feel the proposal is generic enough to support different usages. Since the retrieval id is a hash of arbitrary data, it’s possible to pack literally anything into the retrieval id, get the SP to sign that, and later verify that the SP signed the expected field values.
Ideally, we would like to have “Proof of Retrieval”. Unfortunately, such proof is still an open problem. We think that Retrieval Attestation can get us somewhat closer to that ideal.
For example, I can imagine a browser service worker retrieving content from a dCDN like Saturn can use the retrieval attestation to attribute credit to the specific SP that provided the content to serve the request, allowing content providers to reward SPs based on how many retrievals they helped to serve.
The proposed format based on JWT can be extended to support signature chains, e.g. the outer attestation token created by an untrusted gateway can wrap an inner attestation token produced by the SP from which the gateway retrieved the content.
Technical details: retrieval id
The implementation should support arbitrary formats of retrieval ids. However, we recommend all clients use a SHA-384 hash of the original retrieval identifier.
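A minimal sketch of the recommended hashing, using an identifier of the N;peer_id form from the example below. The helper name make_retrieval_id and the choice of lowercase hex for the digest are assumptions made for illustration; the proposal does not fix an encoding.

```python
import hashlib


def make_retrieval_id(original_id: str) -> str:
    """Return the SHA-384 digest of the original identifier as lowercase hex.

    Hex output is an assumption; the proposal only asks for a SHA-384 hash.
    """
    return hashlib.sha384(original_id.encode("utf-8")).hexdigest()


# Original identifier: epoch number and peer id joined by ";".
original = "539;12D3KooWRH71QRJe5vrMp6zZXoH4K7z5MDSWwTXXPriG9dK8HQXk"
retrieval_id = make_retrieval_id(original)

# The hashed id has a fixed length and does not expose the original format.
print(len(retrieval_id), retrieval_id[:16])
```

Because the digest always has the same length (48 bytes, 96 hex characters), the server sees neither the structure of the original identifier nor an oversized payload.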
peer_id using the DRAND seed from the epoch N, we can compose the retrieval id as N;peer_id, e.g. 539;12D3KooWRH71QRJe5vrMp6zZXoH4K7z5MDSWwTXXPriG9dK8HQXk. Now if we send this string as the retrieval id, then the remote party can inspect the format of the string to guess what software is making the request. Additionally, the payload can be too large for the underlying protocol. Hashing the original id solves both issues.
Technical details: attestation string
I propose using JWT for the attestation string. JWT is a widely used format with good support in many programming languages. It’s used by other projects in the Web3 space, too - most notably UCAN.
In its compact form, a JSON Web Token consists of three parts separated by dots (.), which are the Header, the Payload, and the Signature.
Therefore, a JWT typically looks like this:
Header.Payload.Signature
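To make the three-part layout concrete, here is a small sketch that splits a compact token and decodes each part. In the compact form, each part is base64url-encoded with the padding stripped; the header and payload are JSON objects. The function names and the hand-built token are hypothetical, and the signature bytes are a placeholder, not a real signature.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode and strip the '=' padding, as the compact form does."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(part: str) -> bytes:
    """Decode base64url text whose '=' padding has been stripped."""
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def split_compact_jwt(token: str):
    """Split Header.Payload.Signature and decode each part (no verification)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated parts")
    header = json.loads(b64url_decode(parts[0]))
    payload = json.loads(b64url_decode(parts[1]))
    signature = b64url_decode(parts[2])
    return header, payload, signature


# Hand-built token for illustration; the last part is placeholder bytes.
token = ".".join([
    b64url_encode(json.dumps({"typ": "JWT"}).encode("utf-8")),
    b64url_encode(json.dumps({"retrv_proto": "http"}).encode("utf-8")),
    b64url_encode(b"placeholder"),
])
header, payload, signature = split_compact_jwt(token)
print(header, payload, signature)
```

Decoding alone proves nothing about who produced the token; the signature still has to be checked against the issuer's public key.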
JWT Header
This is a standard JWT header, plus the extra rav field.
JWT Payload
- iss: “Issuer” ID of who created the attestation - the public key from the libp2p identity of the peer serving the retrieval. This field is defined by the JWT standard.
- retrv_rid: the retrieval id provided by the client
- retrv_cid: the CID retrieved
- retrv_proto: the protocol used - graphsync, bitswap or http
We expect more fields will be added in the future. For example, when a retrieval request specifies an IPLD selector, the attestation payload can include a retrv_selector field describing what subset of the Merkle tree was requested.
For the initial version, we want to introduce only the fields needed by SPARK.
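As a rough illustration of the initial payload, the sketch below builds a dictionary with the four fields listed above and rejects unknown protocols. The function name build_attestation_payload and all example values (issuer, retrieval id, CID) are hypothetical placeholders, not real identifiers.

```python
ALLOWED_PROTOCOLS = {"graphsync", "bitswap", "http"}


def build_attestation_payload(issuer: str, retrieval_id: str,
                              cid: str, protocol: str) -> dict:
    """Assemble the attestation payload with the fields proposed for SPARK."""
    if protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return {
        "iss": issuer,              # public key of the peer serving the retrieval
        "retrv_rid": retrieval_id,  # retrieval id supplied by the client
        "retrv_cid": cid,           # CID that was retrieved
        "retrv_proto": protocol,    # graphsync, bitswap or http
    }


# Placeholder values for illustration only.
example_payload = build_attestation_payload(
    issuer="example-issuer-public-key",
    retrieval_id="example-sha384-hex-digest",
    cid="example-cid",
    protocol="http",
)
print(example_payload)
```

A later field such as retrv_selector could be added as an optional key without changing the existing ones.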
JWT Signature
Quoting from JWT Introduction
Of course, we will use a different algorithm than HMAC SHA256. Maybe Ed25519? The algorithm will most likely depend on the algorithm used by the libp2p identity key-pair.
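The signing step can be sketched with a pluggable signer. The signing input of a compact JWT is the base64url-encoded header and payload joined by a dot. Python's standard library has no Ed25519, so the demo below plugs in HMAC SHA256 (HS256) purely as a stand-in; the function names, the secret, and the header contents are assumptions, and the rav header field is omitted because its value is not specified here.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_compact(header: dict, payload: dict, sign) -> str:
    """Build Header.Payload.Signature; `sign` maps message bytes to signature bytes."""
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + b64url(sign(signing_input.encode("ascii")))


def verify_compact(token: str, verify) -> bool:
    """Check the signature; `verify` takes (message bytes, signature bytes)."""
    signing_input, _, sig_b64 = token.rpartition(".")
    signature = base64.urlsafe_b64decode(sig_b64 + "=" * (-len(sig_b64) % 4))
    return verify(signing_input.encode("ascii"), signature)


# Stand-in algorithm only: a shared-secret HMAC, NOT the proposed scheme.
SECRET = b"demo-secret"


def hs256_sign(message: bytes) -> bytes:
    return hmac.new(SECRET, message, hashlib.sha256).digest()


def hs256_verify(message: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(hs256_sign(message), signature)


token = sign_compact({"alg": "HS256", "typ": "JWT"},
                     {"retrv_proto": "http"}, hs256_sign)
print(verify_compact(token, hs256_verify))
```

With HMAC, anyone able to verify a token could also forge one, because both operations need the same secret. That is why third-party verification calls for an asymmetric scheme such as Ed25519; an Ed25519 signer and verifier could be passed in place of the two HS256 callables.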
Tagging @juliangruber, @patrickwoodhead and @willscott for visibility.