hannahhoward commented 2 years ago

Goals

Enable discoverability of HTTP retrieval support

The primary changes for v2 protocol are:

Query:

You can ask for just a PieceCID (only piece retrieval protocols will be returned)

Response:

Information will be given about available protocols, and parameters specific to those protocols

For now, the newly designed protocol makes all additional information protocol-specific.

How

probably make a new retrievalmarket folder to collect newly built retrieval pieces
make an implementation of the query protocol for libp2p, also drawing from retrieval query protocol v1

The schema for the QueryAsk v1 request / response (in IPLD schema -- it's encoded as DAG CBOR) is as follows:

# QueryParams - V1 - indicate what specific information about a piece that a retrieval
# client is interested in, as well as specific parameters the client is seeking
# for the retrieval deal
type QueryParams struct {
    PieceCID nullable Link  # optional, query if miner has this cid in this piece. some miners may not be able to respond.
}

# Query is a query to a given provider to determine information about a piece
# they may have available for retrieval
type Query struct {
    PayloadCID  Link 
    QueryParams QueryParams
}

# QueryResponseStatus indicates whether a queried piece is available
type QueryResponseStatus enum {
    # QueryResponseAvailable indicates a provider has a piece and is prepared to
    # return it
    | QueryResponseAvailable ("0")

    # QueryResponseUnavailable indicates a provider either does not have or cannot
    # serve the queried piece to the client
    | QueryResponseUnavailable ("1")

    # QueryResponseError indicates something went wrong generating a query response
    | QueryResponseError ("2")
} representation int

# QueryItemStatus (V1) indicates whether the requested part of a piece (payload or selector)
# is available for retrieval
type QueryItemStatus enum {
    # QueryItemAvailable indicates requested part of the piece is available to be
    # served
    | QueryItemAvailable ("0")

    # QueryItemUnavailable indicates the piece either does not contain the requested
    # item or it cannot be served
    | QueryItemUnavailable ("1")

    # QueryItemUnknown indicates the provider cannot determine if the given item
    # is part of the requested piece (for example, if the piece is sealed and the
    # miner does not maintain a payload CID index)
    | QueryItemUnknown ("2")
} representation int

type Address Bytes
type TokenAmount Bytes

# QueryResponse is a miner's response to a given retrieval query
type QueryResponse struct {
    Status        QueryResponseStatus
    PieceCIDFound QueryItemStatus # V1 - if a PieceCID was requested, the result

    Size Int

    PaymentAddress             Address # address.Address to send funds to -- may be different than miner addr
    MinPricePerByte            TokenAmount
    MaxPaymentInterval         Int
    MaxPaymentIntervalIncrease Int
    Message                    String
    UnsealPrice                TokenAmount
}

The proposed v2 schema is:


# QueryKind specifies whether you are interested in a payload or a piece
type QueryKind enum {
    | Piece ("pc")
    | Payload ("pl")
} representation string

# Query is a query to a given provider to determine information about a piece
# they may have available for retrieval
type Query struct {
    Kind QueryKind
    # If kind is Payload, response will error if not included
    PayloadCID optional Link
    # If kind is Piece, response will error if not included
    PieceCID optional Link
}

# QueryResponseStatus indicates whether a queried piece is available
type QueryResponseStatus enum {
    # QueryResponseAvailable indicates a provider has a piece and is prepared to
    # return it
    | QueryResponseAvailable ("0")

    # QueryResponseUnavailable indicates a provider either does not have or cannot
    # serve the queried piece to the client
    | QueryResponseUnavailable ("1")

    # QueryResponseError indicates something went wrong generating a query response
    | QueryResponseError ("2")
} representation int

# QueryItemStatus (V1) indicates whether the requested part of a piece (payload or selector)
# is available for retrieval
type QueryItemStatus enum {
    # QueryItemAvailable indicates requested part of the piece is available to be
    # served
    | QueryItemAvailable ("0")

    # QueryItemUnavailable indicates the piece either does not contain the requested
    # item or it cannot be served
    | QueryItemUnavailable ("1")

    # QueryItemUnknown indicates the provider cannot determine if the given item
    # is part of the requested piece (for example, if the piece is sealed and the
    # miner does not maintain a payload CID index)
    | QueryItemUnknown ("2")
} representation int

type Address Bytes
type TokenAmount Bytes

type ProtocolName enum {
    | GraphsyncFilecoinV1 ("graphsync/v1")
    | HTTPFilecoinV1 ("http/v1")
} representation string

# QueryResponse is a miner's response to a given retrieval query
type QueryResponse struct {
    Status        QueryResponseStatus
    Protocols {ProtocolName:Any}
}

# Structure of Graphsync Protocol Data
type GraphsyncFilecoinV1Response struct {
    Size Int
    PaymentAddress             Address # address.Address to send funds to -- may be different than miner addr
    MinPricePerByte            TokenAmount
    MaxPaymentInterval         Int
    MaxPaymentIntervalIncrease Int
    Message                    String
    UnsealPrice                TokenAmount
}

# Structure of HTTP V1 Response
type HTTPFilecoinV1Response struct {
    URL String
    Size Int
}
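
As a rough illustration of how the v2 shapes fit together, here is a minimal Python sketch. The class names mirror the schema above, but `validate_query`, the string representation of CIDs, the plain-dict form of `Protocols`, and the example URL and size are all assumptions made for illustration. This is not the markets implementation, and the DAG-CBOR encoding is left out entirely.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class QueryKind(Enum):
    PIECE = "pc"
    PAYLOAD = "pl"


class QueryResponseStatus(Enum):
    AVAILABLE = 0
    UNAVAILABLE = 1
    ERROR = 2


@dataclass
class Query:
    kind: QueryKind
    payload_cid: Optional[str] = None  # CIDs kept as strings for brevity
    piece_cid: Optional[str] = None


@dataclass
class QueryResponse:
    status: QueryResponseStatus
    # Keyed by protocol name, e.g. "graphsync/v1" or "http/v1"
    protocols: Dict[str, Any] = field(default_factory=dict)


def validate_query(q: Query) -> Optional[str]:
    """Return an error message if the CID required by q.kind is missing."""
    if q.kind is QueryKind.PAYLOAD and q.payload_cid is None:
        return "PayloadCID is required when Kind is Payload"
    if q.kind is QueryKind.PIECE and q.piece_cid is None:
        return "PieceCID is required when Kind is Piece"
    return None


# Example: a provider that can serve a piece over HTTP only.
# The URL and size are made-up placeholder values.
example = QueryResponse(
    status=QueryResponseStatus.AVAILABLE,
    protocols={"http/v1": {"URL": "https://sp.example/piece/placeholder", "Size": 1024}},
)
```

A client would look up the protocol names it understands in `protocols` and ignore the rest, which is what lets new protocols be added without another breaking change.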

jacobheun commented 2 years ago

Overall this looks reasonable. I don't think we need QueryKind though, we can infer this from the Params:

If only PieceCid is provided, provide a response for the piece retrieval with available options. (http only)
If PieceCid and PayloadCid are provided, provide a response for that payload within the given piece. (http & graphsync)
If only PayloadCid is provided, provide a response for that payload in any piece. (http & graphsync)
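
The three cases above could be inferred along these lines. This is only a sketch: `infer_query_scope` and the scope labels it returns are hypothetical names, and rejecting a query with neither CID is an assumption, since the thread does not say what should happen in that case.

```python
from typing import Optional


def infer_query_scope(piece_cid: Optional[str], payload_cid: Optional[str]) -> str:
    """Classify a query by which CIDs are present (hypothetical helper)."""
    if piece_cid and payload_cid:
        return "payload-in-piece"   # payload within the given piece
    if piece_cid:
        return "piece"              # whole-piece retrieval
    if payload_cid:
        return "payload-any-piece"  # payload in any piece
    # Assumption: a query with no CID at all is invalid.
    raise ValueError("query must include PieceCid, PayloadCid, or both")
```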

In your v2 proposal you got rid of the usage, but not declaration of QueryItemStatus, is that intentional? It does seem unnecessary to have, but I don't know what the V1 intent was for distinguishing QueryItemStatus versus QueryResponseStatus. They seem redundant.

ribasushi commented 2 years ago

If only PieceCid is provided, provide a response for the piece retrieval with available options. (http only)

Just want to flag not to hardcode this http only limitation as part of the protocol itself. A piece is itself a tree-hashed payload, and there could very well be a near future where the piece blob itself is retrievable over graphsync/bitswap.

cc @mikeal @rvagg

jacobheun commented 2 years ago

Just want to flag not to hardcode this http only limitation as part of the protocol itself.

Totally agree, the ( ) were meant to capture the current state of support and not future support. We should return all available options.

s0nik42 commented 2 years ago

I have a general concern about the query-ask and the retrieval/storage success rate. It's a bit wider than the scope of this issue, but I think it's worth discussing here.

Storage Deals

In V1, a client looking to store data (let's take Estuary as an example) does the following:

get-ask
Send a storage-proposal accordingly

Only at that point is the dealfilter triggered. The deal can be refused based on many conditions (pipeline is full, price, size, blacklisted addresses); the ability for an SP to manage per-client pricing, deal acceptance conditions, and deal flow is what we provide with CIDgravity.

This asymmetry drastically reduces the success rate for clients and hurts SPs' reputation.

Retrieval Deals

Same concern as for Storage Deals.
- When a retrieval proposal is priced at 0, the proposal is not signed by any Filecoin address. That is fine for public data, but for private data, giving clients the ability to sign any proposal would allow authentication on retrieval, which is super important, especially for enterprise business (like the Shoah Foundation).

Putting all of this together:

I think the current deal-filter, or a get-ask filter, should be triggered during the get-ask with more parameters (the same parameters as the storage-proposal). Doing so should significantly increase the success rate of deal proposals, giving more satisfaction to all participants (SPs and clients). This point was already discussed with the arg team about a year ago; @brendalee knows more about it.
Clients should be able to sign a retrievalProposal to authenticate it, so that the SP can apply the correct pricing, dealmaking conditions, and access control. Today the only way to do that is based on peerID, and it's really not reliable.

hannahhoward commented 2 years ago

@s0nik42

Thanks for these awesome suggestions.

The biggest software challenge with running the filters in the query phase is that the interface exposed to run the deal filter, at least at the level of the markets software, takes an actual deal proposal. So we'd have to either synthesize one to run the filters, or create a new DealFilterParams-type struct to capture all the levers a provider might want to apply when deciding whether to take a deal. We'd also probably need to add some parameters to the ask protocol (on the storage side at least) to synthesize an accurate representation of what the deal is likely to look like (for example, whether it is an offline deal, which is not yet obvious in the current setup). When @dirkmc is back he can probably think through this in more depth.
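
To make the DealFilterParams idea a little more concrete, here is a hedged Python sketch of a filter that takes a small parameter struct instead of a full deal proposal. Every field name (`client`, `piece_size`, `price_per_epoch`, `is_offline`) and the `make_filter` helper are assumptions chosen for illustration; none of them come from the markets software.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass(frozen=True)
class DealFilterParams:
    # All fields are hypothetical levers a provider might filter on.
    client: str           # client address
    piece_size: int       # size in bytes
    price_per_epoch: int  # offered price
    is_offline: bool      # whether the deal is an offline deal


# A filter returns None to accept, or a human-readable rejection reason.
DealFilter = Callable[[DealFilterParams], Optional[str]]


def make_filter(blocked: Set[str], min_price: int, max_size: int) -> DealFilter:
    """Build a filter from a blocklist, a price floor, and a size ceiling."""
    def deal_filter(p: DealFilterParams) -> Optional[str]:
        if p.client in blocked:
            return "client is blocked"
        if p.price_per_epoch < min_price:
            return "price below minimum"
        if p.piece_size > max_size:
            return "piece too large"
        return None
    return deal_filter
```

Because the filter only depends on `DealFilterParams`, the same function could in principle be called from the ask handler and again from the proposal handler, so a client gets the same answer in both phases.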

I definitely agree on signing retrieval deals, though in the case of retrieval we can keep it optional for the purposes of public data.

For the purpose of query ask v2, I think the forward-thinking feature I can add, to avoid another breaking protocol change, is support for signatures when you send the query.

dirkmc commented 2 years ago

@s0nik42 would it be sufficient for the ask protocol to return a boolean indicating whether it's a public vs private endpoint?

I'm imagining something like:

Client sends query ask to several SPs
SPs respond with ask (including boolean indicating public / private)
Client sends data request to SP that is public and meets client's price conditions
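
The client side of that flow might look roughly like the following sketch. The `Ask` fields and the `pick_provider` helper are hypothetical, and choosing the cheapest eligible SP is just one possible policy; the steps above only require that the SP is public and within the client's price conditions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Ask:
    provider: str            # SP identifier (hypothetical field)
    public: bool             # the boolean returned with the ask
    min_price_per_byte: int  # SP's asking price


def pick_provider(asks: Iterable[Ask], max_price_per_byte: int) -> Optional[Ask]:
    """Return the cheapest public ask within budget, or None if there is none."""
    eligible = [
        a for a in asks
        if a.public and a.min_price_per_byte <= max_price_per_byte
    ]
    return min(eligible, key=lambda a: a.min_price_per_byte, default=None)
```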

nicobao commented 2 years ago

I think we should consider using the concept of W3C DID (https://www.w3.org/TR/did-core/) and UCAN (https://ucan.xyz/) for auth and access control for private data.

Let me explain the use case through a user story (example usage of DID/UCAN is approximate and remains to be defined).

When a Client sends a Storage deal, the Client signs the proposal with his DID (e.g: did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ) , that would resolve into a DID Document that looks like this:

{
  "@context": "https://w3id.org/did/v1",
  "id": "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ",
  "publicKey": [{
    "id": "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#pubkey",
    "type": "Ed25519VerificationKey2018",
    "controller": "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ",
    "publicKeyBase58": "B12NYF8RrR3h41TDCTJojY59usg3mbtbjnFs7Eud1Y6u"
  }],
  "authentication": [
    "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#pubkey"
  ],
  "assertionMethod": [
    "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#pubkey"
  ],
  "capabilityDelegation": [
    "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#pubkey"
  ],
  "capabilityInvocation": [
    "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#pubkey"
  ],
  "keyAgreement": [{
    "id": "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#kakey",
    "type": "X25519KeyAgreementKey2019",
    "controller": "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ",
    "publicKeyBase58": "JhNWeSVLMYccCk7iopQW4guaSJTojqpMEELgSLhKwRr"
  }]
}

through a Filecoin DID Method (similar to ION SideTree, here is the spec: https://github.com/decentralized-identity/sidetree)

After storage is done, Clients would add their UCAN (see https://ucan.xyz) in their Retrieval deal proposal. SP can then read the corresponding signature and bubble up to the original DID that sent the Storage Deal (see prf field). If there is a match with the original DID, then it means the Client may be authorized to access the data. The exact access control over the specific PieceCID this Client wants to fetch is described using the Capability mechanism of UCAN, see the att field. This feature provides granularity over data access control.
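
To illustrate the "bubble up" check described above, here is a deliberately simplified Python sketch. It assumes each token carries a single inline proof, whereas real UCANs reference their proofs differently, and it skips signature verification entirely, which a real SP must not do. The `Token` class and `is_authorized` helper are hypothetical; only the field names `iss`, `aud`, `att`, and `prf` follow UCAN terminology.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (resource, ability), e.g. ("piece:<PieceCID>", "retrieve"); format is assumed.
Capability = Tuple[str, str]


@dataclass
class Token:
    iss: str                       # DID of the issuer
    aud: str                       # DID of the audience
    att: List[Capability]          # capabilities granted by this token
    prf: Optional["Token"] = None  # the token this one was delegated from


def is_authorized(token: Token, owner_did: str, wanted: Capability) -> bool:
    """Walk the prf chain to its root and check that it starts at owner_did.

    Signature verification is omitted in this sketch; a real check must
    verify every token's signature against its issuer's DID.
    """
    current = token
    while True:
        if wanted not in current.att:
            return False  # capability was not granted at this link
        if current.prf is None:
            return current.iss == owner_did  # root must be issued by the owner
        if current.prf.aud != current.iss:
            return False  # issuer was not the audience of its own proof
        current = current.prf
```

For example, if the owner delegates to Alice and Alice delegates to Bob, Bob's token passes only when every link grants the requested capability and the chain ends at the DID recorded with the Storage Deal.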

In order for Filecoin to stay censorship resistant, it should always be possible for SPs to accept retrieval deals from unauthorized Clients. However, serving data to unauthorized entities is likely to affect the SP's reputation negatively, depending on context.

Advantages:

much more access control granularity over data than a private/public boolean
DID is a W3C recommendation that is becoming widely used in decentralized apps
we don't need to rely on any specific DID Method, so that would make Filecoin 100% compatible with any past, present or future form of decentralized auth (e.g: key, ENS, IPNS, Filecoin, Bitcoin or Ethereum based), without any particular effort. I took the example of a hypothetical Filecoin-based DID Method just for the sake of it.
UCAN and its delegation mechanism are extremely flexible. During the Storage Deal, we just say "who is the owner". Then all the delegation of rights can be done off-chain and even offline. No need to update Filecoin about new access control. During the Retrieval deal proposal, Clients would simply prove which rights they have by providing a cryptographic signature.
introducing DIDs in the protocol would allow us to easily provide reputation information through Verifiable Credentials (https://www.w3.org/TR/vc-data-model/)
DID provides a higher level of abstraction over Filecoin addresses, meaning users can easily manage multiple addresses, roll their private keys, and keep their financial privacy without having to worry about losing potential access control.

Drawbacks:

introducing DIDs is a lot of work and it's not clear how everything would fit together
UCANs are still work-in-progress and aren't standardized yet
there is currently no DID Method based on Filecoin, and especially no Filecoin SideTree implementation
we need to provide tools for DID Recovery: how to change the DID (data owner) recorded in Filecoin after a Storage Deal is accepted?
it requires protocol-level changes (FIP), as we need to keep track of the DID associated with the storage deal

cc @gobengo @bmann @expede you may be interested in joining this discussion. I may have made mistakes in the way DID/UCANs should be used, as I have not had the opportunity to experiment with them yet, so feel free to correct me. I'll also notify Patrick Woodhead who is, I believe, interested in introducing DIDs and Verifiable Credentials for reputation within the Retrieval market.

bmann commented 2 years ago

Thanks @nicobao! Yes, we are proposing to do a proof of concept that would be WNFS (our encrypted file system) end-to-end private data on Filecoin, to open up all private use cases.

At the very least, we want a standard way to combine DIDs, private keys, and UCANs for access control, such that one entity can place data meant for another entity to retrieve.

Sidetree is not needed, and we will also be working on did:fil (or did:pkh, which is broadly used for any EOA blockchain keys).

Happy to talk more about this -- I think there are folks from DAGHouse cc @mikeal who would be interested in this.

mikeal commented 2 years ago

i personally feel a little ill-equipped to try and “define a Filecoin DID protocol” that uses UCAN, right now.

we’re doing a lot with UCAN right now, and we’re exploring transport protocols w/ it https://purrfect-tracker-45c.notion.site/fast-ptp-368da03e9c91460f9dcb3da080f439d2 and can iterate pretty quickly with one that is in production and servicing a lot of large data reads between large providers.

i want to have that experience before trying to define something this big, that cuts across so many use cases and concerns. i know the “large provider” problem pretty well, and we are still finding better ways to leverage UCANs every week to solve those problems. it’s exciting stuff, but definitely changing fast and we’re still finding best practices.

nicobao commented 2 years ago

@bmann is there a public repo for the proof-of-concept you work on?

I suppose for now we can leave auth outside of Filecoin until it appears clearer how to introduce it.

@s0nik42 would it be sufficient for the ask protocol to return a boolean indicating whether it's a public vs private endpoint?

I'm imagining something like:
1. Client sends query ask to several SPs

2. SPs respond with ask (including boolean indicating public / private)

3. Client sends data request to SP that is public and meets client's price conditions

As access control granularity is all implemented off-chain for now, the boolean you mention is fine to me. @s0nik42 what do you think?

As @s0nik42 said, it would be nice if Clients send their Filecoin address in the retrieval deal proposal.

willscott commented 2 years ago

We have a space for protocol metadata in the indexing announcements https://github.com/filecoin-project/index-provider/blob/main/metadata/graphsync_filecoinv1.ipldsch

It would be great if the indexer presence already implicitly indicates 'public' over 'private', and we can extend those advertisements with price conditions the provider would be willing to offer retrieval at.

if we can do that, then we can avoid the additional negotiation round-trip, and have a client much more likely to be able to go directly to an SP it can be successful in making a retrieval with.

dirkmc commented 2 years ago

@willscott agreed - that was going to be my next suggestion: instead of an ask that returns public/private, just don't advertise private cids

expede commented 2 years ago

UCANs are still work-in-progress and aren't standardized yet

To perhaps clarify, there is a standard at https://github.com/ucan-wg/spec, but we're still releasing new versions every few months.

we are still finding better ways to leverage UCANs every week to solve those problems

I just wanted to echo this as well. Aside from the standardization process, there's some pattern discovery happening in the community right now. We can pull a lot from the eRights and SPKI worlds, but there's lots of interesting experimentation happening.

We're also going to be exploring topics related to UCAN+Filecoin pretty heavily as part of the IPVM working group.

hannahhoward commented 2 years ago

Closing until we find a new time to finish the design.

hannahhoward commented 2 years ago

NM, I'm going to leave it open simply for discussion at a later point, but for now our immediate needs for HTTP retrieval are resolved, so this is now an open design thread.

filecoin-project / boost

QueryAsk v2 Protocol #666
