Open hannahhoward opened 2 years ago
Overall this looks reasonable. I don't think we need QueryKind though, we can infer this from the Params:
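A minimal sketch of that inference, assuming the params carry an optional payload CID and an optional piece CID. The names `QueryParams`, `payload_cid`, `piece_cid` and the three kinds are hypothetical and are not taken from the proposed schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QueryKind(Enum):
    # Hypothetical kinds, used only to illustrate the inference.
    PAYLOAD = "payload"                    # only a payload CID was given
    PIECE = "piece"                        # only a piece CID was given
    PAYLOAD_IN_PIECE = "payload_in_piece"  # both were given


@dataclass
class QueryParams:
    # Hypothetical field names; the real v2 schema may differ.
    payload_cid: Optional[str] = None
    piece_cid: Optional[str] = None


def infer_kind(params: QueryParams) -> QueryKind:
    """Derive the kind of query from which CIDs are present."""
    if params.payload_cid and params.piece_cid:
        return QueryKind.PAYLOAD_IN_PIECE
    if params.piece_cid:
        return QueryKind.PIECE
    if params.payload_cid:
        return QueryKind.PAYLOAD
    raise ValueError("query must carry a payload CID, a piece CID, or both")
```

With this shape, a separate kind field would only duplicate information the params already carry.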
In your v2 proposal you got rid of the usage, but not the declaration, of QueryItemStatus. Is that intentional? It does seem unnecessary to have, but I don't know what the V1 intent was for distinguishing QueryItemStatus from QueryResponseStatus. They seem redundant.
If only PieceCid is provided, provide a response for the piece retrieval with available options. (http only)
Just want to flag not to hardcode this http only limitation as part of the protocol itself. A piece is itself a tree-hashed payload, so there could very well be a near future where the piece blob itself is retrievable over graphsync/bitswap.
cc @mikeal @rvagg
Just want to flag not to hardcode this http only limitation as part of the protocol itself.
Totally agree, the ( ) were meant to capture the current state of support and not future support. We should return all available options.
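As a rough illustration of "return all available options", the provider could look up whatever transports it currently supports for a piece instead of assuming HTTP. The `PieceIndex` type and the transport names below are assumptions made for this sketch only:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PieceIndex:
    # Hypothetical registry: piece CID -> transports the provider can serve it over.
    transports: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, piece_cid: str, transport: str) -> None:
        """Record that a piece is retrievable over the given transport."""
        known = self.transports.setdefault(piece_cid, [])
        if transport not in known:
            known.append(transport)

    def options_for(self, piece_cid: str) -> List[str]:
        """Return every transport available for the piece, in registration order."""
        return list(self.transports.get(piece_cid, []))
```

Today `options_for` would return only `["http"]` for a piece, but nothing in the response shape prevents a provider from registering graphsync or bitswap later.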
I have a general concern about the query-ask and the retrieval/storage success rate. It's a bit wider than the scope of this issue, but I think it is worth discussing here.
Storage Deals
In V1, a client looking to store data does the following (let's take Estuary as an example):
This asymmetry drastically reduces the success rate for clients and the reputation of SPs.
Retrieval Deals
Putting all of this together:
I think the current deal-filter, or a get-ask filter, should be triggered during the get-ask with more parameters (the same parameters as the storage proposal). Doing so should greatly increase the success rate of deal proposals, giving more satisfaction to all participants (SP and client). This point was already discussed with the arg team about a year ago; @brendalee knows more about it.
Clients should be able to sign a retrievalProposal to authenticate it, so that the SP can apply the correct pricing, dealmaking conditions and access control. Today the only way to do that is based on the peerID, and it's really not reliable.
@s0nik42
Thanks for these awesome suggestions.
The biggest software challenge with running the filters in the query phase is that the interface exposed to run the deal filter, at least at the level of the markets software, takes an actual deal proposal. So we'd essentially have to synthesize one to run the filters, or create a new DealFilterParams-type struct to capture all the levers a provider might want to apply when deciding whether to take a deal. We'd also probably need to add some parameters to the ask protocol (on the storage side at least) to synthesize an accurate representation of what the deal is likely to look like (for example, whether it is an offline deal, something not yet obvious in the current setup). When @dirkmc is back he can probably think through this in more depth.
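To make the DealFilterParams idea concrete, here is a minimal sketch in which both the ask and the later proposal reduce to the same filter input, so the filter gives the same answer in both phases. All type names, fields and the example policy are hypothetical; this is not the markets software's actual interface:

```python
from dataclasses import dataclass

GIB = 1 << 30


@dataclass(frozen=True)
class DealFilterParams:
    # Hypothetical set of levers a provider might filter on.
    client: str
    piece_size: int  # bytes
    verified: bool
    offline: bool


@dataclass(frozen=True)
class AskRequest:
    # Extra ask parameters so the provider can anticipate the deal's shape.
    client: str
    piece_size: int
    verified: bool
    offline: bool

    def filter_params(self) -> DealFilterParams:
        return DealFilterParams(self.client, self.piece_size, self.verified, self.offline)


@dataclass(frozen=True)
class DealProposal:
    client: str
    piece_cid: str
    piece_size: int
    verified: bool
    offline: bool

    def filter_params(self) -> DealFilterParams:
        return DealFilterParams(self.client, self.piece_size, self.verified, self.offline)


def accepts(p: DealFilterParams) -> bool:
    """Example policy only: verified, online deals of at most 32 GiB."""
    return p.verified and not p.offline and p.piece_size <= 32 * GIB
```

Because the filter never sees the full proposal, there is no need to synthesize a fake one during the query phase.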
I definitely agree on signing retrieval deals, though in the case of retrieval we can keep it optional for the purposes of public data.
For the purpose of query ask v2, I think the forward-thinking feature I can add, to avoid another breaking protocol change, is to support signatures when you send the query.
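A sketch of what an optionally signed query could look like on the provider side. The `SignedQuery` fields and the `Verifier` callback are assumptions for illustration; real signature verification would depend on the client's key type and is deliberately left to the callback:

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical callback: (client address, message, signature) -> is the signature valid?
Verifier = Callable[[str, bytes, bytes], bool]


@dataclass
class SignedQuery:
    payload: bytes                        # the encoded query
    client_address: Optional[str] = None  # who claims to have signed it
    signature: Optional[bytes] = None     # absent for anonymous (public data) queries


def authenticate(query: SignedQuery, verify: Verifier) -> Optional[str]:
    """Return the authenticated client address, or None for an unsigned query."""
    if query.signature is None:
        return None  # signing stays optional
    if query.client_address is None or not verify(
        query.client_address, query.payload, query.signature
    ):
        raise PermissionError("invalid signature on query")
    return query.client_address
```

An unsigned query is still answered, so public data keeps working, while a provider can key pricing or access rules on the returned address when one is present.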
@s0nik42 would it be sufficient for the ask protocol to return a boolean indicating whether it's a public vs private endpoint?
I'm imagining something like:
I think we should consider using the concept of W3C DID (https://www.w3.org/TR/did-core/) and UCAN (https://ucan.xyz/) for auth and access control for private data.
Let me explain the use case through a user story (example usage of DID/UCAN is approximate and remains to be defined).
When a Client sends a Storage deal, the Client signs the proposal with their DID (e.g. did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ), which would resolve into a DID Document that looks like this:
{
  "@context": "https://w3id.org/did/v1",
  "id": "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ",
  "publicKey": [{
    "id": "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#pubkey",
    "type": "Ed25519VerificationKey2018",
    "controller": "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ",
    "publicKeyBase58": "B12NYF8RrR3h41TDCTJojY59usg3mbtbjnFs7Eud1Y6u"
  }],
  "authentication": [
    "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#pubkey"
  ],
  "assertionMethod": [
    "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#pubkey"
  ],
  "capabilityDelegation": [
    "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#pubkey"
  ],
  "capabilityInvocation": [
    "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#pubkey"
  ],
  "keyAgreement": [{
    "id": "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ#kakey",
    "type": "X25519KeyAgreementKey2019",
    "controller": "did:fil:mainnet:EiD0x0JeWXQbVIpBpyeyF5FDdZN1U7enAfHnd13Qk_CYpQ",
    "publicKeyBase58": "JhNWeSVLMYccCk7iopQW4guaSJTojqpMEELgSLhKwRr"
  }]
}
through a Filecoin DID Method (similar to ION SideTree, here is the spec: https://github.com/decentralized-identity/sidetree)
After storage is done, Clients would add their UCAN (see https://ucan.xyz) to their Retrieval deal proposal. The SP can then read the corresponding signature and follow the chain of proofs (see the prf field) back up to the original DID that sent the Storage Deal. If there is a match with the original DID, then the Client may be authorized to access the data. The exact access control over the specific PieceCID this Client wants to fetch is described using the Capability mechanism of UCAN (see the att field). This feature provides granularity over data access control.
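The proof-chain walk described above can be sketched roughly as follows. This is a simplified model and not the UCAN spec: tokens are plain objects with proofs embedded directly, capabilities are `(resource, ability)` pairs that must match exactly, and signature and expiry checks are omitted entirely. The DIDs and capability strings in the usage note are made up:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Simplified capability: (resource, ability), e.g. ("piece:<PieceCID>", "retrieve").
Capability = Tuple[str, str]


@dataclass
class Token:
    iss: str                                            # issuer DID
    aud: str                                            # audience DID
    att: List[Capability]                               # capabilities granted
    prf: List["Token"] = field(default_factory=list)    # proofs (embedded here for simplicity)


def authorizes(token: Token, owner_did: str, cap: Capability) -> bool:
    """True if `cap` is granted by `token` through a delegation chain rooted at `owner_did`."""
    if cap not in token.att:
        return False
    if token.iss == owner_did:
        return True  # the DID that made the storage deal issued this directly
    # Otherwise some proof must delegate the same capability to this token's issuer.
    return any(p.aud == token.iss and authorizes(p, owner_did, cap) for p in token.prf)
```

For example, if `did:example:owner` delegates `("piece:example", "retrieve")` to `did:example:alice`, and Alice presents a token addressed to the SP with that delegation as its proof, `authorizes` returns `True` for that capability and `False` for any other piece.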
In order for Filecoin to stay censorship resistant, it should always be possible for SPs to accept retrieval deals from unauthorized Clients. However, serving data to unauthorized entities is likely to affect the SP's reputation negatively, depending on context.
Advantages:
Drawbacks:
cc @gobengo @bmann @expede you may be interested in joining this discussion. I may have made mistakes in the way DIDs/UCANs should be used, as I have never had the opportunity to experiment with them, so feel free to correct me. I'll also notify Patrick Woodhead who is, I believe, interested in introducing DIDs and Verifiable Credentials for reputation within the Retrieval market.
Thanks @nicobao! Yes, we are proposing to do a proof of concept that would be WNFS (our encrypted file system) end-to-end private data on Filecoin, to open up all private use cases.
At the very least, a standard way to combine DIDs, private keys, and UCANs for access control in such a way that one entity can place data, meant for another entity to retrieve.
Sidetree is not needed, and we will also be working on did:fil (or did:pkh, which is broadly used for any EOA blockchain keys).
Happy to talk more about this -- I think there are folks from DAGHouse cc @mikeal who would be interested in this.
i personally feel a little ill-equipped to try and “define a Filecoin DID protocol” that uses UCAN, right now.
we’re doing a lot with UCAN right now, and we’re exploring transport protocols w/ it https://purrfect-tracker-45c.notion.site/fast-ptp-368da03e9c91460f9dcb3da080f439d2 and can iterate pretty quickly with one that is in production and servicing a lot of large data reads between large providers.
i want to have that experience before trying to define something this big, that cuts across so many use cases and concerns. i know the “large provider” problem pretty well, and we are still finding better ways to leverage UCANs every week to solve those problems. it’s exciting stuff, but definitely changing fast and we’re still finding best practices.
@bmann is there a public repo for the proof-of-concept you are working on?
I suppose for now we can leave auth outside of Filecoin until it becomes clearer how to introduce it.
@s0nik42 would it be sufficient for the ask protocol to return a boolean indicating whether it's a public vs private endpoint?
I'm imagining something like:
1. Client sends query ask to several SPs
2. SPs respond with ask (including boolean indicating public / private)
3. Client sends data request to SP that is public and meets client's price conditions
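The three steps above can be sketched from the client's side as a simple selection over the returned asks. The `Ask` fields, the price unit and the "cheapest wins" rule are assumptions for illustration only:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Ask:
    provider: str
    public: bool         # the proposed public / private boolean
    price_per_byte: int  # hypothetical unit


def choose_provider(asks: Iterable[Ask], max_price: int) -> Optional[Ask]:
    """Pick the cheapest public ask within the client's price limit, if any."""
    eligible = [a for a in asks if a.public and a.price_per_byte <= max_price]
    return min(eligible, key=lambda a: a.price_per_byte, default=None)
```

A client would call `choose_provider` on the responses collected in step 2 and send its data request to the returned provider, or give up if the result is `None`.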
As access control granularity is all implemented off-chain for now, the boolean you mention is fine with me. @s0nik42 what do you think?
As @s0nik42 said, it would be nice if Clients send their Filecoin address in the retrieval deal proposal.
We have a space for protocol metadata in the indexing announcements https://github.com/filecoin-project/index-provider/blob/main/metadata/graphsync_filecoinv1.ipldsch
It would be great if the indexer presence already implicitly indicates 'public' over 'private', and we can extend those advertisements with price conditions the provider would be willing to offer retrieval at.
if we can do that, then we can avoid the additional negotiation round-trip, and have a client much more likely to be able to go directly to an SP it can be successful in making a retrieval with.
@willscott agreed - that was going to be my next suggestion: instead of an ask that returns public/private, just don't advertise private cids
UCANs are still work-in-progress and aren't standardized yet
To perhaps clarify, there is a standard at https://github.com/ucan-wg/spec, but we're still releasing new versions every few months.
we are still finding better ways to leverage UCANs every week to solve those problems
I just wanted to echo this as well. Aside from the standardization process, there's some pattern discovery happening in the community right now. We can pull a lot from the eRights and SPKI worlds, but there's lots of interesting experimentation happening.
We're also going to be exploring topics related to UCAN+Filecoin pretty heavily as part of the IPVM working group.
Closing until we find a new time to finish the design.
NM, I'm going to leave it open simply for discussion at a later point. For now our immediate needs for HTTP retrieval are resolved, so this is now an open design thread.
Goals
Enable discoverability of HTTP retrieval support
The primary changes for v2 protocol are:
Query:
Response:
For now the newly designed protocol makes all additional information protocol-specific.
How
The schema for the QueryAsk v1 request / response (in IPLD schema -- it's encoded as DAG CBOR) is as follows:
The proposed v2 schema is: