Open nomeata opened 4 years ago
Re the choice of defaults -- has flipping the default on its head been considered, i.e. requiring that the interface specify when a method should be an uncertified query, with certified queries being the default?
Our field is littered with examples of protocols that specified insecure defaults and had to go through a long and painful process of effectively switching the default to be secure. So maybe we should jump the gun on that whole process now?
Possibly the keyword shouldn't be `uncertified` (although that closely maps to the intention). Maybe `untrusted` would be better?
Yielding something like:
service {
getBalance(user_name : String) : Nat query ;
getProbableBalance(user_name : String) : Nat untrusted query ;
}
Or perhaps `... Nat untrusted-query ;`?
With the same idea applied to the variable declarations -- be certified by default, and make the user make an explicit design choice that a variable is uncertified / untrusted?
> with certified queries being the default?
There is no such thing as a “certified query” in our platform at the moment; this PR is considering the introduction of this concept at the candid level.
I fully agree that safe-by-default is the right choice in general, and probably here too. But it is a question we can address only once we know that we even want to add the concept as outlined here.
Since queries have access to caller id, shouldn't that be included in the argument hash?
> Since queries have access to caller id, shouldn't that be included in the argument hash?
This is tricky: Many applications of certified queries (e.g. serving an `index.html`, or a highscore) will return the same result no matter who the caller is. If you include the caller in the witness path (i.e. the argument hash), you’d have to prepare signed responses for every possible caller. That is not feasible in general.
I was wondering about a more elaborate scheme where the caller is part of the witness path, but the path can also indicate “any value”, i.e. a concrete path to certify a query `my_balance()` could be `/certified_variables/my_balance/claudio` (where `claudio` is your id), but a query `highscore(game_id: nat)` might return something with a path `/certified_variables/highscore/*/5` (where of course the `*` is encoded in a way that doesn’t clash with a literal `*`; the `5` might be a game id).
This would complicate the server-side story more, so I initially leaned towards solving the 80% case; if you need to certify queries that include the caller you can still produce and validate certificates manually.
Note that restricting access via the caller still works, as it doesn't affect the result!
Maybe the interface should already support such wildcards (which are also useful to indicate that some normal argument is ignored), even if Motoko doesn’t initially expose that ability. The complexity on the agent side is reasonable.
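To make the wildcard idea concrete, here is a rough sketch of the check an agent might do. Everything in it is an assumption for illustration: the `ANY` marker, the `path_matches` and `expected_path` helpers, and the path layout (method name, then caller, then arguments) are hypothetical and not part of any existing agent or of the proposal text.

```python
# Hypothetical wildcard marker. It is a distinct object, so it can never
# clash with a literal label such as b"*".
ANY = object()

def path_matches(pattern, concrete):
    """True if each pattern segment is ANY or equals the concrete segment."""
    if len(pattern) != len(concrete):
        return False
    return all(p is ANY or p == c for p, c in zip(pattern, concrete))

def expected_path(method, caller, args):
    """Path the agent derives from the call it made (layout is an assumption)."""
    return [b"certified_variables", method.encode(), caller.encode(),
            *[str(a).encode() for a in args]]

# highscore ignores the caller, so one certified path with a wildcard
# covers every possible caller.
highscore_pattern = [b"certified_variables", b"highscore", ANY, b"5"]

# my_balance depends on the caller, so the certified path names the caller.
balance_pattern = [b"certified_variables", b"my_balance", b"claudio"]

print(path_matches(highscore_pattern, expected_path("highscore", "claudio", [5])))
print(path_matches(balance_pattern, expected_path("my_balance", "joachim", [])))
```

The agent would accept a certified reply only if the path in the witness matches the path it expected for its own call, so a balance certified for `claudio` is rejected when someone else asks.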
Thanks for writing this up. It makes sense to me as far as the current design around certified variables makes sense. But I agree with the undercurrent that this all is highly unpleasant from a programming model perspective.
Besides the leakiness and complexity of the abstraction, I also see the deeper problem you alluded to elsewhere, namely that the very idea of certified variable as a solution to certified queries confuses the notions of state and query. And with that, it creates a strong incentive for devs to expose state more or less directly, instead of designing proper interface layers.
The approach also seems incompatible with delegation, or am I missing something?
> The approach also seems incompatible with delegation, or am I missing something?
With delegation you mean “storing the certified data now at some other canister B, and forwarding queries to that canister without the clients needing to know”? Yes. Unless one extends the language of “witnesses” to include such indirection; then you can return both a certificate from B, and a certified statement that certain queries may be signed by B. All very complicated.
Updated this proposal with an alternative of using a convention on top of Candid. Might be easier to start with (although less convenient, and maybe harder to implement in Motoko user code, at least not without access to candid serialization directly).
Is there a reason why the convention approach does not include the witness field?
I'm wondering if a lot of the complexity alluded to here https://docs.google.com/document/d/1vwWbCWGJ0n-aGq362gbRev_Apfkh5Mp3Uv4h7Lby24A/edit?usp=sharing couldn't be hidden by library design in Motoko and JS.
If we had a Merkle tree indexed by canonical blobs, storing blobs, and provided the user exposes serialization primitives to construct the restricted indices and the unrestricted content, couldn't the certified query methods just boil down to boilerplate code?
The JS code for verifying results might also be dealt with similarly, so users don't have to repeat the error-prone logic but could just apply a higher-order function to do the certification of the query.
But maybe I'm smoking crack again.
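As a rough illustration of that library idea, here is a toy sketch of both sides. All names (`CertifiedStore`, `certified`, `trusted_root`) are hypothetical, and the "tree" is deliberately naive: the root is a hash over all sorted leaf digests, so the witness is linear in the number of entries. A real library would use a proper Merkle tree, and `trusted_root` stands in for validating the system's certificate.

```python
import hashlib

def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class CertifiedStore:
    """Canister side: pre-serialized responses keyed by a canonical index blob."""

    def __init__(self):
        self._entries = {}  # index blob -> content blob

    def put(self, index: bytes, content: bytes) -> None:
        self._entries[index] = content

    def _leaf(self, index: bytes) -> bytes:
        return digest(digest(index) + digest(self._entries[index]))

    def root(self) -> bytes:
        """The short blob that would be handed to the system in an update call."""
        return digest(b"".join(sorted(self._leaf(i) for i in self._entries)))

    def query(self, index: bytes) -> dict:
        """Reply with the content plus the digests of all other leaves."""
        others = [self._leaf(i) for i in self._entries if i != index]
        return {"content": self._entries[index], "witness": others}

def certified(raw_query, trusted_root):
    """Client side: wrap a raw query so callers only ever see verified content."""
    def wrapped(index: bytes) -> bytes:
        reply = raw_query(index)
        leaf = digest(digest(index) + digest(reply["content"]))
        root = digest(b"".join(sorted(reply["witness"] + [leaf])))
        if root != trusted_root():
            raise ValueError("certification failed")
        return reply["content"]
    return wrapped

store = CertifiedStore()
store.put(b"balance/alice", b"42")
store.put(b"balance/bob", b"7")

get = certified(store.query, store.root)
print(get(b"balance/alice"))
```

The point of the wrapper is that application code calls `get(...)` and never touches the witness itself; the restructuring cost on the canister side (keeping the pre-serialized entries up to date) is still there.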
> Is there a reason why the convention approach does not include the witness field?
No, my bad, will fix
You are not smoking crack, and certainly some of it can be hidden (stuff related to serialization etc.). But you still have to re-structure your application logic to do the serialization ahead of time, keep it up-to-date, etc.; that’s the main issue alluded to.
No, I perfectly agree that this is awkward and not as expressive as full certified queries. I'm just wondering if we can make it simpler to use without building stuff into the language, especially since it seems to have fairly limited utility.
The present proposal is too inflexible for interesting services; I have a new proposal, Universal Query Certificates, that addresses that by including the validation code (which is, after all, service-specific) in the certificate.
Can we know more about this Universal Query Certificates and what the timeline is? We're trying to figure this out over at https://forum.dfinity.org/t/recommended-usage-of-certifieddata/4370/13 and it doesn't seem very easy to do in Motoko right now.
> Can we know more about this Universal Query Certificates
I don’t have access to that file any more, since it is an internal google doc.
> and what the timeline is?
None, this was just a crazy idea (and probably not a good one).
Certified variables
The system will provide a form of certified variables. Very roughly the system interface will be:
In `update` methods, the canister can pass a short blob (typically the root hash of a Merkle tree that the canister maintains) to the system.
In `query` methods, the canister can get an opaque certificate (“signature”) from the system, and pass it to the caller.
The Problem
If we don't do anything to help them, I expect that none of our target audience developers will be able to use this functionality (or worse, use it insecurely). I wrote up what this would entail for developers if you are curious.
So we (as the platform) need to do something. And because this is a question of interoperability, it cannot be solved just within one of the languages, but has to be solved on the Candid layer or below. Under the assumption that the (rough) system interface is what I outlined above (and not, say, proper certified queries, which would be transparent to upper layers, at the expense of being slower, more expensive and more work to provide), the Candid layer is the only layer left to solve this in.
I am discussing this here, and not in the candid repo, because some of this might be secret sauce.
The high level goal
Consider a service with interface
As it stands right now, this would be an uncertified query. Bad for a tamper-proof platform.
The goal should be that the developer changes the signature to, say
and adding that qualifier is all that’s necessary for clients of this service to use this method in a certified way.
A possible interface
The goal is now to map the query method name and its parameters onto a path in the certified state tree.
The `certified` qualifier has the following restrictions and effect:
`query`
`content` is the Candid-encoded response of type `tr`. (This nesting of Candid in Candid is necessary to pre-compute the hash of the return value.)
`certificate` and `witness` are `null`.
`certificate` is the opaque blob returned by the system, which certifies the canister’s certified state variables tree root hash.
The `witness` connects the hash of the `content` blob to the tree root hash via a path of the form
where `H(ai)` is the (canonical) hash of the i-th argument to the method.
(Variant: The path in the witness may be allowed to be shorter; this way parameter width subtyping still works.)
`content` is `null`, and the `witness` is a negative witness that proves the absence of that value.
See https://docs.dfinity.systems/public/#certification for more on certification (what’s called “witness” here is the “hashtree” there).
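For readers who want to see what checking such a witness could look like, here is a minimal sketch. It is not the IC's actual hash tree encoding: the node shapes, the one-byte domain tags, and the path layout (method name followed by the argument hashes) are simplifying assumptions, and `certified_root` is assumed to have been extracted from an already validated `certificate`.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

# Witness nodes (simplified, hypothetical shapes):
#   ("leaf", content) | ("labeled", label, subtree)
#   ("fork", left, right) | ("pruned", digest)

def reconstruct(node) -> bytes:
    """Recompute the root hash that this (partially pruned) tree commits to."""
    kind = node[0]
    if kind == "pruned":
        return node[1]
    if kind == "leaf":
        return h(b"\x01", node[1])
    if kind == "labeled":
        return h(b"\x02", h(node[1]), reconstruct(node[2]))
    if kind == "fork":
        return h(b"\x03", reconstruct(node[1]), reconstruct(node[2]))
    raise ValueError(f"unknown node kind: {kind!r}")

def lookup(node, path):
    """Follow the labels in `path`; return the leaf content or None."""
    if not path:
        return node[1] if node[0] == "leaf" else None
    if node[0] == "labeled":
        return lookup(node[2], path[1:]) if node[1] == path[0] else None
    if node[0] == "fork":
        found = lookup(node[1], path)
        return found if found is not None else lookup(node[2], path)
    return None

def verify(content, witness, certified_root, method, arg_hashes) -> bool:
    """Positive case only; a negative (absence) witness needs a separate check."""
    path = [method.encode(), *arg_hashes]
    return (content is not None
            and reconstruct(witness) == certified_root
            and lookup(witness, path) == content)

ha, hb = h(b"alice"), h(b"bob")  # stand-ins for H(a1) of two different calls
bob_branch = ("labeled", hb, ("leaf", b"7"))
full = ("labeled", b"balance",
        ("fork", ("labeled", ha, ("leaf", b"42")), bob_branch))
root = reconstruct(full)

# Witness for balance("alice"): bob's branch is pruned down to its digest.
witness = ("labeled", b"balance",
           ("fork", ("labeled", ha, ("leaf", b"42")),
            ("pruned", reconstruct(bob_branch))))
print(verify(b"42", witness, root, "balance", [ha]))
```

The pruned witness reconstructs to the same root as the full tree, which is what lets the canister reveal one entry without shipping the whole state.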
Effect on clients
The Candid-and-IC-handling client library (userlib.js, the rust agent, rust canisters, the Motoko RTS) now has all the information to verify the witness and, transparently to the application, return a trustworthy reply or report failure. This is nice.
Effect on services
This is more severe. Some options are:
Canisters just have to implement the above interface by hand. Not nice.
Environments like Motoko auto-generate these certified queries. Shower-thought-level design:
would yield an interface like that
Assigning to such a `certified var` will update the hash tree, prepare the responses etc. under the hood.
This would be somewhat nicer if Motoko had built-in dictionaries. Or maybe we need a somewhat magical `CertifiedMap` type.
The actual `time()` and `balance()` query methods would be generated automatically by the compiler.
Alternative: Just a convention
In this alternative design, the `certified` qualifier does not become part of Candid. Instead, certifying queries relies on the convention that certifiable queries have type
where `tr` is a primitive Candid value (so that it can be hashed).
This variant requires fewer components to change and can be implemented more quickly, but is less convenient for developers.
Improvements: Canonical Candid
If we could define a “canonical hash” for Candid values, we would not have to restrict values to primitive types as much.
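To give a feel for what a canonical hash could mean, here is one possible sketch over plain Python values standing in for Candid values. The type tags and the encoding are made up for this example and are not a proposal for the actual Candid definition; the only property it aims for is that the hash depends on the value, not on how it happened to be serialized (e.g. record field order).

```python
import hashlib

def canonical_hash(value) -> bytes:
    """Hash a value so that equal values hash equally, whatever their layout."""
    if isinstance(value, bool):  # checked before int, since bool is an int subtype
        tag, body = b"b", (b"\x01" if value else b"\x00")
    elif isinstance(value, int):
        if value < 0:
            raise ValueError("this sketch only covers natural numbers")
        tag, body = b"n", value.to_bytes((value.bit_length() + 7) // 8 or 1, "big")
    elif isinstance(value, str):
        tag, body = b"t", value.encode("utf-8")
    elif isinstance(value, bytes):
        tag, body = b"x", value
    elif isinstance(value, list):
        # Vectors: element order is significant, so hash elements in order.
        tag, body = b"v", b"".join(canonical_hash(v) for v in value)
    elif isinstance(value, dict):
        # Records: field order is not significant, so sort the hashed pairs.
        pairs = sorted(canonical_hash(k) + canonical_hash(v)
                       for k, v in value.items())
        tag, body = b"r", b"".join(pairs)
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return hashlib.sha256(tag + body).digest()

a = canonical_hash({"user": "alice", "game": 5})
b = canonical_hash({"game": 5, "user": "alice"})
print(a == b)
```

With something like this, a structured argument or return value could be used as a path label or leaf in the same way a primitive value is today.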
Conclusion
Is this feasible? Desirable? Do we have an alternative (besides doing nothing)?