Certified variables on the Candid layer?

nomeata commented 4 years ago

Certified variables

The system will provide a form of certified variables. Very roughly the system interface will be:

In update methods, the canister can pass a short blob (typically the root hash of a merkle tree that the canister maintains) to the system.
In query methods, the canister can get an opaque certificate (“signature”) from the system, and pass it to the caller.
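
To make the two halves of this interface concrete, here is a minimal Python mock. Everything in it is hypothetical: `MockSystem`, `set_certified_data`, `commit` and `data_certificate` are illustrative names, and the returned "certificate" is a fake placeholder rather than a real signature. The 32-byte limit mirrors the limit on certified data in the IC interface specification, which is enough for one SHA-256 root hash.

```python
class MockSystem:
    """Hypothetical stand-in for the system side of certified variables."""

    MAX_CERTIFIED_DATA = 32  # room for one SHA-256 root hash

    def __init__(self) -> None:
        self._pending = b""    # set by the most recent update call
        self._certified = b""  # what the (mock) system has certified

    def set_certified_data(self, data: bytes, in_update: bool) -> None:
        # Only update methods may change the certified data.
        if not in_update:
            raise RuntimeError("certified data can only be set in update methods")
        if len(data) > self.MAX_CERTIFIED_DATA:
            raise ValueError("certified data is limited to 32 bytes")
        self._pending = data

    def commit(self) -> None:
        # Models the system certifying the new state after the update completes.
        self._certified = self._pending

    def data_certificate(self, replicated: bool):
        # No certificate is available when the query runs in replicated mode.
        if replicated:
            return None
        # Placeholder for the opaque, system-signed certificate.
        return b"FAKE-CERT:" + self._certified

# Usage: an update sets the root hash, a later query hands out the certificate.
system = MockSystem()
system.set_certified_data(bytes(32), in_update=True)
system.commit()
cert = system.data_certificate(replicated=False)
```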

The Problem

If we don't do anything to help them, I expect that none of the developers in our target audience will be able to use this functionality (or, worse, that they will use it insecurely). I wrote up what this would entail for developers, if you are curious.

So we (as the platform) need to do something. And because this is a question of interoperability, it cannot be solved within just one of the languages; it has to be solved on the Candid layer or below. Assuming that the (rough) system interface is what I outlined above (and not, say, proper certified queries, which would be transparent to upper layers at the expense of being slower, more expensive and more work to provide), the Candid layer is the only layer left to solve this in.

I am discussing this here, and not in the candid repo, because some of this might be secret sauce.

The high level goal

Consider a service with the interface

```
service : {
  getBalance : (user_name : text) -> (nat) query;
}
```

As it stands, this would be an uncertified query, which is bad for a tamper-proof platform.

The goal should be that the developer changes the signature to, say

```
service : {
  getBalance : (user_name : text) -> (nat) certified query;
}
```

and adding that qualifier is all that’s necessary for clients of this service to use this method in a certified way.

A possible interface

The goal is now to map the query method name and its arguments onto a path in the canister’s certified state tree.

The certified qualifier has the following restrictions and effect:

- The method must be a query.
- All parameters of the method must have primitive types. (This could be relaxed to: every parameter type has a canonical hash function.)

The underlying method’s interface can be explained by the following desugaring:

```
meth : (t1, …, tn) -> (tr) certified query
==
meth : (t1, …, tn) -> (record { content : opt blob; certificate : opt blob; witness : opt blob }) query
```

- The content is the Candid-encoded response of type tr. (This nesting of Candid in Candid is necessary to pre-compute the hash of the return value.)
- When the method runs in replicated mode (for example, when it is called as an update), certificate and witness are null.
- Otherwise, certificate is the opaque blob returned by the system, which certifies the root hash of the canister’s tree of certified variables.
- The witness connects the hash of the content blob to that root hash via a path of the form
  ```
  /certified_variables/<meth>/<H(a1)>/…/<H(an)>
  ```
  where H(ai) is the (canonical) hash of the i-th argument to the method.
  (Variant: the path in the witness may be allowed to be shorter; this way, parameter width subtyping still works.)
- When no value is available for the given arguments, content is null, and the witness is a negative witness that proves the absence of that value.
- Rejections are not certified.
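
As a sketch of how client and canister could agree on the labels of that path for a given call, here is a Python version. The path layout follows the description above. The canonical hash H is not fixed by this proposal, so `canonical_hash` (SHA-256 over a one-byte type tag followed by a LEB128 or UTF-8 encoding) is a hypothetical choice that only covers primitive arguments.

```python
import hashlib

def leb128(n: int) -> bytes:
    """Unsigned LEB128 encoding, as Candid uses for nat."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def canonical_hash(value) -> bytes:
    """Hypothetical H for primitive arguments; the tag byte separates types."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        payload = b"B" + bytes([value])
    elif isinstance(value, int) and value >= 0:
        payload = b"N" + leb128(value)
    elif isinstance(value, str):
        payload = b"T" + value.encode("utf-8")
    elif isinstance(value, bytes):
        payload = b"Y" + value
    else:
        raise TypeError(f"not a supported primitive argument: {value!r}")
    return hashlib.sha256(payload).digest()

def witness_path(method: str, args) -> list:
    """Labels of /certified_variables/<meth>/<H(a1)>/.../<H(an)>."""
    return [b"certified_variables", method.encode("utf-8")] + [
        canonical_hash(a) for a in args
    ]

# Example: the path a client expects for getBalance("alice").
path = witness_path("getBalance", ["alice"])
```

The type tag matters: without it, the text "1" and a blob containing the byte "1" would hash to the same label.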

See https://docs.dfinity.systems/public/#certification for more on certification (what’s called “witness” here is the “hashtree” there).

Effect on clients

The client library that handles Candid and the IC (userlib.js, the Rust agent, Rust canisters, the Motoko RTS) now has all the information it needs to verify the witness and, transparently to the application, either return a trustworthy reply or report failure. This is nice.
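
To illustrate what such a client library would do, here is a Python sketch of the verification step. The hash tree hashing and lookup rules follow my reading of the certification section of the IC interface specification linked above. The tuple representation of trees, the name `verify_reply`, and the assumption that the certificate has already been validated and yielded `certified_root` are hypothetical simplifications; real certificate validation involves checking a BLS signature and is omitted.

```python
import hashlib

# Hash tree nodes as tuples (a representation chosen for this sketch):
#   ("empty",) | ("fork", l, r) | ("labeled", label, t) | ("leaf", v) | ("pruned", h)

def _sha(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def _sep(s: str) -> bytes:
    # Domain separator: one length byte, then the ASCII tag.
    return bytes([len(s)]) + s.encode("ascii")

def reconstruct(t) -> bytes:
    """Root hash of a (possibly pruned) hash tree."""
    k = t[0]
    if k == "empty":
        return _sha(_sep("ic-hashtree-empty"))
    if k == "fork":
        return _sha(_sep("ic-hashtree-fork") + reconstruct(t[1]) + reconstruct(t[2]))
    if k == "labeled":
        return _sha(_sep("ic-hashtree-labeled") + t[1] + reconstruct(t[2]))
    if k == "leaf":
        return _sha(_sep("ic-hashtree-leaf") + t[1])
    if k == "pruned":
        return t[1]
    raise ValueError(f"unknown node {k!r}")

def _flatten(t) -> list:
    # The sequence of non-fork children, dropping empty nodes.
    if t[0] == "fork":
        return _flatten(t[1]) + _flatten(t[2])
    return [] if t[0] == "empty" else [t]

def _find_label(label: bytes, ts: list):
    for t in ts:
        if t[0] == "labeled" and t[1] == label:
            return ("found", t[2])
    # Absence is only proven if the label falls between or outside revealed labels.
    for a, b in zip(ts, ts[1:]):
        if a[0] == b[0] == "labeled" and a[1] < label < b[1]:
            return ("absent",)
    if ts and ts[0][0] == "labeled" and label < ts[0][1]:
        return ("absent",)
    if ts and ts[-1][0] == "labeled" and ts[-1][1] < label:
        return ("absent",)
    if not ts or (len(ts) == 1 and ts[0][0] == "leaf"):
        return ("absent",)
    return ("unknown",)

def lookup_path(path: list, t):
    """Returns ("found", v), ("absent",), ("unknown",) or ("error",)."""
    if not path:
        if t[0] == "empty":
            return ("absent",)
        if t[0] == "leaf":
            return ("found", t[1])
        return ("unknown",) if t[0] == "pruned" else ("error",)
    r = _find_label(path[0], _flatten(t))
    return lookup_path(path[1:], r[1]) if r[0] == "found" else r

def verify_reply(reply: dict, path: list, certified_root: bytes):
    """Hypothetical client check of a {content, certificate, witness} reply.

    Assumes reply["certificate"] was validated elsewhere and yielded
    certified_root. Returns the content blob, or None if absence is proven.
    """
    if reply["certificate"] is None or reply["witness"] is None:
        raise ValueError("reply is not certified (replicated call?)")
    if reconstruct(reply["witness"]) != certified_root:
        raise ValueError("witness does not match the certified root hash")
    result = lookup_path(path, reply["witness"])
    content = reply["content"]
    if result == ("absent",) and content is None:
        return None
    if result[0] == "found" and content is not None and result[1] == _sha(content):
        return content
    raise ValueError(f"witness does not certify this reply: {result[0]}")
```

A reply is accepted only if the witness hashes to the certified root and either proves the leaf H(content) at the expected path or proves that nothing is stored there.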

Effect on services

Here the impact is more severe. Some options are:

Canisters just have to implement the above interface by hand. Not nice.
Environments like Motoko auto-generate these certified queries. Shower-thought-level design:
```
actor {
 certified var time : Nat;
 certified var balance : HashMap<Text, Nat>;
}
```
would yield an interface like this:
```
service : {
  time : () -> (nat) certified query;
  balance : (text) -> (nat) certified query;
}
```
Assigning to such a certified var will update the hash tree, prepare the responses etc. under the hood.

This would be somewhat nicer if Motoko had built-in dictionaries.

Or maybe we need a somewhat magical CertifiedMap type.

The actual time() and balance() query methods would be generated automatically by the compiler.
- A variant of this might allow a canister to hook into the generated queries to do access control (dynamic responses are, of course, not possible).
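
Under the hood, the compiler-generated code for something like `certified var balance` would have to maintain roughly the following. This Python sketch is hypothetical throughout: `CertifiedMap`, `put`, `query` and `tree_hash` are illustrative names, keys are labelled with SHA-256 of their UTF-8 text as a stand-in for the canonical hash H, content is assumed to be Candid-encoded already, only one certified variable lives under `certified_variables`, and negative witnesses are left out. The hashing uses the IC hash tree rules. The root hash returned by `put` is what an update would hand to the system as certified data, and a real query would also attach the system’s certificate.

```python
import hashlib

def _sha(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def _sep(s: str) -> bytes:
    return bytes([len(s)]) + s.encode("ascii")

def tree_hash(t) -> bytes:
    """Root hash of a hash tree (same node tuples and rules as the IC hash tree)."""
    k = t[0]
    if k == "empty":
        return _sha(_sep("ic-hashtree-empty"))
    if k == "fork":
        return _sha(_sep("ic-hashtree-fork") + tree_hash(t[1]) + tree_hash(t[2]))
    if k == "labeled":
        return _sha(_sep("ic-hashtree-labeled") + t[1] + tree_hash(t[2]))
    if k == "leaf":
        return _sha(_sep("ic-hashtree-leaf") + t[1])
    return t[1]  # ("pruned", h)

def _forks(nodes: list):
    # Balanced tree of forks over labeled nodes (callers pass them sorted by label).
    if not nodes:
        return ("empty",)
    if len(nodes) == 1:
        return nodes[0]
    mid = len(nodes) // 2
    return ("fork", _forks(nodes[:mid]), _forks(nodes[mid:]))

def _witness_forks(nodes: list, label: bytes):
    # Same shape as _forks, but every subtree without `label` is pruned.
    if not any(n[1] == label for n in nodes):
        return ("pruned", tree_hash(_forks(nodes)))
    if len(nodes) == 1:
        return nodes[0]
    mid = len(nodes) // 2
    return ("fork", _witness_forks(nodes[:mid], label), _witness_forks(nodes[mid:], label))

class CertifiedMap:
    """Hypothetical model of what `certified var balance : HashMap<Text, Nat>` maintains."""

    def __init__(self, method: str):
        self.method = method.encode("utf-8")
        self.entries = {}  # label H(key) -> pre-encoded Candid content

    @staticmethod
    def _label(key: str) -> bytes:
        return _sha(key.encode("utf-8"))  # stand-in for the canonical hash H

    def _wrap(self, t):
        return ("labeled", b"certified_variables", ("labeled", self.method, t))

    def _nodes(self) -> list:
        return [("labeled", lab, ("leaf", _sha(c))) for lab, c in sorted(self.entries.items())]

    def root_hash(self) -> bytes:
        return tree_hash(self._wrap(_forks(self._nodes())))

    def put(self, key: str, content: bytes) -> bytes:
        """Update path: store content, return the root hash to pass to the system."""
        self.entries[self._label(key)] = content
        return self.root_hash()

    def query(self, key: str) -> dict:
        """Query path: content plus witness (the system certificate would be added too)."""
        label = self._label(key)
        if label not in self.entries:
            raise KeyError(key)  # negative witnesses are omitted in this sketch
        witness = self._wrap(_witness_forks(self._nodes(), label))
        return {"content": self.entries[label], "witness": witness}
```

Because the witness prunes every subtree that does not contain the requested key, it reveals only that key’s leaf while still hashing to the same root.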

Alternative: Just a convention

In this alternative design, the certified qualifier does not become part of Candid. Instead, certifying queries relies on the convention that certifiable queries have the type

```
meth : (t1, …, tn) -> (record { content : opt tr; certificate : opt blob; witness : opt blob }) query
```

where tr is a primitive Candid type (so that its values can be hashed).

This variant requires fewer components to change and can be implemented more quickly, but it is less convenient for developers.

Improvements: Canonical Candid

If we could define a “canonical hash” for Candid values, we would need fewer restrictions requiring values to be primitive.
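
One conceivable shape for such a canonical hash borrows the representation-independent hashing the IC uses for request ids: sequences hash the concatenation of their element hashes, and records hash the sorted concatenations of H(field name) and H(field value), so field order does not matter. The Python sketch below is an assumption, not part of this proposal, and `rih` is a hypothetical name. It also ignores type information (for example, the text "a" and the blob "a" hash the same), which a real canonical Candid hash would have to address.

```python
import hashlib

def _leb128(n: int) -> bytes:
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)

def rih(value) -> bytes:
    """Representation-independent hash (an assumed choice of canonical hash)."""
    if isinstance(value, bool):
        raise TypeError("bool is not covered by this sketch")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("only nat is handled")
        return hashlib.sha256(_leb128(value)).digest()
    if isinstance(value, str):
        return hashlib.sha256(value.encode("utf-8")).digest()
    if isinstance(value, bytes):
        return hashlib.sha256(value).digest()
    if isinstance(value, list):  # vector: concatenated element hashes, order matters
        return hashlib.sha256(b"".join(rih(v) for v in value)).digest()
    if isinstance(value, dict):  # record: sorted H(field)+H(value) pairs, order-independent
        pairs = sorted(rih(k) + rih(v) for k, v in value.items())
        return hashlib.sha256(b"".join(pairs)).digest()
    raise TypeError(f"unsupported value: {value!r}")
```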

Conclusion

Is this feasible? Desirable? Do we have an alternative (besides doing nothing)?

nikclayton-dfinity commented 4 years ago

Re the choice of defaults: has anyone considered flipping the default on its head, requiring the interface to mark a query as uncertified explicitly, with certified queries being the default?

Our field is littered with examples of protocols that specified insecure defaults and then had to go through a long and painful process of effectively switching to a secure default. So maybe we should get ahead of that whole process now?

Possibly the keyword shouldn't be uncertified (although that closely maps to the intention). Maybe untrusted would be better?

Yielding something like:

```
service : {
  getBalance : (user_name : text) -> (nat) query;
  getProbableBalance : (user_name : text) -> (nat) untrusted query;
}
```

Or perhaps `... -> (nat) untrusted-query;`?

With the same idea applied to variable declarations: be certified by default, and make the developer make an explicit design choice to mark a variable as uncertified / untrusted.

?

nomeata commented 4 years ago

with certified queries being the default?

There is no such thing as a “certified query” in our platform at the moment; this PR is considering the introduction of this concept at the candid level.

I fully agree that safe-by-default is the right choice in general, and probably here too. But it is a question we can only address once we know whether we want to add the concept as outlined here at all.

crusso commented 4 years ago

Since queries have access to caller id, shouldn't that be included in the argument hash?

nomeata commented 4 years ago

Since queries have access to caller id, shouldn't that be included in the argument hash?

This is tricky: many applications of certified queries (e.g. serving an index.html, or a high score) return the same result no matter who the caller is. If you include the caller in the witness path (i.e. the argument hash), you would have to prepare signed responses for every possible caller. That is not feasible in general.

I was wondering about a more elaborate scheme where the caller is part of the witness path, but the path can also indicate “any value”. For example, a concrete path to certify a query my_balance() could be /certified_variables/my_balance/claudio (where claudio is your id), whereas a query highscore(game_id: nat) might return something with the path /certified_variables/highscore/*/5 (where, of course, the * is encoded in a way that doesn’t clash with a literal *, and the 5 might be a game id).

This would complicate the server-side story more, so I initially leaned towards solving the 80% case; if you need to certify queries that include the caller you can still produce and validate certificates manually.

Note that restricting access based on the caller still works, as it doesn’t affect the result!

Maybe the interface should already support such wildcards (which are also useful to indicate that some normal argument is ignored), even if Motoko doesn’t expose that ability initially. The complexity on the agent side is reasonable.
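
The agent-side check for such wildcard paths could be as small as the following Python sketch. The wildcard encoding (an empty label, which cannot clash with a 32-byte argument hash), `arg_label` as the hash H, and the rule that the `certified_variables` and method labels may never be wildcards are all assumptions of this sketch. It also requires equal path lengths, so it does not cover the shorter-path variant.

```python
import hashlib

WILDCARD = b""  # hypothetical encoding: real argument labels are 32-byte hashes

def arg_label(value: str) -> bytes:
    """Hypothetical H for a text argument or a caller id rendered as text."""
    return hashlib.sha256(value.encode("utf-8")).digest()

def path_matches(certified: list, requested: list, fixed_prefix: int = 2) -> bool:
    """True if a certified path (possibly with wildcards) covers a concrete request path.

    The first `fixed_prefix` labels (certified_variables and the method name)
    must always match literally.
    """
    if len(certified) != len(requested):
        return False
    for i, (c, r) in enumerate(zip(certified, requested)):
        if c != r and not (c == WILDCARD and i >= fixed_prefix):
            return False
    return True

# highscore(5) called by claudio, certified once for every caller
# (the game id is hashed as text here only for brevity):
request = [b"certified_variables", b"highscore", arg_label("claudio"), arg_label("5")]
certified = [b"certified_variables", b"highscore", WILDCARD, arg_label("5")]
```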

rossberg commented 4 years ago

Thanks for writing this up. It makes sense to me as far as the current design around certified variables makes sense. But I agree with the undercurrent that this all is highly unpleasant from a programming model perspective.

Besides the leakiness and complexity of the abstraction, I also see the deeper problem you alluded to elsewhere, namely that the very idea of certified variables as a solution to certified queries confuses the notions of state and query. And with that, it creates a strong incentive for devs to expose state more or less directly, instead of designing proper interface layers.

The approach also seems incompatible with delegation, or am I missing something?

nomeata commented 4 years ago

The approach also seems incompatible with delegation, or am I missing something?

By delegation, do you mean “storing the certified data at some other canister B, and forwarding queries to that canister without the clients needing to know”? If so, yes, it is incompatible, unless one extends the language of “witnesses” to include such indirection; then you could return both a certificate from B and a certified statement that certain queries may be signed by B. All very complicated.

nomeata commented 4 years ago

I updated this proposal with an alternative that uses a convention on top of Candid. It might be easier to start with, although it is less convenient and may be harder to implement in Motoko user code, at least without direct access to Candid serialization.

rossberg commented 4 years ago

Is there a reason why the convention approach does not include the witness field?

crusso commented 4 years ago

I'm wondering if a lot of the complexity alluded to here https://docs.google.com/document/d/1vwWbCWGJ0n-aGq362gbRev_Apfkh5Mp3Uv4h7Lby24A/edit?usp=sharing couldn't be hidden by library design in Motoko and JS.

If we had a Merkle tree indexed by canonical blobs, storing blobs, and provided we expose serialization primitives to users for constructing the restricted indices and the unrestricted content, couldn't the certified query methods just boil down to boilerplate code?

The JS code for verifying results might also be dealt with similarly, so users don't have to repeat the error-prone logic but could just apply a higher-order function to do the certification of the query.
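
Such a higher-order function might look roughly like this, sketched here in Python for brevity; the same shape carries over to JS. The names `certified`, `fetch`, `verify` and `decode` are hypothetical. The point is that the error-prone checking lives in a single `verify` function written once, while each query only supplies how to fetch and how to decode.

```python
from typing import Any, Callable, Optional

def certified(fetch: Callable[..., dict],
              verify: Callable[[dict, tuple], Optional[bytes]],
              decode: Callable[[bytes], Any]) -> Callable[..., Any]:
    """Hypothetical helper: turn a raw query into one that returns only verified results.

    fetch:  performs the query and returns {content, certificate, witness}
    verify: returns the certified content blob (None if absence is proven)
            and raises on any failure
    decode: Candid-decodes the content blob
    """
    def call(*args):
        reply = fetch(*args)
        content = verify(reply, args)  # errors propagate to the caller
        return None if content is None else decode(content)
    return call
```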

But maybe I'm smoking crack again.

nomeata commented 4 years ago

Is there a reason why the convention approach does not include the witness field?

No, my bad, will fix.

nomeata commented 4 years ago

You are not smoking crack, and certainly some of it can be hidden (the parts related to serialization, etc.). But you still have to restructure your application logic to do the serialization ahead of time, keep it up to date, etc.; that’s the main issue I alluded to.

crusso commented 4 years ago

No, I fully agree that this is awkward and not as expressive as full certified queries. I'm just wondering if we can make it simpler to use without building stuff into the language, especially since it seems to have fairly limited utility.

nomeata commented 3 years ago

The present proposal is too inflexible for interesting services; I have a new proposal, Universal Query Certificates that addresses that by including the validation code (which is, after all, service-specific) in the certificate.

skilesare commented 3 years ago

Can we know more about this Universal Query Certificates and what the timeline is? We're trying to figure this out over at https://forum.dfinity.org/t/recommended-usage-of-certifieddata/4370/13 and it doesn't seem very easy to do in Motoko right now.

nomeata commented 3 years ago

Can we know more about this Universal Query Certificates

I don’t have access to that file any more, since it is an internal google doc.

and what the timeline is?

None, this was just a crazy idea (and probably not a good one).

dfinity / motoko