Open crusso opened 3 years ago
@nomeata, @rossberg, @kritzcreek whaddya think?
Thins api inspection is inherently tied to a given method, and can (and should) be able to inspect the argument. So logically, and thus maybe syntactically, it ought to be some kind of annotation or code block that comes between an exported shared function’s parameter list and the body? But I don’t really have any good idea how to expose that in a seamless way that isn’t completely ad-hoc.
Well, we could use some sort of pre_condition, but from my cursory reading of spec it seemed to apply to all methods. Perhaps I've misunderstood.
Ok, so it looks like inspect_message can look at callee, caller and argument (only) so I guess a boolean guard on the argument pattern might work nicely (as a sort of precondition). Absent guards on patterns, we could just allow non-exhaustive patterns in shared functions and reject in the pattern match fail continuation somehow. We could even execute any impure guard safely, assuming message-reject implies rollback.
shared (msg) func launch_missiles("USSR") when (msg.caller == Reagan && defcon = 5) : async None { ... };
@nomeata says: I would not tie it to pattern match failure. Note that this only runs for ingress messages, not inter canister messages. And it can't be used for access control, merely ingress cost control
I assumed it could be used for access control coz the spec says you can access msg_caller_size
and _copy
(flag F
). Is that a bug or is something else preventing that.
But if it's only run for ingress, not all messages, then I agree.
Yes, the purpose of this call is (only) “do you want to pay for this message to be ingressed”. A bit like a collect call. Inter-canister calls are paid for by the caller (at least the transport, not the execution).
Will this be prioritized soon? It seems rather important for preventing DDoS attacks.
Hmm, perhaps we should tackle this again.
Another option is an optional system method, whose input, a variant, depends on the types of the other shared functions:
actor {
public func inc() : async () { ... };
public func set(n : Nat) : async () { ... };
public query read() : async () { ... };
system func inspect(msg : {#inc; #set : Nat; #read}) : Bool {
switch (msg) {
case (#inc) : false;
case (#read) : true;
case (#set n) : false;
}
}
};
KInda ugly though, and it's not clear to me if the caller should be included or not - and won't work with generics if we every go there, unless we also add GADTs...
@nomeata, @rossberg, @chenyan-dfinity @ggreif thoughts?
Maybe this can be part of the shared context:
public shared({caller, inspect = func (msg:Blob) : Bool { ... } }) func inc() : async () { ... };
doesn’t that go the wrong way? The thing after shared(…)
is a pattern to get information out of the system, not an expression that specifies a value.
Joachim's suggestion above was adding an optional boolean expression between the body args and body, which saves you repeating the argument type as in my suggestion. Something like
public shared {caller} func (n : nat) {n > 0 and valid(caller)}
{
...body...
}
that's concise, but looks a lot like a pre-condition that gets executed on every call, not just for ingress message. I'm inclined to isolate this hack elsewhere.
Alternatively, anticipating verification, maybe we could go for something like:
public shared {caller} func (n : nat)
expect <filter>
requires <pre-condition>
ensures (r : typ) <post-condition>
{
...body...
}
where expect
is used for inspection and the requires
and ensures
clauses are what we might use in future for pre- and (dependent) post-conditions.
but looks a lot like a pre-condition that gets executed on every call, not just for ingress message.
We can of course run it for all calls, by running it again in the actual message.
func (n : nat) {n > 0 and valid(caller)}
This assumes we decode the message first. But inspect_message
was run before deserialization to prevent the decoding cost.
According to the spec, inspect_message is allowed to access the message payload, but, of course, doesn't have to. I guess one could delay that by turning each (non-trivial) argument into a thunk, but is that worth it? Then you can filter by message name, and optionally by content.
actor {
public func inc() : async () { ... };
public func set(n : Nat) : async () { ... };
public query read() : async () { ... };
system func inspect(msg : {#inc; #set : () -> Nat; #read}) : Bool {
switch (msg) {
case (#inc) : false;
case (#read) : true;
case (#set f) : f() > 0;
}
}
};
Or we give access to the raw blob
and a variant of functions for decoding the blob on demand.
actor {
public func inc() : async () { ... };
public func set(n : Nat) : async () { ... };
public query read() : async () { ... };
system func inspect(dmsg : {#inc; #set : (data : Blob) -> Nat; #read}, data : Blob) : Bool {
switch (msg) {
case (#inc) : false;
case (#read) : true;
case (#set f) : f(data) > 0;
}
}
};
How ugly do we want to get?
I think inspect doesn't need to decode the message. The pre-condition after decoding can be done in the main function body, which applies to both ingress and inter-canister messages. In the inspect_message
, users get a Blob
and it's up to them to decide how to inspect that blob.
I think lazy decoding is what I'd expect from a high level language here. It's just an optimization after all.
I guess there's also a question about what to do about the hidden entry points (tmp_interface hack (true?), __async_helper (false?), and the stable_variable_foot_print (false?)).
This is getting gross...
Do we have equality on shared functions refs? Then we could pass the method as a reference? Ah, but we can't really type that, nevermind.
Indeed gross.
I tend to think a central entry point is the wrong thing here, and some form of annotation is more suitable for a high level language.
Yeah, I was thinking the central entry point would be useful is you only want to restrict by caller, because then you could just type the msg bit as Any.
system func inspect(caller : Principal, _ : Any) : Bool { valid(caller) };
Also, since this is gross, isolating it in one gross method seemed attractive...
I may have a go this week and, unless I hear otherwise, will implement something along these lines:
system func inspect(caller : Principal, msg : {#inc: Blob -> (); #set : Blob -> Nat; #read: Blob -> ()}, arg: Blob) : Bool
Which give you the option to check the caller, the message name, the blob and the decoded message payload, decoded on demand. That seems reasonably flexible.
Not sure about whether to include the query variants (e.g. read
) or not - IC inspect_message doesn't apply to queries yet, so the variant for read will never arise, but I imagine if the platform moves to certified queries in future, then it might.
As a stretch goal, I might be able to omit the fragile type annotation on the arguments since the type is in principle inferable from the enclosing actor type.
@nomeata still opposed in general?
Or we could aim much, much lower and just provide:
system func inspect(caller : Principal, msg : Text, arg: Blob) : Bool
and rely on other facilities (to_candid/from_candid) and the user to get things right when necessary.
Opposed is too strong. The proposed interface feels a bit out of place for a more integrated language like Motoko, and would expect some higher level annotation-style interface. But I don't have concrete suggestions, so this shouldn't block anyone. But for that reason I'm mildly in favor of the more barebones variant and aim lower until we have a convincing story.
system func inspect(caller : Principal, msg : {#inc: Blob -> (); #set : Blob -> Nat; #read: Blob -> ()}, arg: Blob) : Bool
I actually don't see the name in there. What parameter has it?
Not sure about whether to include the query variants (e.g.
read
) or not - IC inspect_message doesn't apply to queries yet, so the variant for read will never arise, but I imagine if the platform moves to certified queries in the future, then it might.
I'd plan for success. I can already see a need to reject a query method with an upgrade request(maybe this is misplaced due to http_request and http_request_update already existing elsewhere.
and rely on other facilities (to_candid/from_candid) and the user to get things right when necessary.
Also, I'm guessing the to_candid/from_candid along with this? That would be super. Would those functions work with cal_raw? Now that we have it we are a bit blocked for all we want to do with it by the lack of a cbor and candid decoder. I know off topic for this thread, but super excited for those features.
I can already see a need to reject a query method with an upgrade request
Oh, good point, that should already be the case that the inspect hook is called for all update calls, even those that go to query methods, after all the purpose is to ask the canister if it wanta to pay for ingressing the call. So yes, do include queries.
system func inspect(caller : Principal, msg : {#inc: Blob -> (); #set : Blob -> Nat; #read: Blob -> ()}, arg: Blob) : Bool
I actually don't see the name in there. What parameter has it?
The name is encoded in the variant tag, which gives you access, should you want it, to the strongly typed decoder of the blob. Bit subtle but more strongly typed than just using a argument and user (mis-)supplied to/from_candid invocations.
Not sure about whether to include the query variants (e.g.
read
) or not - IC inspect_message doesn't apply to queries yet, so the variant for read will never arise, but I imagine if the platform moves to certified queries in the future, then it might.I'd plan for success. I can already see a need to reject a query method with an upgrade request(maybe this is misplaced due to http_request and http_request_update already existing elsewhere.
and rely on other facilities (to_candid/from_candid) and the user to get things right when necessary.
Also, I'm guessing the to_candid/from_candid along with this? That would be super. Would those functions work with call_raw? Now that we have it we are a bit blocked for all we want to do with it by the lack of a cbor and candid decoder. I know off topic for this thread, but super excited for those features.
I still don't see the need for a cbor decoder, can you explain why that is needed for full call_raw? I was under the assumption that candid coding is enough.
Can you explain what inc, set, and read mean? I don't see any mention of them in the interface spec.
Wouldn't it be better to return { #set : () -> Nat , …}
, i.e. a closure or thunk, so that one can't pass the wrong blob to it, while still delaying the decoding?
I still don't see the need for a cbor decoder, can you explain why that is needed for full call_raw? I was under the assumption that candid coding is enough.
In a wallet that is created before a service exists I may want a multisig function where the users can view the payload they are approving. If I don't know the candid type I'm guessing I can't decode it? Maybe this is a bad assumption.
The idea here is that one user would supply that they want to call canister xyz with function future_service(#no_way_you_could_know_thist_variant_is_a("string": Text)). In call_raw I just get a blob....I actually don't know if this before or after is decoded via cbor, so maybe cbor isn't an issue? I'd want to deserialize this and send down to a client application the details of what was attempting to be called. (ie future_service( x = #no_way_you_could_know_thist_variant_is_a("string": Text)).
Maybe the better way to do this is for the wallet client to be able to take in candid and just get the blob and decode with the agent. But then again, if I have an automated third-party canister that may want to approve these things if they are on a whitelist, that canister would need to know the details and parse them. Thus I think that it makes sense for a motoko program to beable to reason about candid and (maybe) cbor.
Can you explain what inc, set, and read mean? I don't see any mention of them in the interface spec.
Those are just method names referring to an earlier, sketched example here: https://github.com/dfinity/motoko/issues/2528#issuecomment-1020005395
Wouldn't it be better to return
{ #set : () -> Nat , …}
, i.e. a closure or thunk, so that one can't pass the wrong blob to it, while still delaying the decoding?
Yeah, I had it that way originally, but thought this way was more suggestive to the user (and avoids the need to allocate a closure - but that's a trivial concern. Happy to switch back to ()
.
I still don't see the need for a cbor decoder, can you explain why that is needed for full call_raw? I was under the assumption that candid coding is enough.
In a wallet that is created before a service exists I may want a multisig function where the users can view the payload they are approving. If I don't know the candid type I'm guessing I can't decode it? Maybe this is a bad assumption.
Candid is self-describing, so you can validate and decode without having an expected type to decode to. However, in Motoko, your decoder would probably need to decode to some untyped AST of values, I guess, to process the content. Doable in principle, even in user-space ( I think), but we don't have any dedicated support for that now.
The idea here is that one user would supply that they want to call canister xyz with function future_service(#no_way_you_could_know_thist_variant_is_a("string": Text)). In call_raw I just get a blob....I actually don't know if this before or after is decoded via cbor, so maybe cbor isn't an issue? I'd want to deserialize this and send down to a client application the details of what was attempting to be called. (ie future_service( x = #no_way_you_could_know_thist_variant_is_a("string": Text)).
Maybe the better way to do this is for the wallet client to be able to take in candid and just get the blob and decode with the agent. But then again, if I have an automated third-party canister that may want to approve these things if they are on a whitelist, that canister would need to know the details and parse them. Thus I think that it makes sense for a motoko program to beable to reason about candid and (maybe) cbor.
I'm almost certain the blobs seen by call_raw will be always be candid encoded, unless a user bypasses candid. I imagine the CBOR stuff only comes in when talking the http interface and never reaches canisters themselves. Perhaps @nomeata can correct me here if wrong. So there would be no need for Motoko to speak CBOR unless it's trying to do something really bizarre like issue an http request to a boundary node, pretending to be an external client.
No need to correct, it's precisely as you say.
So there would be no need for Motoko to speak CBOR unless it's trying to do something really bizarre like issue an http request to a boundary node, pretending to be an external client.
Wonderful news! I should really edit the bounty for this then. We don't need CBOR, but perhaps we need an AST parser for candid? Any idea what that would look like or can you point me in the right direction?
@crusso
I thought the call
API for ExperimentalInternetComputer was pretty nice.
public let call : (canister : Principal, name : Text, data : Blob) -> async (reply : Blob) = Prim.call_raw;
Would it be possible to do something like this, but make the argument a record so that changes are more easily backwards compatible? I would like to reject requests on a more fine grained level based on the specific name of the publicly exposed function being called.
type InspectableFunctionCall = {
caller : Principal;
name : Text; // name of the function (why is this a variant and not type Text in your examples?)
arg : Blob; // would we be able to reject here based on the size of the blob with an incoming request? For example maliciously sending a 1MB name to a `getByName(name: Text)` or `setName(name: Text)` endpoint.
requestLimit: ?Nat; // throttle a principal if they reach a request limit (nice to have, not sure if this request limit per principal is directly exposable?)
};
system func inspect(call: InspectableFunctionCall): Bool {
if (Principal.isAnonymous(call.caller)) { return false; };
switch(inspectable.name) {
// ...more fine grained permission/rejection for each endpoint
}
}
@crusso I thought the
call
API for ExperimentalInternetComputer was pretty nice.public let call : (canister : Principal, name : Text, data : Blob) -> async (reply : Blob) = Prim.call_raw;
Would it be possible to do something like this, but make the argument a record so that changes are more easily backwards compatible? I would like to reject requests on a more fine grained level based on the specific name of the publicly exposed function being called.
type InspectableFunctionCall = { caller : Principal; name : Text; // name of the function (why is this a variant and not type Text in your examples?) arg : Blob; // would we be able to reject here based on the size of the blob with an incoming request? For example maliciously sending a 1MB name to a `getByName(name: Text)` or `setName(name: Text)` endpoint. requestLimit: ?Nat; // throttle a principal if they reach a request limit (nice to have, not sure if this request limit per principal is directly exposable?) }; system func inspect(call: InspectableFunctionCall): Bool { if (Principal.isAnonymous(call.caller)) { return false; }; switch(inspectable.name) { // ...more fine grained permission/rejection for each endpoint } }
Funnily enough, I didn't see you message, but have actually made the argument a record, precisely so we can add the textual method name and other fields in future. PR #3210
The message is encoded as a variant so you can access its payload in a strongly typed way. Since each method has its own argument type, each variant gets its own decoding for that argument (as a function, to avoid the cost of decoding when unnecessary).
You actually have access to all of the actor state in inspect_message, but you can not update it, so I don't know how you would manually count requests or otherwise detect and track malicious callers. I guess the actual message could bump the count and inspect message decline when it gets too high.
@ByronBecker feel free to comment on PR #3210 - especially the .adoc files that document it (unless you care about the code)
https://docs.dfinity.systems/public#system-api-inspect-message