dfinity / motoko

Simple high-level language for writing Internet Computer canisters
Apache License 2.0
517 stars 97 forks source link

Feature: support system method `inspect-message` #2528

Open crusso opened 3 years ago

crusso commented 3 years ago

https://docs.dfinity.systems/public#system-api-inspect-message

crusso commented 3 years ago

@nomeata, @rossberg, @kritzcreek whaddya think?

nomeata commented 3 years ago

Thins api inspection is inherently tied to a given method, and can (and should) be able to inspect the argument. So logically, and thus maybe syntactically, it ought to be some kind of annotation or code block that comes between an exported shared function’s parameter list and the body? But I don’t really have any good idea how to expose that in a seamless way that isn’t completely ad-hoc.

crusso commented 3 years ago

Well, we could use some sort of pre_condition, but from my cursory reading of spec it seemed to apply to all methods. Perhaps I've misunderstood.

crusso commented 3 years ago

Ok, so it looks like inspect_message can look at callee, caller and argument (only) so I guess a boolean guard on the argument pattern might work nicely (as a sort of precondition). Absent guards on patterns, we could just allow non-exhaustive patterns in shared functions and reject in the pattern match fail continuation somehow. We could even execute any impure guard safely, assuming message-reject implies rollback.

shared (msg) func launch_missiles("USSR") when (msg.caller == Reagan  && defcon = 5)  : async None { ... };
crusso commented 3 years ago

@nomeata says: I would not tie it to pattern match failure. Note that this only runs for ingress messages, not inter canister messages. And it can't be used for access control, merely ingress cost control

crusso commented 3 years ago

I assumed it could be used for access control coz the spec says you can access msg_caller_size and _copy (flag F). Is that a bug or is something else preventing that.

But if it's only run for ingress, not all messages, then I agree.

nomeata commented 3 years ago

Yes, the purpose of this call is (only) “do you want to pay for this message to be ingressed”. A bit like a collect call. Inter-canister calls are paid for by the caller (at least the transport, not the execution).

jzxchiang1 commented 2 years ago

Will this be prioritized soon? It seems rather important for preventing DDoS attacks.

crusso commented 2 years ago

Hmm, perhaps we should tackle this again.

Another option is an optional system method, whose input, a variant, depends on the types of the other shared functions:

actor {

   public func inc() : async () { ... };
   public func set(n : Nat) : async () { ... };
   public query read() : async () { ... };

   system func inspect(msg : {#inc; #set : Nat; #read}) : Bool {
      switch (msg) { 
         case (#inc) : false;
         case (#read) : true;
         case (#set n) : false;
      }
   }
};

KInda ugly though, and it's not clear to me if the caller should be included or not - and won't work with generics if we every go there, unless we also add GADTs...

@nomeata, @rossberg, @chenyan-dfinity @ggreif thoughts?

chenyan-dfinity commented 2 years ago

Maybe this can be part of the shared context:

public shared({caller, inspect = func (msg:Blob) : Bool { ... } }) func inc() : async () { ... };
nomeata commented 2 years ago

doesn’t that go the wrong way? The thing after shared(…) is a pattern to get information out of the system, not an expression that specifies a value.

crusso commented 2 years ago

Joachim's suggestion above was adding an optional boolean expression between the body args and body, which saves you repeating the argument type as in my suggestion. Something like

public shared {caller} func (n : nat) {n > 0 and valid(caller)}
{ 
 ...body...
} 

that's concise, but looks a lot like a pre-condition that gets executed on every call, not just for ingress message. I'm inclined to isolate this hack elsewhere.

Alternatively, anticipating verification, maybe we could go for something like:

public shared {caller} func (n : nat)
  expect <filter>
  requires <pre-condition>
  ensures (r : typ) <post-condition>
  { 
  ...body...
  } 

where expect is used for inspection and the requires and ensures clauses are what we might use in future for pre- and (dependent) post-conditions.

nomeata commented 2 years ago

but looks a lot like a pre-condition that gets executed on every call, not just for ingress message.

We can of course run it for all calls, by running it again in the actual message.

chenyan-dfinity commented 2 years ago

func (n : nat) {n > 0 and valid(caller)}

This assumes we decode the message first. But inspect_message was run before deserialization to prevent the decoding cost.

crusso commented 2 years ago

According to the spec, inspect_message is allowed to access the message payload, but, of course, doesn't have to. I guess one could delay that by turning each (non-trivial) argument into a thunk, but is that worth it? Then you can filter by message name, and optionally by content.

actor {

   public func inc() : async () { ... };
   public func set(n : Nat) : async () { ... };
   public query read() : async () { ... };

   system func inspect(msg : {#inc; #set : () -> Nat; #read}) : Bool {
      switch (msg) { 
         case (#inc) : false;
         case (#read) : true;
         case (#set f) : f() > 0;
      }
   }
};
crusso commented 2 years ago

Or we give access to the raw blob and a variant of functions for decoding the blob on demand.

actor {

   public func inc() : async () { ... };
   public func set(n : Nat) : async () { ... };
   public query read() : async () { ... };

   system func inspect(dmsg : {#inc; #set : (data : Blob) -> Nat; #read}, data : Blob) : Bool {
      switch (msg) { 
         case (#inc) : false;
         case (#read) : true;
         case (#set f) : f(data) > 0;
      }
   }
};

How ugly do we want to get?

chenyan-dfinity commented 2 years ago

I think inspect doesn't need to decode the message. The pre-condition after decoding can be done in the main function body, which applies to both ingress and inter-canister messages. In the inspect_message, users get a Blob and it's up to them to decide how to inspect that blob.

nomeata commented 2 years ago

I think lazy decoding is what I'd expect from a high level language here. It's just an optimization after all.

crusso commented 2 years ago
  1. system func inspect(caller : Principal, msg : {#inc; #set ; #read}, blob) : Bool
  2. system func inspect(caller : Principal, msg : {#inc; #set : (data : Blob) -> Nat; #read}, data : Blob) : Bool
  3. system func inspect(caller : Principal, msg : {#inc; #set : () -> Nat; #read}) : Bool

I guess there's also a question about what to do about the hidden entry points (tmp_interface hack (true?), __async_helper (false?), and the stable_variable_foot_print (false?)).

This is getting gross...

nomeata commented 2 years ago

Do we have equality on shared functions refs? Then we could pass the method as a reference? Ah, but we can't really type that, nevermind.

Indeed gross.

I tend to think a central entry point is the wrong thing here, and some form of annotation is more suitable for a high level language.

crusso commented 2 years ago

Yeah, I was thinking the central entry point would be useful is you only want to restrict by caller, because then you could just type the msg bit as Any.

system func inspect(caller : Principal, _ : Any) : Bool { valid(caller) };

crusso commented 2 years ago

Also, since this is gross, isolating it in one gross method seemed attractive...

crusso commented 2 years ago

I may have a go this week and, unless I hear otherwise, will implement something along these lines:

system func inspect(caller : Principal, msg : {#inc: Blob -> (); #set : Blob -> Nat; #read: Blob -> ()}, arg: Blob) : Bool

Which give you the option to check the caller, the message name, the blob and the decoded message payload, decoded on demand. That seems reasonably flexible.

Not sure about whether to include the query variants (e.g. read) or not - IC inspect_message doesn't apply to queries yet, so the variant for read will never arise, but I imagine if the platform moves to certified queries in future, then it might.

As a stretch goal, I might be able to omit the fragile type annotation on the arguments since the type is in principle inferable from the enclosing actor type.

@nomeata still opposed in general?

Or we could aim much, much lower and just provide:

system func inspect(caller : Principal, msg : Text, arg: Blob) : Bool

and rely on other facilities (to_candid/from_candid) and the user to get things right when necessary.

nomeata commented 2 years ago

Opposed is too strong. The proposed interface feels a bit out of place for a more integrated language like Motoko, and would expect some higher level annotation-style interface. But I don't have concrete suggestions, so this shouldn't block anyone. But for that reason I'm mildly in favor of the more barebones variant and aim lower until we have a convincing story.

skilesare commented 2 years ago

system func inspect(caller : Principal, msg : {#inc: Blob -> (); #set : Blob -> Nat; #read: Blob -> ()}, arg: Blob) : Bool

I actually don't see the name in there. What parameter has it?

Not sure about whether to include the query variants (e.g. read) or not - IC inspect_message doesn't apply to queries yet, so the variant for read will never arise, but I imagine if the platform moves to certified queries in the future, then it might.

I'd plan for success. I can already see a need to reject a query method with an upgrade request(maybe this is misplaced due to http_request and http_request_update already existing elsewhere.

and rely on other facilities (to_candid/from_candid) and the user to get things right when necessary.

Also, I'm guessing the to_candid/from_candid along with this? That would be super. Would those functions work with cal_raw? Now that we have it we are a bit blocked for all we want to do with it by the lack of a cbor and candid decoder. I know off topic for this thread, but super excited for those features.

nomeata commented 2 years ago

I can already see a need to reject a query method with an upgrade request

Oh, good point, that should already be the case that the inspect hook is called for all update calls, even those that go to query methods, after all the purpose is to ask the canister if it wanta to pay for ingressing the call. So yes, do include queries.

crusso commented 2 years ago

system func inspect(caller : Principal, msg : {#inc: Blob -> (); #set : Blob -> Nat; #read: Blob -> ()}, arg: Blob) : Bool

I actually don't see the name in there. What parameter has it?

The name is encoded in the variant tag, which gives you access, should you want it, to the strongly typed decoder of the blob. Bit subtle but more strongly typed than just using a argument and user (mis-)supplied to/from_candid invocations.

Not sure about whether to include the query variants (e.g. read) or not - IC inspect_message doesn't apply to queries yet, so the variant for read will never arise, but I imagine if the platform moves to certified queries in the future, then it might.

I'd plan for success. I can already see a need to reject a query method with an upgrade request(maybe this is misplaced due to http_request and http_request_update already existing elsewhere.

and rely on other facilities (to_candid/from_candid) and the user to get things right when necessary.

Also, I'm guessing the to_candid/from_candid along with this? That would be super. Would those functions work with call_raw? Now that we have it we are a bit blocked for all we want to do with it by the lack of a cbor and candid decoder. I know off topic for this thread, but super excited for those features.

I still don't see the need for a cbor decoder, can you explain why that is needed for full call_raw? I was under the assumption that candid coding is enough.

jzxchiang1 commented 2 years ago

Can you explain what inc, set, and read mean? I don't see any mention of them in the interface spec.

nomeata commented 2 years ago

Wouldn't it be better to return { #set : () -> Nat , …}, i.e. a closure or thunk, so that one can't pass the wrong blob to it, while still delaying the decoding?

skilesare commented 2 years ago

I still don't see the need for a cbor decoder, can you explain why that is needed for full call_raw? I was under the assumption that candid coding is enough.

In a wallet that is created before a service exists I may want a multisig function where the users can view the payload they are approving. If I don't know the candid type I'm guessing I can't decode it? Maybe this is a bad assumption.

The idea here is that one user would supply that they want to call canister xyz with function future_service(#no_way_you_could_know_thist_variant_is_a("string": Text)). In call_raw I just get a blob....I actually don't know if this before or after is decoded via cbor, so maybe cbor isn't an issue? I'd want to deserialize this and send down to a client application the details of what was attempting to be called. (ie future_service( x = #no_way_you_could_know_thist_variant_is_a("string": Text)).

Maybe the better way to do this is for the wallet client to be able to take in candid and just get the blob and decode with the agent. But then again, if I have an automated third-party canister that may want to approve these things if they are on a whitelist, that canister would need to know the details and parse them. Thus I think that it makes sense for a motoko program to beable to reason about candid and (maybe) cbor.

crusso commented 2 years ago

Can you explain what inc, set, and read mean? I don't see any mention of them in the interface spec.

Those are just method names referring to an earlier, sketched example here: https://github.com/dfinity/motoko/issues/2528#issuecomment-1020005395

crusso commented 2 years ago

Wouldn't it be better to return { #set : () -> Nat , …}, i.e. a closure or thunk, so that one can't pass the wrong blob to it, while still delaying the decoding?

Yeah, I had it that way originally, but thought this way was more suggestive to the user (and avoids the need to allocate a closure - but that's a trivial concern. Happy to switch back to ().

crusso commented 2 years ago

I still don't see the need for a cbor decoder, can you explain why that is needed for full call_raw? I was under the assumption that candid coding is enough.

In a wallet that is created before a service exists I may want a multisig function where the users can view the payload they are approving. If I don't know the candid type I'm guessing I can't decode it? Maybe this is a bad assumption.

Candid is self-describing, so you can validate and decode without having an expected type to decode to. However, in Motoko, your decoder would probably need to decode to some untyped AST of values, I guess, to process the content. Doable in principle, even in user-space ( I think), but we don't have any dedicated support for that now.

The idea here is that one user would supply that they want to call canister xyz with function future_service(#no_way_you_could_know_thist_variant_is_a("string": Text)). In call_raw I just get a blob....I actually don't know if this before or after is decoded via cbor, so maybe cbor isn't an issue? I'd want to deserialize this and send down to a client application the details of what was attempting to be called. (ie future_service( x = #no_way_you_could_know_thist_variant_is_a("string": Text)).

Maybe the better way to do this is for the wallet client to be able to take in candid and just get the blob and decode with the agent. But then again, if I have an automated third-party canister that may want to approve these things if they are on a whitelist, that canister would need to know the details and parse them. Thus I think that it makes sense for a motoko program to beable to reason about candid and (maybe) cbor.

I'm almost certain the blobs seen by call_raw will be always be candid encoded, unless a user bypasses candid. I imagine the CBOR stuff only comes in when talking the http interface and never reaches canisters themselves. Perhaps @nomeata can correct me here if wrong. So there would be no need for Motoko to speak CBOR unless it's trying to do something really bizarre like issue an http request to a boundary node, pretending to be an external client.

nomeata commented 2 years ago

No need to correct, it's precisely as you say.

skilesare commented 2 years ago

So there would be no need for Motoko to speak CBOR unless it's trying to do something really bizarre like issue an http request to a boundary node, pretending to be an external client.

Wonderful news! I should really edit the bounty for this then. We don't need CBOR, but perhaps we need an AST parser for candid? Any idea what that would look like or can you point me in the right direction?

ByronBecker commented 2 years ago

@crusso I thought the call API for ExperimentalInternetComputer was pretty nice.

public let call : (canister : Principal, name : Text, data : Blob) -> async (reply : Blob) = Prim.call_raw;

Would it be possible to do something like this, but make the argument a record so that changes are more easily backwards compatible? I would like to reject requests on a more fine grained level based on the specific name of the publicly exposed function being called.

type InspectableFunctionCall = {
  caller : Principal;
  name : Text;   // name of the function (why is this a variant and not type Text in your examples?)
  arg : Blob;     // would we be able to reject here based on the size of the blob with an incoming request? For example maliciously sending a 1MB name to a  `getByName(name: Text)` or `setName(name: Text)` endpoint.
  requestLimit: ?Nat;     // throttle a principal if they reach a request limit (nice to have, not sure if this request limit per principal is directly exposable?)
};

system func inspect(call: InspectableFunctionCall): Bool {
  if (Principal.isAnonymous(call.caller)) { return false; };

  switch(inspectable.name) {
     // ...more fine grained permission/rejection for each endpoint
  }
}
crusso commented 2 years ago

@crusso I thought the call API for ExperimentalInternetComputer was pretty nice.

public let call : (canister : Principal, name : Text, data : Blob) -> async (reply : Blob) = Prim.call_raw;

Would it be possible to do something like this, but make the argument a record so that changes are more easily backwards compatible? I would like to reject requests on a more fine grained level based on the specific name of the publicly exposed function being called.

type InspectableFunctionCall = {
  caller : Principal;
  name : Text;   // name of the function (why is this a variant and not type Text in your examples?)
  arg : Blob;     // would we be able to reject here based on the size of the blob with an incoming request? For example maliciously sending a 1MB name to a  `getByName(name: Text)` or `setName(name: Text)` endpoint.
  requestLimit: ?Nat;     // throttle a principal if they reach a request limit (nice to have, not sure if this request limit per principal is directly exposable?)
};

system func inspect(call: InspectableFunctionCall): Bool {
  if (Principal.isAnonymous(call.caller)) { return false; };

  switch(inspectable.name) {
     // ...more fine grained permission/rejection for each endpoint
  }
}

Funnily enough, I didn't see you message, but have actually made the argument a record, precisely so we can add the textual method name and other fields in future. PR #3210

The message is encoded as a variant so you can access its payload in a strongly typed way. Since each method has its own argument type, each variant gets its own decoding for that argument (as a function, to avoid the cost of decoding when unnecessary).

You actually have access to all of the actor state in inspect_message, but you can not update it, so I don't know how you would manually count requests or otherwise detect and track malicious callers. I guess the actual message could bump the count and inspect message decline when it gets too high.

@ByronBecker feel free to comment on PR #3210 - especially the .adoc files that document it (unless you care about the code)