dfinity / candid

Candid Library for the Internet Computer
Apache License 2.0
277 stars 82 forks source link

(a) `Nat` <: `Int`, (b) `(Nat|Int)N <: (Nat|Int)M` where N < M, and (c) text format should use smallest necessary type #203

Open hansl opened 3 years ago

hansl commented 3 years ago

Right now, if you pass in 0 as a number as is, the textual parser turns it into an Int.

A better experience would be to have the textual parser uses the smallest type that can represent the number (e.g. 0-255 uses Nat8, -257 - -65536 use Int16, etc.

Then, have the candid values officially support Nat8 <: Nat16 <: Nat32 <: Nat64 <: Nat and Int8 <: Int16 <: Int32 <: Int64 <: Int then NatN <: IntN and Nat <: Int, thus removing any forced typings when entering numbers through the textual parser.

chenyan-dfinity commented 3 years ago

This is only a problem when the did file is not around. With the system level support for fetching did file, textual values will be annotated with the method signature automatically.

Regarding subtyping, we use inclusive subtyping, i.e. A <: B when the representation of any value of type A also represents the same value at type B. So only Nat <: Int holds.

hansl commented 3 years ago

Actually, Nat8 <: Int16 would also be okay.

hansl commented 3 years ago

Actually, Nat8 <: Int16 would also be okay.

hansl commented 3 years ago

With the system level support for fetching did file, textual values will be annotated with the method signature automatically.

This doesn't work in many cases, in particular when you don't have access to the network but still want to sign messages locally (which is a very valid use case for NNS).

I don't want a system-dependent workaround for this issue. This is a real problem and dismissing it just causes MORE user pain. There's currently pain from developers both internally and externally (when, e.g. using the wallet).

chenyan-dfinity commented 3 years ago

I see the pain, but I don't think subtyping is the solution. Without did files, you cannot accurately know the number type, e.g. for wallet canister, one of the field is nat64, if we follow the smallest type rule, it's still wrong. The real solution is to get the did file somewhere, either from the canister, or package with dfx. We already package the wallet canister with dfx, why not adding the did file as well? The same argument goes to NNS canisters.

hansl commented 3 years ago

why not adding the did file as well

Because we have some control over our own canisters, we don't have control over users canisters. Pushing the problem to infrastructure is bad UX practice, because infrastructures are finicky. And it gives a message to the user that they're the problem, which is clearly not the case here (WE are the problem).

We all know that 12345 can be represented by all of Int, Nat, Int16, Nat16, Int32, Nat32, Int64, Nat64 then why does it has to be a pain for the user? And why does the error message sucks so much?

Put yourself in the user shoes. You get a Expected Nat, found Int when you passed 10. What does that mean? How do you fix it? Why is Nat not an Int? Why is the system too dumb to realize that? My 1 year old figured it already. How hard is it to add an if X can be a Nat, then Nat it?

chenyan-dfinity commented 3 years ago

I see several problems here: 1) Textual representation is not optimized for writing. I don't think inputing number is the only pain here. Nested records, variants, recursive values are all verbose to type. I think this is the real problem we try to solve. 2) did file is an inherent part of Candid, dfx or the network needs to provide this. Without knowing the type, how can a user send a record? You don't even know the field name. Given the did file, we can greatly improve UX by providing auto-completion while typing the textual value, see my ic-repl experiment: https://dfinity.slack.com/archives/CGA566TPV/p1614233910046000 3) Bad error message for decoding. Sure, we definitely need to improve on this. 4) Parsing numbers. The subtyping rules you proposed basically means we only need one number type in Candid, which is true for most cases. And it's actually recommended to use Nat or Int whenever possible as method signatures. We do support Nat <: Int. So it is seamless when the canister uses Nat and Int. The trouble mostly comes from Rust canisters, where they don't have native Int/Nat types and people are not using the Candid::Int or i128 as method types. This is against the best practice, and we recommend to use IntX/NatX only when you are very conscious about space.

hansl commented 3 years ago

Textual representation is not optimized for writing.

There's no better way right now. So let's add shorthands for record, tuples, vectors and optional and make it a good one.

hansl commented 3 years ago

did file is an inherent part of Candid

That's misleading. Candid itself does not require schemas, as you also encode the type in the serialization. If Candid wasn't built for this, it will definitely be used like this.

see my ic-repl experiment

AFAICT, that REPL does not improve in any way the UX of entering candid text format.

Parsing numbers

I think you're, again, pushing the problem to developers but it's really the Candid text format and the Candid spec that's at fault. The parser is overly strict and non-intuitive. There's no reason 0 cannot be ANY numbers.


I think I've made my case and I have other shit more important than this. This is a plus for the users that can be implemented using toolings available today, and make the experience a lot better. But you think there's a better way, so good luck. It's your project. I'll just refer to this issue when developers ask me why they keep seeing "Got Int, expected Nat" when they pass in cycles.

rossberg commented 3 years ago

@hansl, I think what you have in mind would essentially turn Candid into an untyped language, with all sorts of implicit conversions being made on the receiving end. That would have all the problems of untyped languages with implicit conversions, which are even worse when applied across app boundaries.

In particular, Candid's notion of upgrade compatibility is inherently tied to types. We would not be able to check or guarantee anything otherwise. If we allowed clients to get loose on typing, all bets would probably be off with not breaking anybody when you evolve an interface.

That said, we could add certain "safe" subtype relations. But as @chenyan-dfinity points out, that would not help in other cases. Overall, I'd argue that it would rather be counterproductive, by furthering potential misconceptions in that regard.

Perhaps the real problem is the text format? It is not self-describing by default in terms of types, especially when it comes to numbers. If we e.g. changed the syntax of text literals to require some form of type suffix, then there might be fewer surprises and pitfalls. But I'm not sure that works for all types and values. Worth thinking about?

hansl commented 3 years ago

@rossberg

would essentially turn Candid into an untyped language

I don't think so. If anything, I'm advocating for a properly typed integer, where {0} <: {0-127}. If anything, it follows Peano's axiom; there's a superset/class of ALL integers instances and a class of integers instances that can be converted to its superclass. It's not untyped, it's just polymorphism. There are also the example of tuples not being their own types that's kind of weird in a different way.

In particular, Candid's notion of upgrade compatibility is inherently tied to types.

Are you saying that if I pick a number to be nat16, I will never be able to upgrade it to nat32 without breaking backward compatibility? To me, that's the part that doesn't make sense.

That said, we could add certain "safe" subtype relations.

Yes. To me 0 being both a Nat8 and a Nat is not coercion; it's subtyping.

Perhaps the real problem is the text format?

Definitely lots of improvements could be made there. But I think this particular issue covers more a design space than an implementation problem.

require some form of type suffix

Adding burden to the end user will not help them, us, or anyone.

chenyan-dfinity commented 3 years ago

Are you saying that if I pick a number to be nat16, I will never be able to upgrade it to nat32 without breaking backward compatibility? To me, that's the part that doesn't make sense.

blob is vec nat8, do you expect it to be upgrade to vec nat64 at some point?

Perhaps the real problem is the text format? Definitely lots of improvements could be made there. But I think this particular issue covers more a design space than an implementation problem.

I'm open to make the parser more liberal. But coercive subtyping breaks type safety when programmatically calling canisters: you don't expect an i8 to implicitly become an i64 in Rust.

I don't see why having the did file around is not an option to you. Even if we make your proposed change, without the did file, the returned record field will be hashes, which is really bad for UX.

hansl commented 3 years ago

blob is vec nat8, do you expect it to be upgrade to vec nat64 at some point?

blob being vec nat8 is an implementation detail; from a typing standpoint it should be seen as a series of bytes or a chunk of memory. It is a different type than vec nat8.

If I have an actual vec nat8 (for say, a list of spaces in a parking lot). It is possible that I want to upgrade to vec nat16 (added a new floor). This use case is currently impossible.

But coercive subtyping breaks type safety when programmatically calling canisters: you don't expect an i8 to implicitly become an i64 in Rust.

Rust is stricter than Candid already on subtyping though (e.g. on records or array size). Up to the point of being painful. But there are proper Into trait implementations that make the code let x: i64 = 10i8.into(); possible.

I don't see why having the did file around is not an option to you.

Airtight machine, Ledger and other crazy canisters who build their candid interface dynamically from a database, canisters that are during an upgrade might have a different candid interface than any DID we would download... Canisters that don't implement the tmp_hack hack.

I love to have the DID files, but I don't think it's realistic to just say we REQUIRE it.

chenyan-dfinity commented 3 years ago

If I have an actual vec nat8 (for say, a list of spaces in a parking lot). It is possible that I want to upgrade to vec nat16 (added a new floor).

Great example! In this case, the type should be vec nat. Using nat8 means you are sure the number won't go beyond the range, for example, in case of blob or age. When I see vec nat8, how do I know it's used for a parking lot or a series of bytes?

10i8

If you are fine with i8 suffix, sure. I'm onboard with anything on purely the parsing side.

Airtight machine, Ledger and other crazy canisters who build their candid interface dynamically from a database,

This seems orthogonal which doesn't prevent the developer from generating a did file dynamically.

canisters that are during an upgrade might have a different candid interface than any DID we would download...

dfx install should definitely check for subtyping during an upgrade. So I would say did file is required for upgrade.

Canisters that don't implement the tmp_hack hack.

There's going to be system-level support, see https://github.com/dfinity-lab/ic-ref/issues/279

I love to have the DID files, but I don't think it's realistic to just say we REQUIRE it.

Yes, we make a best effort when the did file is not around. The UX is never going to be perfect in this case, e.g. the hash in record field.

hansl commented 3 years ago

In this case, the type should be vec nat.

Theory != reality. Not every developers are omniscient on the API and best practices.

suffix

Nah this was Rust code.

So I would say did file is required for upgrade

For the person doing the upgrade. The person using the canister might be in a state where they download a Candid file but that file is now invalid.

system-level support

Only for opt-in canisters.

The UX is never going to be perfect in this case

And this issue is about making it better for integers and naturals.