Open rossberg opened 4 years ago
to work around names being reserved (e.g., and, or)
Would we also allow keywords in things like Bool.and and maybe variant { #and; #or }? (This would probably require treating .foo or #foo as a parser atom instead, which I think was rejected before.)
For char literals, we use double quotes and rely on the existing overloading mechanism.
We could go a step further and have Char <: Text, with characters being single-character text values, like Nat <: Int. One could use # with Char and Text alike and mix them, and one would need fewer annotations. But this may be more confusing than helpful, and probably not nice since we have chosen to constrain our subtyping by Candid's subtyping relation…
Would we also allow keywords in things like Bool.and and maybe variant { #and; #or }? (This would probably require treating .foo or #foo as a parser atom instead, which I think was rejected before.)
Well, I dismissed it exactly because it should use the same lexical syntax token as identifiers. But here we're talking about changing that very fact.
That said, I don't think it requires making them a single token. We can also do it with an "extended id" production in the parser that includes keywords.
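A minimal sketch of that "extended id" idea, written in Python rather than the actual Motoko grammar. The keyword set, the token shapes, and the names `tokenize` and `parse_path` are all hypothetical; the only point is that the lexer keeps classifying `and` as a keyword, while the parser accepts either an identifier or a keyword after a dot.

```python
import re

# Hypothetical keyword set and identifier syntax, for illustration only.
KEYWORDS = {"and", "or", "not", "class", "stable"}
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"


def tokenize(src):
    """Split src into (kind, text) pairs with kind 'kw', 'id' or 'punct'."""
    tokens = []
    for text in re.findall(IDENT + r"|\S", src):
        if text in KEYWORDS:
            kind = "kw"
        elif re.fullmatch(IDENT, text):
            kind = "id"
        else:
            kind = "punct"
        tokens.append((kind, text))
    return tokens


def parse_path(tokens):
    """Parse `id ('.' ext_id)*` where ext_id is an identifier or a keyword."""
    if not tokens or tokens[0][0] != "id":
        raise SyntaxError("expected identifier at start of path")
    path = [tokens[0][1]]
    pos = 1
    while pos < len(tokens) and tokens[pos] == ("punct", "."):
        pos += 1
        # The "extended id" production: keywords are accepted here too.
        if pos >= len(tokens) or tokens[pos][0] not in ("id", "kw"):
            raise SyntaxError("expected identifier or keyword after '.'")
        path.append(tokens[pos][1])
        pos += 1
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return path
```

With this, `parse_path(tokenize("Bool.and"))` is accepted, while a keyword in head position such as `and.x` is still rejected, so `.foo` does not have to become a single token.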
Char <: Text
That would force Char into a non-scalar representation. Doesn't that seem rather undesirable?
That would force Char into a non-scalar representation. Doesn't that seem rather undesirable?
I wouldn't be overly worried. If the programmer experience is better this way (which I am not sure about, just brainstorming here), I think a slight performance hit is justifiable. In general we already have to bit-tag Char values, so putting them into the heap is not much more. And if it does matter, then we could start to put small text strings (<= 3 bytes) into such tagged scalars, which may improve Text code as well, e.g. occurrences of "\n" or "" would no longer need to be heap allocated.
But either way, these are optimizations that seem much less relevant than the developer ergonomics of whether they have to type-annotate "\n" when they want this to be a Char.
I just ran into this while trying to run the latest Candid test suite in Motoko. There we have values like (func "aaaaa-aa"."🐂"), but there is no way to express such types or values in Motoko.
Our IDL-Motoko design doc (https://github.com/dfinity/motoko/blob/master/design/IDL-Motoko.md) as well as the implementation in mo_idl/idl_to_mo.ml currently say we escape/unescape method names like record fields (including falling back to the hash), but that's pretty bogus, as it's not reversible.
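To illustrate why falling back to the hash cannot be undone, here is a small Python sketch of the field-name hash as I understand it from the Candid spec: a base-223 polynomial over the UTF-8 bytes, reduced modulo 2^32. The function name `candid_hash` is made up for this example, and the formula should be checked against the spec before relying on it.

```python
def candid_hash(name: str) -> int:
    """Field hash as assumed here: sum of utf8(name)[i] * 223^(k-i), mod 2^32."""
    h = 0
    for byte in name.encode("utf-8"):
        # Horner's rule for the base-223 polynomial, truncated to 32 bits.
        h = (h * 223 + byte) % 2**32
    return h


print(candid_hash("and"))  # 4848343
print(candid_hash("🐂"))    # some 32-bit number
```

The result is only a 32-bit number, so once a name has been replaced by its hash there is no way to get the original string back, which is what method names would need.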
You mean that's bogus because method names are not actually hashes, unlike record labels? Yeah, using that mapping seems broken then.
Agreed, Motoko can't express these. But whatever we do for Motoko, many other languages can't express them either. So I'm not sure this can be solved in general. If somebody defines an interface with exotic method names, they're asking for trouble.
One could argue that we should not allow anything exotic in the first place, but that would feel over-restrictive -- there might be niches where it is useful. So perhaps this rather is something to put into an interface style guide?
Maybe… in that case I’ll beef up the Candid test suite runner to skip tests not expressible in Motoko
hmm, I was under the impression that method names are sorted by the hash as well. We don't need reversibility. If Candid has a method 🐂, Motoko can just call the hash value of 🐂?
hmm, I was under the impression that method names are sorted by the hash as well.
No, method names are stored as strings:
T : <methtype> -> i8*
T(<name>:<datatype>) = leb128(|utf8(<name>)|) i8*(utf8(<name>)) I(<datatype>)
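Read literally, the rule above prefixes the UTF-8 bytes of the name with their length in LEB128. A small Python sketch of just that name part (the trailing `I(<datatype>)` is left out, and the helper names `leb128` and `encode_method_name` are invented for this example):

```python
def leb128(n: int) -> bytes:
    """Unsigned LEB128 encoding of a non-negative integer."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)


def encode_method_name(name: str) -> bytes:
    """leb128(|utf8(name)|) followed by the raw UTF-8 bytes of the name."""
    raw = name.encode("utf-8")
    return leb128(len(raw)) + raw


print(encode_method_name("🐂").hex())  # 04f09f9082
print(encode_method_name("").hex())    # 00
```

So the name travels as a plain string, and both 🐂 and the empty string are perfectly valid on the wire.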
We don't need reversibility. If Candid has a method 🐂, Motoko can just call the hash value of 🐂?
I think we do need it for the type import/export, see first bullet point of https://github.com/dfinity/motoko/blob/master/design/IDL-Motoko.md#notes
Just ran into this again; I used the empty string as a method name in Candid (to have shorter hand-written Candid data) in the test suite, but that didn’t work.
It’s only a matter of time until someone offers a service on the IC with a method name class or stable etc.…
To solve this, we could do method name escaping somehow at the interface to Candid, not changing Motoko. Or we could add arbitrary identifier escaping to Motoko.
BTW, does the Candid spec specify the ordering used to sort method names? Is there actually a standard ordering on utf8 we can appeal to?
I think adding method name escaping to Motoko makes a lot of sense and don't see much difficulty doing it. C# has a similar feature and I'm sure many other industrial languages do too.
... and then we could be tempted to use zero width spaces to encode type identifier stamping (tempting but I'm not actually serious.)
BTW, does the Candid spec specify the ordering used to sort method names? Is there actually a standard ordering on utf8 we can appeal to?
I assume it's lexicographic ordering of the utf8 encoding, but being explicit is helpful of course.
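If that assumption holds, the ordering is just a bytewise comparison of the encoded names. A quick Python sketch (`sort_method_names` is a made-up helper, not anything from the Candid tooling):

```python
def sort_method_names(names):
    """Sort names by lexicographic order of their UTF-8 encodings."""
    return sorted(names, key=lambda name: name.encode("utf-8"))


print(sort_method_names(["stable", "🐂", "class", ""]))
# ['', 'class', 'stable', '🐂']
```

Bytewise comparison of UTF-8 also agrees with comparison by Unicode code point, so this gives the same result as comparing the names code point by code point.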
On a number of occasions, it is useful to have identifiers that do not follow the usual lexical conventions or collide with keywords:
(e.g., and, or),
Here is one possible suggestion:
I believe that char literals are rare enough that this shouldn't be a significant problem, and it frees up the use of a precious ASCII character. Thoughts?