Open nomeata opened 5 years ago
Agreed. I have a slight preference for the first option, but the second seems simpler, so I'd be fine with that as well, provided we have a hash function that does not get in the way.
Do "foo" and "foo_" actually collide or was that just meta? If they do then we should pick a better hash function. There is no good reason why there should be any collision for names with length <=4.
Do "foo" and "foo_" actually collide or was that just meta? If they do then we should pick a better hash function. There is no good reason why there should be any collision for names with length <=4.
They collide because of the field escaping; we want to treat async_
as async
, because we can’t write async
as a field name (see https://github.com/dfinity-lab/actorscript/issues/695 for a summary of the unescape function).
If we choose the IDL’s has function for AS as well (see #733) (or maybe the last 31 bits of it) then we have less conflicts. It’s a bit dirty again to co-design the IDL with AS, but maybe we should just be pragmatic.
Maybe the best solution would be to have an escaping mechanism for identifiers in AS itself, so that any string can be used as an identifier. Maybe _'...'
or \'...'
?
That would work too, no strong opinions here. But we'd need two, right, one for arbitrary stings, and another one to encode a specific IDL hash value (_1_
under the current plan)
We still need a plan here. But I guess a first good step is to make the typechecker complain if there are collisions, and think about a possible escaping mechanims later.
SGTM. Your point that we would need two forms makes me less enthusiastic about an escape mechanism.
One alternative, viable now that we have the type description header, is of course drop field hashing in the IDL, and just put the strings in the type description (with an indication into a list of strings). This would also make pretty-printing or dynamic use of the IDL format much nicer.
The downside is of course a bigger message (but still better than JSON) and that dynamic use of the IDL format is much nicer.
Interesting idea. The other disadvantage I can see is yet more complexity, at least if you want to avoid duplication.
What about numeric labels? I suppose you'd need to encode them as text, inverting the current scheme (into something more like JavaScript).
Interesting idea. The other disadvantage I can see is yet more complexity, at least if you want to avoid duplication.
Yes, that’s what I meant with an ~indication~ indirection into a list of strings.
It doesn't sound so bad… unti you wonder if you want it to be sorted :-)
Requiring it to be sorted makes serialization harder; essentially requires an additional pass over the data (or at least the types). Requiring it not to be sorted might be fine, if we require all lists of field names to be sorted by field name (and not just field name index). So that would work.
What about numeric labels?
What would that gain us?
What would that gain us?
I meant, how to encode tuples, which some langs have? Currently, they naturally are just records with indices 0,1,... With textual labels, that becomes more cumbersome, too, and you'll need to add all the text encodings of numbers to the string list.
Right. We could
"1"
etc…), as you sayNeither is particularly pretty. But saying “we hash record field names because of tuples” is also a bit odd. Weird.
I think our type checker needs to reject code that tries to use records (or variants) as sharable types when the field’s IDL hashes collide, e.g. in
{ foo_ : (); foo : ()}
, or in cases where the hashes collide.We could either
The latter ensures that programmers can be sure that all record types with shareable fields are shareable? Otherwise there is a risk that the programmer later wants to send a type over the wire and finds that he has to refactor the code, renaming some fields, in order to do so.
Assigning to Andreas for the policy decision.