ceylon / ceylon.ast

Apache License 2.0
18 stars 3 forks source link

support typechecker generated identifiers that contain `$` #119

Closed jvasileff closed 8 years ago

jvasileff commented 8 years ago

The typechecker may create identifiers with $ while de-sugaring, but since $ is not a valid for Ceylon identifiers, ceylon.ast.redhat/ceylon.ast.core will throw. See https://github.com/ceylon/ceylon/commit/a65b81e363da72989a854abd8b9dbb2171fe0de4#commitcomment-18904016.

lucaswerkmeister commented 8 years ago

Okay, how about this: LIdentifier will accept identifiers that begin with $pattern$param$ (see https://github.com/ceylon/ceylon/commit/27d27b7d8de067eaa34bbe8b4e9b1eaa0d2a7026) only if the immediate caller (via exception stack trace) is the appropriate ceylon.ast.redhat function, called via the parameter convert function. In all other situations, such names continue to be rejected – if the typechecker uses them in more situations, update LIdentifier as needed, but only then.

jvasileff commented 8 years ago

via exception stack trace

way too slow. How about $s are allowed if the identifier starts with some_rare_unicode_char that will be added by ceylon.ast.redhat and stripped by ceylon.ast.core.

lucaswerkmeister commented 8 years ago

Perhaps. Something from one of the private use areas, then. But I’d still restrict this to the very specific pattern $pattern$param$*.

jvasileff commented 8 years ago

I’d still restrict this to the very specific

It's the same to me either way, but I think the potential for abuse is near zero, but there may be more $ scenarios in the future.

jvasileff commented 8 years ago

See https://github.com/ceylon/ceylon/commit/899a41e77adaa87554127c8469563816f95fd441 for updated prefix w/counter.

gavinking commented 8 years ago

Actually I reverted that commit. :)

lucaswerkmeister commented 8 years ago

Alright, so on Gitter we decided to add a second constructor to LIdentifier (and also UIdentifier, I guess), probably called internal, which also allows $ in identifiers. Identifier would also gain a Boolean flag to distinguish between internal and non-internal identifiers.

(Should it require at least one $ in the identifier? In an attempt to ensure that internal and non-internal identifiers can’t collide?)

jvasileff commented 8 years ago

(Should it require at least one $ in the identifier? In an attempt to ensure that internal and non-internal identifiers can’t collide?)

I would say no, since it seems like a restriction without a clear benefit or standard use. But also, while it seems to make sense to reify the constructor, I can't really think of a use for it.

jvasileff commented 8 years ago

Actually, maybe it should indicate whether the identifier is valid, independent of the constructor used. That seems more useful to me, to help you determine if an AST is valid.

jvasileff commented 8 years ago

@lucaswerkmeister the latest patch works great. Thanks!

lucaswerkmeister commented 8 years ago

I actually remembered that it’s incomplete… the flag is missing, and RedHatTransformer should probably not generate tokens for internal identifiers, right?

jvasileff commented 8 years ago

Should the flag just be a computed property that returns true if the identifier is valid?

Regarding tokens, I'm not convinced their presence or absence is or should be an indicator of anything. But since I haven't done any codegen or desugaring with ceylon.ast yet, I guess I don't have strong argument to make.

lucaswerkmeister commented 8 years ago

Well, the absence of the token is how identifierToCeylon detects that it should use the internal constructor. So if the conversion should be lossless both ways, then the flag needs to record which constructor was used, and RedHatTransformer needs to set text without assigning a token.

jvasileff commented 8 years ago

Why not always use internal? I'm just not convinced that exists token is exactly the same as "is synthetic".

lucaswerkmeister commented 8 years ago

I’d still like to keep these two things completely separate:

It just seems like a nice division to me, dunno…

jvasileff commented 8 years ago

I don’t see why the compiler would go through that trouble

I don't know if this makes sense yet, but to preserve some original program text in early de-sugaring stages of backend compilation with the goal of providing better error messages, IDE support, sourcemaps, etc.

keep these two things completely separate

But for that to be achieved, you are placing a burden on clients to always use the right constructor, even when imperfect information is available (i.e. typechecker de-sugaring if a token might at some point be present), or due to some abstraction such as the use of a utility function to create nodes in backend code.

must contain at least one dollar sign

I really don't like forcing you to have a dollar sign if you use internal(). Maybe I want to generate code that uses an identifier that may be from the original source file. I'd probably need to hack in an if (id.contains($)) then useInternalCtor some places.

To be clear, I don't think this (at least the first two items) would impact anything I'm currently doing, but I see the potential for not wanting this structure much greater than the potential for it being valuable to me in writing a backend.

lucaswerkmeister commented 8 years ago

Okay, I can see that you might want to create tokens for source maps and such.

I still don’t see how you would be in a situation where you don’t know which constructor to use. If you don’t know that, how do you make sure that your new identifier doesn’t collide with a user-specified name? The $ is your protection against that!

If internal doesn’t enforce the $ constraint, and the flag doesn’t indicate which constructor was used, then I don’t know if there’s even a point to using a separate constructor. Why not change the existing constructor to accept a $ as well (and document that), and add the valid flag to check if the constructor is valid Ceylon code or not? I could live with that, and it seems much simpler to me.

jvasileff commented 8 years ago

I still don’t see how you would be in a situation where you don’t know which constructor to use

Because I use utility functions for stuff, similar to your ceylon.ast.create, that could possibly instantiate identifiers from a String. But yes, if an identifier is created from scratch, it has to be done carefully.

if there’s even a point to using a separate constructor

I thought it was to keep the "regular user" api strict. But yeah, I don't really know how important that is.

lucaswerkmeister commented 8 years ago

Well, who is the regular user? It sounds like both ceylon.ast.redhat and your Dart backend will exclusively use internal… :)

jvasileff commented 8 years ago

I mean someone using ceylon.ast.core for code generation for their enterprise application or something. But yes, I'd just want to use internal.

lucaswerkmeister commented 8 years ago

I guess that still makes sense, okay. So two constructors, a valid flag, *identifierToCeylon always uses internal, create always uses the regular constructor.

lucaswerkmeister commented 8 years ago

Hm, I guess internal will have to get the prefixed parameter as well.

lucaswerkmeister commented 8 years ago

New version pushed.

lucaswerkmeister commented 8 years ago

@jvasileff any comments before the release tomorrow?

jvasileff commented 8 years ago

Oh cool, we get a same-day ceylon.ast release?!

I'll give the patch a try.

lucaswerkmeister commented 8 years ago

Well, I’ll try my best… :) though I suppose the formatter is more important, and the release takes a bit with the reproducible build.

jvasileff commented 8 years ago

@lucaswerkmeister the patch works for me, and looks great aside from the missing !. Thanks!

lucaswerkmeister commented 8 years ago

Alright, then with the latest version we’re done!