Closed CeylonMigrationBot closed 9 years ago
[@FroMage] For the record, I got bitten by the toplevel method ceylon.io.test
and the Ceylon package ceylon.io.test
:(
[@tombentley] I think my patch should address this. It will append a _
to all classes with lower case names. Since it doesn't affect package names then as far as I can see it should just work: Java won't see any ambiguity.
Of course, my patch creates a new ambiguity: A pair of classes with an initial lower case letter and the same name except one has a trailing _
. Obviously the one without a _
would have to be written in Java. I can't see that being a problem in real world code bases though.
[@tombentley] Is this still a problem?
[@FroMage] Not sure, we have to add a test.
[@FroMage] For some reason I still can't do this from the latest IDE, even though the class generated is indeed test_
… BTW, is test_
a valid package identifier?
[@tombentley] :-( of course test_
is a valid package identifier. So there is still a collision.
BTW, since you can write
shared class \Ic() {
}
it's not as simple as munging object
names. That's why my changes work at the 'Java level', rather than the 'Ceylon level'.
[@FroMage] Mmm, and I guess we can also write classes named Test_
where the collision exists with objects called test
on case-insensitive FS.
Remind me why we didn't go with a non-identifier character? ;)
[@tombentley] > Remind me why we didn't go with a non-identifier character? ;)
Hmm, my crystal ball must be faulty.
In Java $
is allowed in a package name too, so we'd still have a theoretical problem even if we were 'escaping' lower case .class
names with that.
The problem is that we'd like some convention which:
I think we can only really do any two of these. I don't think we can really munge package names as easily as we do class names because of interop (Java's lovely package
declaration and directory naming rules). That means we'd end up quoting upper case .class
names as well as lower case ones. This runs afoul of the interop requirement. So I think I'm prepared to draw the line and say we don't support some of this weird stuff.
Unless someone has any better ideas? I mean changing the class name quoting to something else shouldn't be too hard now, since it's mostly encapsulated.
[@gavinking] Do you want me to disallow _ in package names?
Sent from my iPhone
On Aug 23, 2012, at 2:07 PM, Tom Bentley notifications@github.com wrote:
Remind me why we didn't go with a non-identifier character? ;)
Hmm, my crystal ball must be faulty.
In Java $ is allowed in a package name too, so we'd still have a theoretical problem even if we were 'escaping' lower case .class names with that.
The problem is that we'd like some convention which:
Doesn't make interop a PITA because we're using some funky unicode symbol, but Works on case insensitive FS and Can distinguish a package name from a type name. I think we can only really do any two of these. I don't think we can really munge package names as easily as we do class names because of interop (Java's lovely package declaration and directory naming rules). That means we'd end up quoting upper case .class names as well as lower case ones. This runs afoul of the interop requirement. So I think I'm prepared to draw the line and say we don't support some of this weird stuff.
Unless someone has any better ideas? I mean changing the class name quoting to something else shouldn't be too hard now, since it's mostly encapsulated.
— Reply to this email directly or view it on GitHub.
[@FroMage] > Do you want me to disallow _ in package names?
If that's the only thing required to fix every ambiguity, then sure why not?
It might be useful though when your package name is a two-word part: com.acme.fastAllocator
or com.acme.fast_allocator
.
[@tombentley] The http://docs.oracle.com/javase/tutorial/java/package/namingpkgs.html for java is to use underscore to replace characters not otherwise allowed in a package name, for example org.ceylon_lang
. Similarly that pesky .int
top level domain is recommended to be int_
in a package name.
I perceive the .int
thing to be less of an issue, because there are only a small number of .int
domain names. I'm a lot less confident about the former use though -- It would be a problem for interop but we'd also need an alternative recommendation for ceylon packages.
[@gavinking] Well, I suppose we need to decide what our standard is. According to the spec today, a package name is comprised of all-lowercase characters and underscores. But according to the actual grammar, they are composed of LIdentifier
s (that seems to make more sense). So there's three possible conventions:
_
I must admit that I have no idea what is most common in Java. For Ceylon, option 1 seems to make most sense to me,
[@FroMage] Yeah, I'd go with camel-case by convention too. Though it seems a bit harsh to disallow underscored in there. For example a point where I find camel-case is ambiguous is acronyms. For example I prefer HTTP
over Http
, but I have to admit that URLHTTPConnection
is just bad. So I suppose the convention should be camel-case, even for acronyms. With UrlHttpConnection
, so that we never see things like URL_HTTP_Connection
.
[@gavinking] @FroMage I definitely have a strong preference for Http
, Url
, etc. It's the only thing that really works well for attributes (for example urlHttpConnection
).
[@FroMage] Yes agreed. I just have to get used to it. And we should put that convention in the spec, as even http://en.wikipedia.org/wiki/CamelCase#Programming_and_coding and Java's own conventions http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-135099.html#367 are not clear on the topic.
[@gavinking] I don't believe we actually have any specification of our naming conventions anywhere. We should document them on the website at least. Or we could put them in a new chapter of the spec.
[@tombentley] @gavinking how would you feel about disallowing _
only at the end of package names? I realize that's a pretty arbitrary seeming restriction, but it does get us out of this particular hole, while not preventing the legitimate use of an underscore in the middle of a package name. The one thing I don't like about that is it would mean breaking with the aforementioned Java advice for dealing with package names which would otherwise collide with Java keywords.
A simple alternative would be to allow _
in package names except as the first character, and change the quoting for lowercase class names so that the _
was at the front. To be honest, I think I prefer that except for the extra work it entails. Then, I would be tempted to change the quoting for java keyworded package names to append a _
instead of a $
, so we were following the Java advice.
[@gavinking] Isn't there something else we could do, like: packages with _
at the end get mapped to _$
or something lame like that?
[@tombentley] We could do that, I guess. Though I wonder how this will end up if there are any more collisions like this.
[@tombentley] In #602 we chose _
over $
because it was easier on the eyes when writing in Java. That's great and I do prefer to see _
over $
in, for example ceylon.language. However, here and in #420 we end up having to use $
anyway in stuff which would also be visible when writing Java because _
is allowed in Ceylon identifiers. I admit in this issue the suggestion of _$
is very unlikely to be seen in normal code, but using $
in #420 will be very obvious.
If we'd chosen $
in #602 then this issue wouldn't really exist, would it? I suppose we'd have to quote Java keyworded package names using something other than $
, but _
is the natural choice for that imo, since that fits with the Java package naming advice.
I'll go with _$
if everyone else wants that, but this looks to be like the last chance we have to come up with some rules that don't just appear to be completely ad-hoc.
[@quintesse] Personally I've always been particular to using $
, that way we know for sure we won't have problems with name clashes. That it looks a little nasty when using Ceylon code from Java I don't really mind, especially because to me _
won't win any beauty contests either.
[@FroMage] I really don't care, either way.
[@FroMage] Guys, I vote we do the change from _
to $
to keep allowing underscores in package names, simply on the basis that it's entirely possible that some Java packages might be using underscores and that saves us another escape sequence for interop.
[@tombentley] > Guys, I vote we do the change from _ to $ to keep allowing underscores in package names
You mean in class names, right? I'm inclined to agree.
[@FroMage] No I really mean to keep allowing _
in package names. Oh you mean we have another issue with methods named foo_
where we should store it in a foo__
class? I don't think that's ambiguous, is it? I mean I know we have some code in the model loader that tries to guess underscore things but I'm pretty sure we should remove that. You told me already why we first try with underscore and then without, but I can't recall?
Oh, anyways this ambiguity will disappear when we change the escape postfix to $
since that is not allowed in names :)
[@tombentley] @FroMage I don't know what you mean when you said "I vote we do the change from _
to $
"
[@FroMage] I mean just that, why does that sound ambiguous? Changing the toplevel lower-case encoding from foo_
to foo$
fixes the issue with package names that have underscores.
[@tombentley]
why does that sound ambiguous?
Stef: I vote we do the change from _
to $
Tom (trying to clarify what 'the change' is supposed to mean): You mean in class names, right?
Stef (misunderstanding the question Tom was asking): No...
Well, I thought it was ambiguous anyway...
[@FroMage] Sorry I lost you. Replacing the '' postfix for lower-case toplevels with '$' removes the ambiguity with packages that end with '' BUT I think that implementing the change that member types use a separator of '::' removes the need for this change, right?
Because now 'foo::Bar' will be different from 'foo.Bar' where 'foo_' is a package. On the other hand because we're not allowed to have type names with the same name as packages in Java, we still have to make the toplevel be encoded to foo$
.
[@tombentley] Yes, although ::
makes things completely unambiguous on the ceylon side, for interop we'll still need things to be unambiguous at the Java level, and quoting the class name with a trailing $
would achieve that. Since package names (when a Java keyword) are quoted with a leading $
I can't see a possibility for ambiguity.
[@gavinking] Please don't do anything that makes calling print() from Java look disgusting.
Sent from my iPhone
On Oct 15, 2012, at 1:36 PM, Tom Bentley notifications@github.com wrote:
Yes, although :: makes things completely unambiguous on the ceylon side, for interop we'll still need things to be unambiguous at the Java level, and quoting the class name with a trailing $ would achieve that. Since package names (when a Java keyword) are quoted with a leading $ I can't see a possibility for ambiguity.
— Reply to this email directly or view it on GitHub.
[@FroMage] Gavin, tell me who in their right mind would use Ceylon's print method from Java?
It's not like we have a number of escape chars we can use. The total sum of what is allowed in Java but not Ceylon is $
.
We could make the toplevel class name escape be foo$toplevel.class
if you find that better.
[@tombentley]
The total sum of what is allowed in Java but not Ceylon is
$
.
Actually that's not true: We can use any currency symbol we like. Plus I think there are other unicode general categories we could use too. But those possibilities are all worse than using $
.
I think Gavin's problem is that in our desire to permanently fix a pretty unlikely corner case we're making the usual interop case 'disgusting'. Sure no one is going to call print()
in Java, but this doesn't apply only to print()
, but all top levels, any of which someone might be interested in calling.
[@quintesse] So help me remember, why do we do this again? In which cases can there be a name clash?
[@FroMage]
So help me remember, why do we do this again? In which cases can there be a name clash?
In Java, classes can't have the same name as packages. So if you have a package named foo_
and a toplevel named foo
(for which we generate a class named foo_
) then it breaks.
[@FroMage]
Actually that's not true: We can use any currency symbol we like. Plus I think there are other unicode general categories we could use too.
But they're not ASCII so we can't type them on an ASCII keyboard without trickery.
[@gavinking]
Gavin, tell me who in their right mind would use Ceylon's print method from Java?
That's not the point. It's going to be very common for there to be a number of toplevel functions as part of the main API for a Ceylon module. If calling those is something from Java/other languages is super-ugly, I think that's going to be a pain-point.
[@tombentley] What are we going to do for M4? To me, this doesn't look to be a killer bug that really has to be fixed, so perhaps we should leave it for M5?
[@FroMage] Yeah, the pb only occurs for package names that end with underscores, you're right.
[@FroMage] Moving to M6
[@FroMage] Closing, since this depends entirely on spec's #3911
[@gavinking] I don't agree that this should be closed. #3911 is going to be just a temporary restriction, only for 1.0.
[@FroMage] It is closed until we open a new issue to bring postfix _
back.
[@FroMage] Note that while postfix _
for packages may be necessary for interop, the same isn't true for toplevels, and I can't think of any good reason why we'd want to allow that and not Unicode symbols. It doesn't do anything for readability and as such it's hard to justify.
[@gavinking]
I can't think of any good reason why we'd want to allow that and not Unicode symbols.
We do allow unicode symbols.
It doesn't do anything for readability and as such it's hard to justify.
Since we took $
away, a trailing _
is about the only thing that for example a code generation tool could use.
[@gavinking] FTR, spec §2.3, and Ceylon.g:3906
. You can try it out, unicode identifiers work fine in Ceylon.
[@FroMage] In Java it's illegal for a package to have the same name as a class.
For example, if you have a class named
com.foo
(package com; class foo{}
) you can't have a package namedcom.foo
.We should turn that error off in the Java backend, since that is allowed in Ceylon (that's unambiguous) and we already allowed it for classes named
ceylon
that collide with theceylon
package in another issue.The theory is that it should Just Work [tm] since the model loader already handles those cases.
That being said, it's entirely possible that this Ceylon code will break once translated in Java:
Because that will produce a class file named
ceylon$language$String.class
that you can only import in Java asceylon.language.String
which is indeed ambiguous.So. We probably have to munge
object
class names to avoid the ambiguity with packages. Same for toplevel methods I suppose. Perhaps that is already planned by @tombentley's outstanding huge patch?I think that for normal type names (classes, interfaces) there cannot be any ambiguity since package names are lower-cased.
@gavinking: sounds right?
[Migrated from ceylon/ceylon-compiler#721] [Closed at 2013-10-29 16:45:17]