eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0
399 stars 62 forks source link

Package that ends with underscore can clash with toplevel class name #721

Closed CeylonMigrationBot closed 9 years ago

CeylonMigrationBot commented 12 years ago

[@FroMage] In Java it's illegal for a package to have the same name as a class.

For example, if you have a class named com.foo (package com; class foo{}) you can't have a package named com.foo.

We should turn that error off in the Java backend, since that is allowed in Ceylon (that's unambiguous) and we already allowed it for classes named ceylon that collide with the ceylon package in another issue.

The theory is that it should Just Work [tm] since the model loader already handles those cases.

That being said, it's entirely possible that this Ceylon code will break once translated in Java:

shared object ceylon {
 shared object language {
  shared class String(){}
 }
}

Because that will produce a class file named ceylon$language$String.class that you can only import in Java as ceylon.language.String which is indeed ambiguous.

So. We probably have to munge object class names to avoid the ambiguity with packages. Same for toplevel methods I suppose. Perhaps that is already planned by @tombentley's outstanding huge patch?

I think that for normal type names (classes, interfaces) there cannot be any ambiguity since package names are lower-cased.

@gavinking: sounds right?

[Migrated from ceylon/ceylon-compiler#721] [Closed at 2013-10-29 16:45:17]

CeylonMigrationBot commented 12 years ago

[@FroMage] For the record, I got bitten by the toplevel method ceylon.io.test and the Ceylon package ceylon.io.test :(

CeylonMigrationBot commented 12 years ago

[@tombentley] I think my patch should address this. It will append a _ to all classes with lower case names. Since it doesn't affect package names then as far as I can see it should just work: Java won't see any ambiguity.

Of course, my patch creates a new ambiguity: A pair of classes with an initial lower case letter and the same name except one has a trailing _. Obviously the one without a _ would have to be written in Java. I can't see that being a problem in real world code bases though.

CeylonMigrationBot commented 12 years ago

[@tombentley] Is this still a problem?

CeylonMigrationBot commented 12 years ago

[@FroMage] Not sure, we have to add a test.

CeylonMigrationBot commented 12 years ago

[@FroMage] For some reason I still can't do this from the latest IDE, even though the class generated is indeed test_… BTW, is test_ a valid package identifier?

CeylonMigrationBot commented 12 years ago

[@tombentley] :-( of course test_ is a valid package identifier. So there is still a collision.

BTW, since you can write

shared class \Ic() {
}

it's not as simple as munging object names. That's why my changes work at the 'Java level', rather than the 'Ceylon level'.

CeylonMigrationBot commented 12 years ago

[@FroMage] Mmm, and I guess we can also write classes named Test_ where the collision exists with objects called test on case-insensitive FS.

Remind me why we didn't go with a non-identifier character? ;)

CeylonMigrationBot commented 12 years ago

[@tombentley] > Remind me why we didn't go with a non-identifier character? ;)

Hmm, my crystal ball must be faulty.

In Java $ is allowed in a package name too, so we'd still have a theoretical problem even if we were 'escaping' lower case .class names with that.

The problem is that we'd like some convention which:

I think we can only really do any two of these. I don't think we can really munge package names as easily as we do class names because of interop (Java's lovely package declaration and directory naming rules). That means we'd end up quoting upper case .class names as well as lower case ones. This runs afoul of the interop requirement. So I think I'm prepared to draw the line and say we don't support some of this weird stuff.

Unless someone has any better ideas? I mean changing the class name quoting to something else shouldn't be too hard now, since it's mostly encapsulated.

CeylonMigrationBot commented 12 years ago

[@gavinking] Do you want me to disallow _ in package names?

Sent from my iPhone

On Aug 23, 2012, at 2:07 PM, Tom Bentley notifications@github.com wrote:

Remind me why we didn't go with a non-identifier character? ;)

Hmm, my crystal ball must be faulty.

In Java $ is allowed in a package name too, so we'd still have a theoretical problem even if we were 'escaping' lower case .class names with that.

The problem is that we'd like some convention which:

Doesn't make interop a PITA because we're using some funky unicode symbol, but Works on case insensitive FS and Can distinguish a package name from a type name. I think we can only really do any two of these. I don't think we can really munge package names as easily as we do class names because of interop (Java's lovely package declaration and directory naming rules). That means we'd end up quoting upper case .class names as well as lower case ones. This runs afoul of the interop requirement. So I think I'm prepared to draw the line and say we don't support some of this weird stuff.

Unless someone has any better ideas? I mean changing the class name quoting to something else shouldn't be too hard now, since it's mostly encapsulated.

— Reply to this email directly or view it on GitHub.

CeylonMigrationBot commented 12 years ago

[@FroMage] > Do you want me to disallow _ in package names?

If that's the only thing required to fix every ambiguity, then sure why not?

It might be useful though when your package name is a two-word part: com.acme.fastAllocator or com.acme.fast_allocator.

CeylonMigrationBot commented 12 years ago

[@tombentley] The http://docs.oracle.com/javase/tutorial/java/package/namingpkgs.html for java is to use underscore to replace characters not otherwise allowed in a package name, for example org.ceylon_lang. Similarly that pesky .int top level domain is recommended to be int_ in a package name.

I perceive the .int thing to be less of an issue, because there are only a small number of .int domain names. I'm a lot less confident about the former use though -- It would be a problem for interop but we'd also need an alternative recommendation for ceylon packages.

CeylonMigrationBot commented 12 years ago

[@gavinking] Well, I suppose we need to decide what our standard is. According to the spec today, a package name is comprised of all-lowercase characters and underscores. But according to the actual grammar, they are composed of LIdentifiers (that seems to make more sense). So there's three possible conventions:

  1. camel case
  2. lowercase words separated by _
  3. lowercase words running together

I must admit that I have no idea what is most common in Java. For Ceylon, option 1 seems to make most sense to me,

CeylonMigrationBot commented 12 years ago

[@FroMage] Yeah, I'd go with camel-case by convention too. Though it seems a bit harsh to disallow underscored in there. For example a point where I find camel-case is ambiguous is acronyms. For example I prefer HTTP over Http, but I have to admit that URLHTTPConnection is just bad. So I suppose the convention should be camel-case, even for acronyms. With UrlHttpConnection, so that we never see things like URL_HTTP_Connection.

CeylonMigrationBot commented 12 years ago

[@gavinking] @FroMage I definitely have a strong preference for Http, Url, etc. It's the only thing that really works well for attributes (for example urlHttpConnection).

CeylonMigrationBot commented 12 years ago

[@FroMage] Yes agreed. I just have to get used to it. And we should put that convention in the spec, as even http://en.wikipedia.org/wiki/CamelCase#Programming_and_coding and Java's own conventions http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-135099.html#367 are not clear on the topic.

CeylonMigrationBot commented 12 years ago

[@gavinking] I don't believe we actually have any specification of our naming conventions anywhere. We should document them on the website at least. Or we could put them in a new chapter of the spec.

CeylonMigrationBot commented 12 years ago

[@tombentley] @gavinking how would you feel about disallowing _ only at the end of package names? I realize that's a pretty arbitrary seeming restriction, but it does get us out of this particular hole, while not preventing the legitimate use of an underscore in the middle of a package name. The one thing I don't like about that is it would mean breaking with the aforementioned Java advice for dealing with package names which would otherwise collide with Java keywords.

A simple alternative would be to allow _ in package names except as the first character, and change the quoting for lowercase class names so that the _ was at the front. To be honest, I think I prefer that except for the extra work it entails. Then, I would be tempted to change the quoting for java keyworded package names to append a _ instead of a $, so we were following the Java advice.

CeylonMigrationBot commented 12 years ago

[@gavinking] Isn't there something else we could do, like: packages with _ at the end get mapped to _$ or something lame like that?

CeylonMigrationBot commented 12 years ago

[@tombentley] We could do that, I guess. Though I wonder how this will end up if there are any more collisions like this.

CeylonMigrationBot commented 12 years ago

[@tombentley] In #602 we chose _ over $ because it was easier on the eyes when writing in Java. That's great and I do prefer to see _ over $ in, for example ceylon.language. However, here and in #420 we end up having to use $ anyway in stuff which would also be visible when writing Java because _ is allowed in Ceylon identifiers. I admit in this issue the suggestion of _$ is very unlikely to be seen in normal code, but using $ in #420 will be very obvious.

If we'd chosen $ in #602 then this issue wouldn't really exist, would it? I suppose we'd have to quote Java keyworded package names using something other than $, but _ is the natural choice for that imo, since that fits with the Java package naming advice.

I'll go with _$ if everyone else wants that, but this looks to be like the last chance we have to come up with some rules that don't just appear to be completely ad-hoc.

CeylonMigrationBot commented 12 years ago

[@quintesse] Personally I've always been particular to using $, that way we know for sure we won't have problems with name clashes. That it looks a little nasty when using Ceylon code from Java I don't really mind, especially because to me _ won't win any beauty contests either.

CeylonMigrationBot commented 12 years ago

[@FroMage] I really don't care, either way.

CeylonMigrationBot commented 12 years ago

[@FroMage] Guys, I vote we do the change from _ to $ to keep allowing underscores in package names, simply on the basis that it's entirely possible that some Java packages might be using underscores and that saves us another escape sequence for interop.

CeylonMigrationBot commented 12 years ago

[@tombentley] > Guys, I vote we do the change from _ to $ to keep allowing underscores in package names

You mean in class names, right? I'm inclined to agree.

CeylonMigrationBot commented 12 years ago

[@FroMage] No I really mean to keep allowing _ in package names. Oh you mean we have another issue with methods named foo_ where we should store it in a foo__ class? I don't think that's ambiguous, is it? I mean I know we have some code in the model loader that tries to guess underscore things but I'm pretty sure we should remove that. You told me already why we first try with underscore and then without, but I can't recall?

Oh, anyways this ambiguity will disappear when we change the escape postfix to $ since that is not allowed in names :)

CeylonMigrationBot commented 12 years ago

[@tombentley] @FroMage I don't know what you mean when you said "I vote we do the change from _ to $"

CeylonMigrationBot commented 12 years ago

[@FroMage] I mean just that, why does that sound ambiguous? Changing the toplevel lower-case encoding from foo_ to foo$ fixes the issue with package names that have underscores.

CeylonMigrationBot commented 12 years ago

[@tombentley]

why does that sound ambiguous?

Stef: I vote we do the change from _ to $

Tom (trying to clarify what 'the change' is supposed to mean): You mean in class names, right?

Stef (misunderstanding the question Tom was asking): No...

Well, I thought it was ambiguous anyway...

CeylonMigrationBot commented 12 years ago

[@FroMage] Sorry I lost you. Replacing the '' postfix for lower-case toplevels with '$' removes the ambiguity with packages that end with '' BUT I think that implementing the change that member types use a separator of '::' removes the need for this change, right?

Because now 'foo::Bar' will be different from 'foo.Bar' where 'foo_' is a package. On the other hand because we're not allowed to have type names with the same name as packages in Java, we still have to make the toplevel be encoded to foo$.

CeylonMigrationBot commented 12 years ago

[@tombentley] Yes, although :: makes things completely unambiguous on the ceylon side, for interop we'll still need things to be unambiguous at the Java level, and quoting the class name with a trailing $ would achieve that. Since package names (when a Java keyword) are quoted with a leading $ I can't see a possibility for ambiguity.

CeylonMigrationBot commented 12 years ago

[@gavinking] Please don't do anything that makes calling print() from Java look disgusting.

Sent from my iPhone

On Oct 15, 2012, at 1:36 PM, Tom Bentley notifications@github.com wrote:

Yes, although :: makes things completely unambiguous on the ceylon side, for interop we'll still need things to be unambiguous at the Java level, and quoting the class name with a trailing $ would achieve that. Since package names (when a Java keyword) are quoted with a leading $ I can't see a possibility for ambiguity.

— Reply to this email directly or view it on GitHub.

CeylonMigrationBot commented 12 years ago

[@FroMage] Gavin, tell me who in their right mind would use Ceylon's print method from Java?

It's not like we have a number of escape chars we can use. The total sum of what is allowed in Java but not Ceylon is $.

We could make the toplevel class name escape be foo$toplevel.class if you find that better.

CeylonMigrationBot commented 12 years ago

[@tombentley]

The total sum of what is allowed in Java but not Ceylon is $.

Actually that's not true: We can use any currency symbol we like. Plus I think there are other unicode general categories we could use too. But those possibilities are all worse than using $.

I think Gavin's problem is that in our desire to permanently fix a pretty unlikely corner case we're making the usual interop case 'disgusting'. Sure no one is going to call print() in Java, but this doesn't apply only to print(), but all top levels, any of which someone might be interested in calling.

CeylonMigrationBot commented 12 years ago

[@quintesse] So help me remember, why do we do this again? In which cases can there be a name clash?

CeylonMigrationBot commented 12 years ago

[@FroMage]

So help me remember, why do we do this again? In which cases can there be a name clash?

In Java, classes can't have the same name as packages. So if you have a package named foo_ and a toplevel named foo (for which we generate a class named foo_) then it breaks.

CeylonMigrationBot commented 12 years ago

[@FroMage]

Actually that's not true: We can use any currency symbol we like. Plus I think there are other unicode general categories we could use too.

But they're not ASCII so we can't type them on an ASCII keyboard without trickery.

CeylonMigrationBot commented 12 years ago

[@gavinking]

Gavin, tell me who in their right mind would use Ceylon's print method from Java?

That's not the point. It's going to be very common for there to be a number of toplevel functions as part of the main API for a Ceylon module. If calling those is something from Java/other languages is super-ugly, I think that's going to be a pain-point.

CeylonMigrationBot commented 12 years ago

[@tombentley] What are we going to do for M4? To me, this doesn't look to be a killer bug that really has to be fixed, so perhaps we should leave it for M5?

CeylonMigrationBot commented 12 years ago

[@FroMage] Yeah, the pb only occurs for package names that end with underscores, you're right.

CeylonMigrationBot commented 11 years ago

[@FroMage] Moving to M6

CeylonMigrationBot commented 11 years ago

[@FroMage] Closing, since this depends entirely on spec's #3911

CeylonMigrationBot commented 11 years ago

[@gavinking] I don't agree that this should be closed. #3911 is going to be just a temporary restriction, only for 1.0.

CeylonMigrationBot commented 11 years ago

[@FroMage] It is closed until we open a new issue to bring postfix _ back.

CeylonMigrationBot commented 11 years ago

[@FroMage] Note that while postfix _ for packages may be necessary for interop, the same isn't true for toplevels, and I can't think of any good reason why we'd want to allow that and not Unicode symbols. It doesn't do anything for readability and as such it's hard to justify.

CeylonMigrationBot commented 11 years ago

[@gavinking]

I can't think of any good reason why we'd want to allow that and not Unicode symbols.

We do allow unicode symbols.

It doesn't do anything for readability and as such it's hard to justify.

Since we took $ away, a trailing _ is about the only thing that for example a code generation tool could use.

CeylonMigrationBot commented 11 years ago

[@gavinking] FTR, spec §2.3, and Ceylon.g:3906. You can try it out, unicode identifiers work fine in Ceylon.