eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0
397 stars 62 forks source link

Issue with Java Strings as type arguments #488

Closed CeylonMigrationBot closed 8 years ago

CeylonMigrationBot commented 12 years ago

[@FroMage] When using some Java code that returns a java.util.List<java.lang.String> we get a weird issue that String is not assignable to String because we assume that when used as a type parameter, String should be mapped to ceylon.language.String, which is just not always the case. We need to fix that.

[Migrated from ceylon/ceylon-compiler#488] [Closed at 2012-06-04 14:30:13]

CeylonMigrationBot commented 12 years ago

[@FroMage] Same issue with j.l.Object used as type param

CeylonMigrationBot commented 12 years ago

[@FroMage] Though, for j.l.Object we can fix the bug easily, I don't see how to fix it for j.l.String: a List<j.l.String> is just not a List<c.l.String> so we can't pass one for the other. Even raw-casting doesn't get us anywhere since we then lose the real underlying type, and will expect members to be something they are not.

Can anyone remind me why we erase to j.l.String but not for parameterised types? Seems to me we should just always erase to it, though I admit that will require lots of boxing.

CeylonMigrationBot commented 12 years ago

[@gavinking] j.l.String might not satisfy some constraints on the type parameter.

CeylonMigrationBot commented 12 years ago

[@FroMage] Ah true. But the opposite is true… How can we fix this sort of problem?

CeylonMigrationBot commented 12 years ago

[@FroMage] I've simply no idea how to fix this, short of giving up and stopping to optimise c.l.String as j.l.String.

I mean, I see why we're sometimes erasing to j.l.String, but that causes lots of boxing for no good reason, because we're not using j.l.String methods at all, so anything we do with the string, we need to box/unbox.

Tako says that we could produce code that calls helper methods rather than always box/unbox java strings, so we'd keep only java strings, but that doesn't help us if we're still using ceylon strings as type parameters, since that will always require boxing, and doesn't work with the pretended equivalence with java strings (different type parameter constraints)

For primitives this problem doesn't exist, but we decided to not treat j.l.Long and friends as equivalent to the primitives for precisely the reason that it doesn't work. Yet we're doing it to j.l.String when IMO we shouldn't even be using it in c.l.String and rather use int[].

Tako says that we're erasing to java strings because the JVM optimises them and we'd need to convert ceylon strings to java strings anyways for most APIs, which will not be true anymore once we have Pure Ceylon libs.

I think we should just stop treating j.l.String as a c.l.String and make them different types, and not erase one to the other.

CeylonMigrationBot commented 12 years ago

[@quintesse] After discussing this some more with @FroMage we seem to have some serious problems with Java interop here, right? Let's say we have a type Foo<T> and that both Java and Ceylon use it. We could probably extend the boxing code to make it understand a Foo<String> coming from Java as actually using j.l.String and therefore add special (un)boxing code, but we could never do the same the other way around while trying to pass a Foo<String> to Java. (And of course we haven't even thought about Java code trying to interop with Ceylon an actually using c.l.String somewhere)

CeylonMigrationBot commented 12 years ago

[@quintesse] The only thing we've been able to think of thus far is to stop treating j.l.String as being equivalent to c.l.String, of course this would mean that Java interop would be far less nice, having to do javaObject.name.string for example. There are several way forward if we would do this split, but first I think it's best if you, @gavinking , think about this a bit and tell us if you agree with our assessment or not.

CeylonMigrationBot commented 12 years ago

[@gavinking] So, after some playing to see what the interop code actually does, it appears that there is no actual hole in the type system, since the following code all compiles and runs:

import java.lang { Str=String }
import java.util { List, ArrayList }

List<Str> slist = ArrayList<Str>();
slist.add(Str("hello"));
slist.add(Str("goodbye"));
for (i in 0..slist.size()-1) {
    Str s = slist.get(i);
    print(s);
}

The compiler doesn't treat java.lang.String as ceylon.language.String, or vice-versa. Which is correct. A List<Str> is simply not a List<String>. Even creating a subclass of ArrayList<Str> works out correctly.

So it seems that:

  1. The actual bug here, such as it is, is just that the model loader tries to map Foo<java.lang.String> to Foo<ceylon.language.String>. I can't really see how that's ever going to be correct even with all the bending and twisting in the world, so I think we should simply stop doing that.
  2. This opens the question of whether it's still OK to map plain java.lang.String to ceylon.language.String when it appears directly in a signature (i.e. not as a type argument). I don't see why not, but perhaps I'm missing something.
  3. Finally, we arrive at what I suppose is the real issue here: is it completely unacceptable from a usability perspective that we sometimes have some Java APIs giving us back plain java.lang.String instances? Well, it's certainly going to get in the way when it happens, but I don't have a strong intuition on how common this is. How often do we really encounter java.lang.String as a type argument in a Java API?

Now, lets say we discover that Array<java.lang.String> is popping up all over the place in Java APIs. Then one possible solution is to box it. Introduce a Ceylon type named Strings in the language module and box Java java.lang.String[] to that type. But how to generalize this idea? Well, what's really interesting about this solution is that for the cases which I think well probably arise, there are some really obvious types to box them to:

Now, of course, the nice thing about this is it also neatly resolves the problem of not being able to iterate java.util.List<java.lang.String> in a for loop. So it seems like we should probably do this kind of boxing all the time, and merely specialize it when the type argument is java.lang.String. So indeed this issue ties in with something which has been in the back of my mind for a long time, which is to map Java collection types to Ceylon collection types.

Yes, I'm ignoring the mutability issue for now, and there might be some other problems here that I'm missing.

CeylonMigrationBot commented 12 years ago

[@gavinking]

For primitives this problem doesn't exist, but we decided to not treat j.l.Long and friends as equivalent to the primitives for precisely the reason that it doesn't work.

I'm still not convinced that this is the right thing, and we might need to revisit it. Certainly we can't have model loader do to Long what it is currently doing to String, but if we stopped treating type args in this broken way, I don't see why a method that simply returns java.lang.Long can't be transparently mapped to Ceylon Integer.

Yet we're doing it to j.l.String when IMO we shouldn't even be using it in c.l.String and rather use int[].

There's certainly a good argument for that. I'm conflicted.

Tako says that we're erasing to java strings because the JVM optimises them

Well, we don't know the details of this optimization or even if it really exists.

CeylonMigrationBot commented 12 years ago

[@gavinking]

I think we should just stop treating j.l.String as a c.l.String and make them different types, and not erase one to the other.

...

The only thing we've been able to think of thus far is to stop treating j.l.String as being equivalent to c.l.String, of course this would mean that Java interop would be far less nice, having to do javaObject.name.string for example.

I mean, it's an option, and it would not be the end of the world, but hell, all those calls to Str(s) and s.string are going to add up pretty quick, it seems to me.

CeylonMigrationBot commented 12 years ago

[@FroMage]

The actual bug here, such as it is, is just that the model loader tries to map Foo<java.lang.String> to Foo<ceylon.language.String>

No it doesn't, it (confusingly) reports that: Foo<String> is not assignable to Foo<String>. I thought I could make that work, but it just can't.

The problem I see with trying to autobox ceylon and java strings is that it doesn't always work:

import java.lang { Str=String }
import java.util { List, ArrayList }

List<Str> slist = ArrayList<Str>();
slist.add(Str("hello"));
slist.add(Str("goodbye"));
for (i in 0..slist.size()-1) {
    Str s = slist.get(i);
    print(s);
}

It's not at all obvious to the users why the following autoboxing doesn't work:

import java.lang { Str=String }
import java.util { List, ArrayList }

List<Str> slist = ArrayList<Str>();
slist.add("hello"); // autoboxing
for (i in 0..slist.size()-1) {
    String s = slist.get(i); // autoboxing
    print(s);
}

Yet it doesn't.

When we do autoboxing of java strings to/from ceylon strings we give the impression that the two are interchangeable, which is indeed the case for Java's autoboxing (except for null), but then users will notice that there are some situations where autoboxing doesn't work and they will point at it and say: "look, this is inconsistent".

CeylonMigrationBot commented 12 years ago

[@gavinking]

The actual bug here, such as it is, is just that the model loader tries to map Foo<java.lang.String> to Foo<ceylon.language.String>

No it doesn't, it (confusingly) reports that: Foo<String> is not assignable to Foo<String>.

OK, I was just going by the initial description of the issue, which seems to say that:

we assume that when used as a type parameter, String should be mapped to ceylon.language.String, which is just not always the case. We need to fix that.

If that's not the case (i.e. we in fact don't always assume that), then there is no "actual bug" here at all, just the observations that:

  1. sometimes the typechecker should use qualified names in its error messages (easy fix), and
  2. the question of whether its unacceptably confusing that the we usually, but not always, expose Java strings in Java APIs as Ceylon strings.

Again, I simply don't know the answer to 2. I'm inclined to think it might not be a really big issue.

CeylonMigrationBot commented 12 years ago

[@FroMage] You did fix one of the error messages, but there's another confusing one for this case:

    List<JString> jstringList = java.jstringList();
    jstringList.add("foo");

(Where java.jstringList() returns a java.util.List<java.lang.String>)

test-src/com/redhat/ceylon/compiler/java/test/interop/Types.ceylon:323: error: argument must be assignable to parameter 
arg0 of add: String is not assignable to String?
    jstringList.add("foo");
                    ^
CeylonMigrationBot commented 12 years ago

[@gavinking] > You did fix one of the error messages, but there's another confusing one for this case:

Right, there are a number of cases like this. But I've no idea how to detect them in general. I don't want to always use fully-qualified names in error messages.

CeylonMigrationBot commented 12 years ago

[@FroMage] 558111626f0ca3e5b7767c7d95ba0d2c5c0f868e and 775c582cd6fdded3656b3cda1dd0203b7fc7f4d8 were for this issue. Now fixed.