ReadyTalk / avian

[INACTIVE] Avian is a lightweight virtual machine and class library designed to provide a useful subset of Java's features, suitable for building self-contained applications.
https://readytalk.github.io/avian/
Other
1.22k stars 173 forks source link

Several methods missing in java.lang.Class #344

Closed bigfatbrowncat closed 10 years ago

bigfatbrowncat commented 10 years ago

We (me and @JustAMan) have tried using GSON (very popular Java JSON serializing/deserializing library) with Avian + Android classpath, and it turns out that Class misses several methods that are needed by GSON. Also note that Class.java is always taken from Avian classpath, so this issue is relevant for all Avian versions (i.e. both Avian's own classpath and Android classpath).

Namely:

Class.isAnonymousClass
Class.isLocalClass
Class.getGenericInterfaces

are missing or throwing UnsupportedOperationException.

We cannot progress further in finding what GSON needs except those three as we cannot stub getGenericInterfaces with something meaningful (stubbing it with empty list results in ArrayIndexOutOfBoundsException somewhere in the code path so GSON needs that method to provide correct results).

It would be wonderful if you could share your thoughts on where we could get the data needed for those methods if you cannot provide a quick fix yourself.

dicej commented 10 years ago

isAnonymousClass: should be pretty easy. Just see if the name ends with a "$" followed by one or more digits.

isLocalClass: as above, except the name will end in "$" followed by one or more digits, followed by a Java identifier

getGenericInterfaces: to get this, you'll need to check if Class.vmClass.addendum.signature is non-null. If it is, it will be a byte array containing a null-terminated string -- you can make it a String with e.g. new String(signature, 0, signature.length - 1). You can parse that to generate a Type array. It will take some work, but the raw signature is already there.

bigfatbrowncat commented 10 years ago

followed by a Java identifier

Is there any way to check a valid identifier? Or I should implement it by myself? I think actually, that I shouldn't check for a valid identifier. I should just search for any non-digit after some digits after $ sign. Am I right?

dicej commented 10 years ago

You can use Character.isJavaIdentifierStart and Character.isJavaIdentifierPart: http://docs.oracle.com/cd/E19798-01/821-1841/bnbuk/index.html

bigfatbrowncat commented 10 years ago

It looks like Character.isJavaIdentifierStart and Character.isJavaIdentifierPart are absent yet too :)

dicej commented 10 years ago

Well, it will be present in the Android build, but you're right that it won't be available in the Avian build unless we add it. They're pretty simple, though, so they should be easy to add.

dscho commented 10 years ago

It looks like Character.isJavaIdentifierStart and Character.isJavaIdentifierPart are absent

It should not be too hard to implement them by extending the CharacterMatcher class. The only thing unclear to me is how to do that without having to make CharacterMatcher public (it is not part of the official class path library, after all).

dicej commented 10 years ago

We could move that class to the avian package and make it public, which is what we've done with other non-standard classes.

dscho commented 10 years ago

We could move that class to the avian package and make it public, which is what we've done with other non-standard classes.

That is a good idea! AFAICT isJavaIdentifierStart() would correspond to new CharacterMatcher("[A-Za-z_]") and isJavaIdentifierPart() to new CharacterMatcher("[A-Za-z_0-9]");.

dicej commented 10 years ago

On Wed, 1 Oct 2014, dscho wrote:

We could move that class to the avian package and make it public, which is what we've done with other non-standard classes.

That is a good idea! AFAICT isJavaIdentifierStart() would correspond to new CharacterMatcher("[A-Za-z_]") and isJavaIdentifierPart() to new CharacterMatcher("[A-Za-z_0-9]");.

Actually, it's more complicated than I thought:

http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isJavaIdentifierPart(char) http://stackoverflow.com/questions/11774099/legal-identifiers-in-java

Any Unicode letter, currency symbol, or "connector" can be used.

joshuawarner32 commented 10 years ago

Yeah - unicode characters complicate things, like Joel said.

Also, unless I'm missing something, it seems rather wasteful to me to implement isJavaIdentifier* as deferring to a very general class. I'd be surprised if matching against a fixed set of character ranges (in hard-coded fashion) wasn't at least several times faster than using CharacterMatcher.

dscho commented 10 years ago

Oh, yes, I forgot Unicode characters...

bigfatbrowncat commented 10 years ago

I think you make the problem too complicated here. As I understand, our task isn't to validate the character. There are only 3 possible cases here:

[anything][last $][any-non-digit][anything] - that corresponds to inner class
[anything][last $][digits] - that corresponds to anonymous class
[anything][last $][digits][any-non-digit][anything] - that corresponds to a local class

We don't actually care what does [any-non-digit] part contain. It could be _ or [a-z] or [а-я] or anything else. The validation isn't our task.

Am I right?

joshuawarner32 commented 10 years ago

Peeking at the openjdk code, it looks like they don't do any name parsing.

Rather:

dicej commented 10 years ago

On Wed, 1 Oct 2014, Joshua Warner wrote:

Peeking at the openjdk code, it looks like they don't do any name parsing.

Rather:

  • They check whether getSimpleName returns an empty string (which, apparently it's supposed to do for anonymous classes)

But getSimpleName does do name parsing.

dicej commented 10 years ago

On Wed, 1 Oct 2014, Ilya Mizus wrote:

I think you make the problem too complicated here. As I understand, our task isn't to validate the character. There are only 3 possible cases here:

[anything][last $][digits] - that corresponds to isAnonymousClass() [anything][last $][digits][any-non-digit][anything] - that corresponds to isLocalClass()

We don't actually care what does [any-non-digit] part contain. It could be _ or [a-z] or [а-я](hey, I'm Russian ;) or anything else. The validation isn't our task.

Am I right?

Yeah, that makes sense.

bigfatbrowncat commented 10 years ago

Please take a look at my solution and tell me if you like it, or not:

  private enum ClassType { Global, Member, Local, Anonymous }

  private ClassType classType = null;

  private void calculateClassType() {
    char[] nameChars = getName().toCharArray();
    // Searching for the last $
    int lastDollarPos = -1;
    for (int i = nameChars.length - 1; i >= 0; i--) {
      if (nameChars[i] == '$') {
        lastDollarPos = i;
        break;
      }
    }
    if (lastDollarPos > -1) {
      int i = lastDollarPos + 1;
      // Assuming i < nameChars.length cause the class name shouldn't end with '$'
      if (nameChars[i] < '0' || nameChars[i] > '9') {
        classType = ClassType.Member; // Not a digit after the last '$'
        return;
      }
      i++;
      while (i < nameChars.length) {
        if (nameChars[i] < '0' || nameChars[i] > '9') {
          classType = ClassType.Local; // Not-a-digit character after some digits after the last '$'
          return;
        }
        i++;
      }

      classType = ClassType.Anonymous; // Only digits after the last '$'
      return;

    } else {
      classType = ClassType.Global; // No '$' in class name means a global class
      return;
    }
  }

  public boolean isAnonymousClass () {
    if (classType == null) calculateClassType();
    return classType == ClassType.Anonymous;
  }

  public boolean isLocalClass () {
    if (classType == null) calculateClassType();
    return classType == ClassType.Local;
  }

  public boolean isMemberClass () {
    if (classType == null) calculateClassType();
    return classType == ClassType.Member;
  }
joshuawarner32 commented 10 years ago

I really like the enum - but I would much rather see this implemented (partially) in terms of getEnclosingClass() and getEnclosingMethod(), rather than parsing out the string. In fact, you can determine everything but Local vs Anonymous just checking whether those are the enclosing class/method are null.

Also, finding the '$' could be elided using getEnclosingClass().getName().length().

Lastly, I'm not sure caching it would be worth-while. These seem like methods that will only be called a small handful of times - and you're burning 8 bytes / class + dcache pollution effects.

bigfatbrowncat commented 10 years ago

I agree that 8 bytes per class is not a necessary overhead. But... do you think that getEnclosingClass().getName().length() would be faster?

joshuawarner32 commented 10 years ago

Obviously the only correct answer is "benchmark it".

And that's a good point, I'm not sure. The 2+ dependent pointer loads are not going to be cheap. On the other hand, neither is creating the new char array (the allocation, copying, and the garbage generated). On the third hand, using charAt() (which it seems avian doesn't have, and would be great to add) might incur some nasty branch mispredicts.

joshuawarner32 commented 10 years ago

And to be clear, I'm not suggesting that you actually have to benchmark it. Just that, you're right, it might not be any faster. Point being, what you posted is fine (provided it matches the openjdk behavior).

dicej commented 10 years ago

On Wed, 1 Oct 2014, Joshua Warner wrote:

And to be clear, I'm not suggesting that you actually have to benchmark it. Just that, you're right, it might not be any faster. Point being, what you posted is fine (provided it matches the openjdk behavior).

Ditto. The proposed implementation, plus a few tests to prove it works, is fine by me.

dscho commented 10 years ago

It is a bit awkward to comment a diff that is not part of a commit (let alone a pull request), but I'll do it nevertheless:

  /**
    * Determines the class type.
    * 
    * There are four class types: global (no dollar sign), anonymous (only digits after the dollar sign),
    * local (starts with digits after the dollar, ends in class name) and member (does not start with digits
    * after the dollar sign).
    * 
    * @return the class type
    */
  private ClassType calculateClassType() {
    if (classType != null) return classType;

    classType = ClassType.GLOBAL;
    final String name = getName();
    // Find the last dollar, as classes can be nested
    int dollar = name.lastIndexOf('$');
    if (dollar < 0) return classType;

    // Find the first non-digit after the dollar, if any
    final char[] chars = name.toCharArray();
    int skipDigits;
    for (skipDigits = dollar + 1; skipDigits < chars.length; skipDigits++) {
       if (chars[skipDigits] < '0' || chars[skipDigits] > '9') break;
    }

    if (skipDigits == chars.length) {
      classType = ClassType.ANONYMOUS;
    } else if (skipDigits > dollar + 1) {
      classType = ClassType.MEMBER;
    } else {
      classType = ClassType.LOCAL;
    }

    return classType;
  }
import javassist.ClassPool;

pool = ClassPool.getDefault();
c = pool.makeClass("ABC$");
print(c.toClass());
dscho commented 10 years ago

Oh, and of course I missed the comment about spending 8 bytes on this... I agree that it is fast enough to calculate, and users who really need recurring access to the calculated values should cache them themselves.

Please just replace the line if (classType == null) return classType; with ClassType classType = ClassType.GLOBAL and remove the now-obsolete line classType = ClassType.GLOBAL;.

bigfatbrowncat commented 10 years ago

Ok, thank you, @dscho. By the way, we decided to remove classType variable out of Class class. Your version is pretty clear. I'll use it if you don't mind. And we could avoid the exception by change only one line:

if (dollar < 0 || dollar == name.length() - 1) return classType;

So we assume any class ending with '$' to be 'GLOBAL'. At least it would be valid. Do you agree? I'll make a pull request, of course, as soon as I get rid of all the missing functions... Or if I decide it's hopeless to make GSON work for now.

bigfatbrowncat commented 10 years ago

getGenericInterfaces: to get this, you'll need to check if Class.vmClass.addendum.signature is non-null.

Now about the Class.getGenericInterfaces() function: I'm prining:

System.out.println(vmClass.addendum.signature);

for CCC.class, where:

interface III {}

class CCC implements III {

}

and it prints null. What should I do?

dscho commented 10 years ago

Your version is pretty clear. I'll use it if you don't mind.

By all means!

And we could avoid the exception by change [...]

The new version should not throw that exception at all... The loop after the dollar would terminate right away because of the condition, without accessing any out-of-bounds index.

So we assume any class ending with '$' to be 'GLOBAL'.

Actually, I assumed they should be interpreted as ANONYMOUS, hence my code picking that...

dscho commented 10 years ago

Now about the Class.getGenericInterfaces() function:

Try

interface III<T> {}

class CCC implements III<Object> {

}
dscho commented 10 years ago

@bigfatbrowncat to clarify: the signature in the addendum is present only if there are really generics involved. Otherwise, you will have to fall back to the non-generic method.

In other words, you will have to copy-edit Class#getInterfaces() to special-case array items when there are generic signatures.

I forgot the exact details, but I did something similar in commit 2d8362297537414de11ea9fe3d76ee01f97c107a to implement Field#getGenericType(). Hopefully you can reuse some of that!

dicej commented 10 years ago

On Thu, 2 Oct 2014, Ilya Mizus wrote:

Now about the Class.getGenericInterfaces() I'm prining:

System.out.println(vmClass.addendum.signature);

for a class:

interface III {}

class CCC implements III {

}

and it prints null. What should I do?

Just call getInterfaces if the signature is null.

bigfatbrowncat commented 10 years ago

Okay. I found out my mistake with parameters. Now I have a question. As a result of Class.getGenericInterfaces() call I should return an array of java.lang.reflect.ParameterizedType interface implementations. Do we have a proper implementation to use? (JDK returns sun.reflect.generics.reflectiveObjects.ParameterizedTypeImpl objects)

dicej commented 10 years ago

On Thu, 2 Oct 2014, Ilya Mizus wrote:

Okay. I found out my mistake with parameters. Now I have a question. As a result of Class.getGenericInterfaces() call I should return an array of java.lang.reflect.ParameterizedType interface implementations. Do we have a proper implementation to use?

Not yet, sorry. You could write one with a constructor that takes a signature and a ClassLoader and parses that signature, loading the classes it refers to from the ClassLoader.

bigfatbrowncat commented 10 years ago

Okay. Where should I put the new class? avian package? Maybe it is better to create avian.reflect?

dicej commented 10 years ago

On Thu, 2 Oct 2014, Ilya Mizus wrote:

Okay. Where should I put the new class? avian package? Maybe it is better to create avian.reflect?

Either of those is fine with me.

bigfatbrowncat commented 10 years ago

Am I correct that sun.reflect.generics.reflectiveObjects.TypeVariableImpl should be implemented as well?

dicej commented 10 years ago

On Thu, 2 Oct 2014, Ilya Mizus wrote:

Am I correct that sun.reflect.generics.reflectiveObjects.TypeVariableImpl should be implemented as well?

I haven't heard of that class before. How did you find it? Why do you think you'll need it?

bigfatbrowncat commented 10 years ago

Whew... that wouldn't be a piece of cake, actually. It looks like I've stepped into a deep and dangerous Java reflection mechanisms :)

I've made a small testing facility to find out all the common situations that we could achieve here. Look. 1 First of all - some classes for testing:

package parrot.server;

class Main {
...
    class CC {

    }

    interface III<A> {}
    interface UUU<A> {}
    interface ZZZ<A extends CC> {}

    class CCC<AA, AAA extends CC> implements III<Object>, Cloneable, UUU<AA>, ZZZ<AAA> {

    }
}

2 Then the testing code:

Type[] tt = CCC.class.getGenericInterfaces();
for (int i = 0; i < tt.length; i++) {
    if (tt[i] instanceof ParameterizedType) {
        ParameterizedType pt = (ParameterizedType)tt[i];
        System.out.print("[pt: " + pt.toString() + "; ");
        System.out.print("owner: " + ((Class)pt.getOwnerType()).getName() + "; ");
        System.out.print("raw: " + ((Class)pt.getRawType()).getName() + "<");

        Type[] ata = pt.getActualTypeArguments();
        for (int j = 0; j < ata.length; j++) {
            if (ata[j] instanceof TypeVariable<?>) {
                TypeVariable<?> tv = (TypeVariable<?>)ata[j];
                System.out.print(tv.getClass() + ":");
                System.out.print(tv.getGenericDeclaration() + ":");
                System.out.print(tv.getBounds().length + ", ");
            } else {
                System.out.print(ata[j].getClass() + ", ");
            }
        }
        System.out.println(">");
    } else {
        Class cls = (Class)tt[i];
        System.out.print("{cls: " + cls.getName() + "}\n");
    }
}

3 Testing code output

[pt: parrot.server.Main.parrot.server.Main$III<java.lang.Object>; owner: parrot.server.Main; raw: parrot.server.Main$III<class java.lang.Class, >
{cls: java.lang.Cloneable}
[pt: parrot.server.Main.parrot.server.Main$UUU<AA>; owner: parrot.server.Main; raw: parrot.server.Main$UUU<class sun.reflect.generics.reflectiveObjects.TypeVariableImpl:class parrot.server.Main$CCC:1, >
[pt: parrot.server.Main.parrot.server.Main$ZZZ<AAA>; owner: parrot.server.Main; raw: parrot.server.Main$ZZZ<class sun.reflect.generics.reflectiveObjects.TypeVariableImpl:class parrot.server.Main$CCC:1, >

4 Avian signature (converted to string):

<AA:Ljava/lang/Object;AAA:Lparrot/server/Main$CC;>Ljava/lang/Object;Lparrot/server/Main$III<Ljava/lang/Object;>;Ljava/lang/Cloneable;Lparrot/server/Main$UUU<TAA;>;Lparrot/server/Main$ZZZ<TAAA;>;

Look at TAAA in the end of the signature. It turns into TypeVariable

dicej commented 10 years ago

On Thu, 2 Oct 2014, Ilya Mizus wrote:

TypeVariable<?> tv = (TypeVariable<?>)ata[j]; System.out.print(tv.getClass() + ":");

Calling getClass() on a TypeVariable instance isn't very useful. Here are methods that might be useful, though:

http://docs.oracle.com/javase/7/docs/api/java/lang/reflect/TypeVariable.html

We haven't added those methods to Avian's version of TypeVariable yet, because they weren't needed until now. If Guava needs them, maybe we should add them.

But, yeah, when you start mixing generics and reflection, things get really complicated :)

bigfatbrowncat commented 10 years ago

So the task is: parse signature and make a structure of ParameterizedType, TypeVariable and Class objects... Cool. I'll try to do that...

dscho commented 10 years ago

Well, we kind of have a ParameterizedType implementation: https://github.com/ReadyTalk/avian/blob/master/classpath/java/lang/reflect/SignatureParser.java#L103

dscho commented 10 years ago

parse signature and make a structure of ParameterizedType, TypeVariable and Class objects

See https://github.com/ReadyTalk/avian/blob/master/classpath/java/lang/reflect/SignatureParser.java

dscho commented 10 years ago

See https://github.com/ReadyTalk/avian/blob/master/classpath/java/lang/reflect/SignatureParser.java

To be precise: the parseType() method already does a lot of that work. It needs to be refactored, of course, to support getGenericInterfaces() but that is quite doable. The first thing would be to decouple the handling of what is inside <...> because the current code can only handle generic signatures of fields (which always have an explicit base type, but the base type of a class is implicit).

I just realized that I completely missed to document in the commit 2d8362297537414de11ea9fe3d76ee01f97c107a where I found the relevant information: it was section 4.3 of the Java Virtual Machine Specification (you will find section 4.3.4 especially helpful).

dscho commented 10 years ago

BTW I will be offline for three days; please do not mistake this as a lack of interest from my side (but also: don't let this hold up any progress, although I am of course happy to invest some time to help resolve this issue once I return online)!

bigfatbrowncat commented 10 years ago

Thank you, @dscho. I'm not sure I could solve that without your help. This looks very complicated. I'll tell you when I have some progress.

bigfatbrowncat commented 10 years ago

Please take a look at commits https://github.com/bigfatbrowncat/avian-pack.avian/commit/67cafb118cfeb0416ae795533f5675eda263e388 and https://github.com/bigfatbrowncat/avian-pack.avian/commit/270bbc66f9aa1152907889bf733a2927277a6da5. Both of them are part of pull request https://github.com/ReadyTalk/avian/pull/343

I've made support for everything excepting TypeVariable-s so far. I had to make SignatureParser public (maybe we have to move it to avian package) and improve it a bit (to handle owner class for some case)

Please criticize my code and then I'll make TypeVariable parsing working.

bigfatbrowncat commented 10 years ago

I've fixed test with a small change in https://github.com/bigfatbrowncat/avian-pack.avian/commit/50d32d39e70a0dd2a911d4bc86ff7ae247fcd691.

bigfatbrowncat commented 10 years ago

We've implemented getGenericInterfaces() completely (I hope so). Now even type variables work.

But we have a problem with the simple getInterfaces() function which is a no-template version of getGenericInterfaces() (and we fall back to it if addendum.signature == null).

Example:

interface I1 {}
interface I2 extends I1 {}

class NoGen implements I2 {

}
...
Class[] ifaces = NoGen.class.getInterfaces();
for (Class c : ifaces) {
    System.out.println(c);
}

JDK result:

interface reflect.test.Main$I2

Avian (my latest commit https://github.com/bigfatbrowncat/avian-pack.avian/commit/c178ffd26d5829dcbc4e1c3a9120585e889d3381):

interface reflect.test.Main$I1
interface reflect.test.Main$I2

The problem looks obvious: JDK lists only the directly implemented interfaces, but Avian lists all of them. I don't know how to fix this behavior, please, help me with your advice.

And another (aesthetic) problem: I added TypeVariableImpl class and a very useful collectTypeVariables() function directly to the Class class. Maybe they should form some utility class and appear somewhere else (like SignatureParser does).

bigfatbrowncat commented 10 years ago

Okay, I've fixed the build :) I moved the class and the function into SignatureParser and changed the parse signature. Now Field's getGenericType() function should support type variables as well. The new commit is: https://github.com/bigfatbrowncat/avian-pack.avian/commit/1534a30d2b8eb64838a3aece0d5c8dccb85f20be

dicej commented 10 years ago

On 10/5/14, Ilya Mizus notifications@github.com wrote:

The problem looks obvious: JDK lists only the directly implemented interfaces, but Avian lists all of them. I don't know how to fix this behavior, please, help me with your advice.

That's a bug. I'll fix it when I get a chance.

bigfatbrowncat commented 10 years ago

It would be nice cause GSON doesn't want to work cause of it :)

bigfatbrowncat commented 10 years ago

As I could see, getInterfaces() function takes the data form the interfaceTable that contains all the implemented interfaces. But it should take it from the class declaration instead.

My question: how to take it from the declaration if addendum.signature == null?