aionnetwork / AVM

Enabling Java code to run in a blockchain environment
https://theoan.com/
MIT License

Fix common "superclass" detection when interfaces are involved #362

Closed by jeff-aion 5 years ago

jeff-aion commented 5 years ago

TypeAwareClassWriter is called by ASM when branching paths must be coalesced into a matching machine state and the dissimilar types on those paths need to be unified to a common type.

This currently fails when either of the types being coalesced is an interface, as the result immediately decays into a common IObject. In cases where the types should coalesce to something more specific, this can cause verify errors, which in turn cause the contract deployment to be rejected.
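For context, ASM exposes this unification decision through the ClassWriter.getCommonSuperClass hook. The sketch below only illustrates the failure mode described above; the IOBJECT_NAME constant and the isInterface helper are assumptions, not the actual AVM code:

```java
import org.objectweb.asm.ClassWriter;

// Illustrative only: shows how a naive getCommonSuperClass override can
// "decay" any unification involving an interface into the universal IObject.
public class NaiveTypeAwareClassWriter extends ClassWriter {
    private static final String IOBJECT_NAME = "i/IObject"; // assumed internal name

    public NaiveTypeAwareClassWriter(int flags) {
        super(flags);
    }

    @Override
    protected String getCommonSuperClass(String type1, String type2) {
        // If either side is an interface, give up and return the universal
        // interface type, even when a more specific common type exists.
        // This is the behaviour that leads to the verify errors.
        if (isInterface(type1) || isInterface(type2)) {
            return IOBJECT_NAME;
        }
        return super.getCommonSuperClass(type1, type2);
    }

    // Placeholder: a real implementation would consult the class hierarchy.
    private boolean isInterface(String internalName) {
        return false;
    }
}
```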

Additionally, this can cause an NPE when operating on an interface provided by the shadow JCL, since our common superclass logic does not fully support interfaces from there. This will similarly cause the contract to be rejected.

The case involving the shadow JCL should be easily resolved, and the common cases of interface handling should also not be too difficult.

There is, however, one case where this cannot be easily resolved: the case where there are multiple correct interpretations of "common class". Consider coalescing two different classes/interfaces that each implement more than one unrelated interface. javac currently handles this case by defaulting to one option and then emitting a checkcast if the other option was the one actually required. Since we don't have enough context to know which option was chosen and what other modal opcodes may follow, we can't handle this. Note that the user can force a specific interpretation through a cast in their source.
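A minimal, hypothetical example of the ambiguous case (plain Java types, not AVM shadow types):

```java
interface A {}
interface B {}
class First implements A, B {}
class Second implements A, B {}

class Demo {
    static Object ambiguous(boolean flag) {
        // The branches could unify to either A or B -- both are "correct" --
        // so the compiler must pick one and may emit a checkcast later.
        return flag ? new First() : new Second();
    }

    static A explicit(boolean flag) {
        // An explicit cast in the source forces a single interpretation.
        return flag ? (A) new First() : (A) new Second();
    }
}
```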

Given the difficulty of solving this problem without special knowledge of later use-cases in the instruction stream, the poor job that javac does in addressing the same problem, how uncommon the problem is, and the user's ability to make the choice explicit with a cast in the code, I suggest we define ambiguous type unification cases such as these as rejected cases.

The other general cases will be addressed by completing the interface handling in TypeAwareClassWriter.

jeff-aion commented 5 years ago

I suspect that this requires a replacement of the Forest and ClassHierarchyForest classes, creating something which is general-purpose and can more concretely operate on the post-rename versions of the resources.

This also means that it should be possible to use this tool on the shadow JCL and API, meaning that we can verify the correctness of the type system across all components.

The plan is to build the new graph much like the existing Forest, but also capture interface relationships and other restrictions (such as final classes). After loading, this graph can be walked to ensure the type system is complete (the user isn't missing something) and properly connected (it references the shadow JCL only in permitted ways, etc.).
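As a rough sketch (the field names here are assumptions, not the actual design), each node in such a graph might record something like:

```java
import java.util.Set;

// Illustrative node shape only; the real class info will differ.
final class HierarchyNodeSketch {
    final String name;             // post-rename class name
    final String superClassName;   // null only for the root (java/lang/Object)
    final Set<String> interfaceNames;
    final boolean isInterface;
    final boolean isFinal;         // restriction: final classes cannot be subclassed

    HierarchyNodeSketch(String name, String superClassName, Set<String> interfaceNames,
                        boolean isInterface, boolean isFinal) {
        this.name = name;
        this.superClassName = superClassName;
        this.interfaceNames = interfaceNames;
        this.isInterface = isInterface;
        this.isFinal = isFinal;
    }
}
```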

When looking for the common overlap type between 2 given types, we can build the entire set of unifiable super types for each input type, find the intersection of these sets, then walk the intersection to remove any types which have subclasses or implementing classes also within the set. After this removal process, if one type remains, it is the common type, while any larger number would mean the unification is ambiguous and should be rejected.
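A minimal sketch of that search, assuming a hypothetical hierarchy interface with a superTypesOf() query (this is not the AVM implementation):

```java
import java.util.HashSet;
import java.util.Set;

final class CommonTypeSketch {
    interface Hierarchy {
        // All super classes and super interfaces of the given type, plus the type itself.
        Set<String> superTypesOf(String typeName);
    }

    // Returns the single common type, or null if the unification is ambiguous.
    static String tightestCommonSuper(Hierarchy hierarchy, String type1, String type2) {
        Set<String> candidates = new HashSet<>(hierarchy.superTypesOf(type1));
        candidates.retainAll(hierarchy.superTypesOf(type2));

        // Remove any candidate which has a strict sub type also in the candidate set.
        Set<String> tightest = new HashSet<>();
        for (String candidate : candidates) {
            boolean hasSubTypeInSet = false;
            for (String other : candidates) {
                if (!other.equals(candidate) && hierarchy.superTypesOf(other).contains(candidate)) {
                    hasSubTypeInSet = true;
                    break;
                }
            }
            if (!hasSubTypeInSet) {
                tightest.add(candidate);
            }
        }
        return (tightest.size() == 1) ? tightest.iterator().next() : null;
    }
}
```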

jeff-aion commented 5 years ago

Passing this to @aionick since I keep getting pulled off to work on other things.

A branch which demonstrates where array unification is currently missing from our broader type handling is here: https://github.com/aionnetwork/AVM/tree/jd/unify_arrays

Early experiments around the general type unifier (before realizing that there were problems with exceptions and arrays) can be found here: https://github.com/aionnetwork/AVM/tree/jd/issue-362

aionick commented 5 years ago

The family of type bugs being explored here basically comes down to the TypeAwareClassWriter's inability to find the appropriate super class with which to unify two distinct types.

It relies on the ParentPointers class, which maps a class to its super class. These pointers are discovered by walking the ClassHierarchyForest.

The forest has proved inadequate on a few levels. First, it does not contain interfaces (this is what allows for the simplistic ParentPointers mapping of a class to its only super class). Second, it contains only user-defined classes (with a few minor exceptions: java/lang/Object, java/lang/Enum, java/lang/Exception).

The plan is to re-implement the forest so that it becomes a tree, with java/lang/Object as the root, and so that this tree contains interfaces as well as the full scope of possible classes we can encounter (not just user-defined ones).

From here, we will then search for the 'tightest' common super class (since a class can now have multiple super types: its super class plus its super interfaces), where the 'tightest' common super class is defined as a class that is a super class of the 2 distinct types, such that no child of this class is also a super class of these 2 types.

It is possible to still find multiple 'tightest' common super classes, in which case we simply do not have enough information available to resolve the unification and so consider it an ambiguous unification, which will be rejected upon deployment.
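To make the definition concrete, a hypothetical example with plain Java types:

```java
class Animal {}
class Dog extends Animal {}
class Cat extends Animal {}

// Common super classes of Dog and Cat are {Animal, Object}, but Object has a
// child (Animal) that is also a common super class, so it is filtered out and
// Animal is the single 'tightest' common super class: the unification is
// unambiguous. If instead two or more unrelated candidates survive the filter
// (as in the earlier interface example), the unification is ambiguous and the
// deployment is rejected.
```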

aionick commented 5 years ago

Currently have a new implementation of the forest ready, along with the tightest common super class algorithm we need. After running the latter through some unit tests, I will try to install these classes into the DAppCreator so that the TypeAwareClassWriter eventually gets hold of it.

The plan is to determine what discrepancies still remain between the old and new forest (the new forest should be a superset of the old one), and then to confirm that the new forest can answer any question that the old forest could answer.
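Roughly, the comparison I have in mind looks like the following (the OldForest and NewHierarchy interfaces here are stand-ins, not the real classes):

```java
import java.util.Map;

final class ForestComparisonSketch {
    interface OldForest {
        Map<String, String> classToSuperClass(); // every class the old forest knows about
    }

    interface NewHierarchy {
        boolean contains(String className);
        String superClassOf(String className);
    }

    static void assertNewCoversOld(OldForest oldForest, NewHierarchy newHierarchy) {
        for (Map.Entry<String, String> entry : oldForest.classToSuperClass().entrySet()) {
            String name = entry.getKey();
            if (!newHierarchy.contains(name)) {
                throw new AssertionError("Missing class: " + name);
            }
            // The new hierarchy must answer the old forest's only question the same way.
            if (!entry.getValue().equals(newHierarchy.superClassOf(name))) {
                throw new AssertionError("Super class mismatch for: " + name);
            }
        }
    }
}
```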

This will at least indicate we are on the right path, and may also warrant pushing this first stage of the solution into master, since it would be at least as good as the old implementation and much more in line with what we will need in order to fix the rest of these bugs.

aionick commented 5 years ago

After briefly installing the hierarchy so that it gets created from the same LoadedJar as the old hierarchy, I've confirmed that in my test case the new hierarchy has all of the same classes but with the class relationships fixed up (especially the child class <-> super interface relationship). So it looks to be at least as good at answering super types questions.

The next step, before comparing the old and new hierarchies fully, is to implement a transformer so that we can derive a post-rename hierarchy from a pre-rename hierarchy. This solution is not optimally space- (or time-) efficient, but it should really help the early stages of this development, and is arguably a better approach than coming up with some clever hierarchy that lives in both worlds simultaneously.

At least we can explicitly reason about the transformation process from here, and not much actually has to be done to the hierarchy class itself.

aionick commented 5 years ago

Did not move forward with the transformation. I believe it will be mostly useless because we are probably only concerned with post-rename classes here. Currently there's a lot of 'dead weight' in my changes because I began with pre-rename classes, and at the moment we can talk about both. But if we discover the pre-rename hierarchy is unnecessary then much of this can be cut down.

Just loaded the hierarchy via the loadedDapp and tested out a few getTightestCommonSuper calls and it's looking good. Seems like we can answer everything (and then some) that the old hierarchy could answer for user-defined classes, enum, user-exceptions and JCL exceptions.

Currently, I've been dropping the 'Object' super class if a class has a super interface, to save space (it removes a lot of 'extra' pointers), but this is not strictly correct (it can lead to 2 concrete classes unifying to IObject instead of shadow/Object), so it might be worth spending the extra space.
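A hypothetical illustration of why dropping that pointer is not strictly correct (plain Java stand-ins, not shadow types):

```java
interface Streamable {}
interface Disposable {}
class FileSource implements Streamable {}
class SocketSink implements Disposable {}

// If the hierarchy drops the implicit Object super class for any class that
// declares a super interface, then FileSource and SocketSink no longer share a
// common class at all; their only common super type is the interface root
// (IObject in the shadow world), so they unify to IObject instead of the
// shadow Object.
```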

We can at least properly unify the exceptions, enums, user-defined classes and userlib classes by the looks of it.

aionick commented 5 years ago

Today:

Mostly I have been testing this against the suite of unification tests @jeff-aion wrote, which I added to. I am basically at the point of being able to resolve all of the unification issues not involving array wrappers (though I would really like test coverage that hits these methods directly, just to be sure things are in the right shape!) - I have not gotten to array wrappers yet.

Some interesting consequences...

I noticed we needed a few java/lang/ error types [not shadow] (NoSuchFieldError, LinkageError, etc.) to resolve some unifications. It looks like this is related to reflection, and I'm guessing it is simply due to our injected code (these are unified against one another and against wrapped exceptions, from what I'm seeing so far). But I'm worried about what the actual scope of things like this is...

Also, we are unable to resolve some very basic unifications, like a user-defined enum and BigInteger, because these become ambiguous under our stronger type system (they unify to Object and Serializable equally). This is really just because we adhere so strictly to the real type system, and in the shadow world interfaces don't subclass the shadow Object.
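For illustration (plain Java types; the ambiguity arises in the shadow world, where interfaces do not sit under the shadow Object):

```java
enum Direction { NORTH, SOUTH }

class UnificationDemo {
    static Object pick(boolean flag) {
        // Unifying a user-defined enum with BigInteger leaves Object and
        // Serializable as equally 'tight' candidates in the shadow hierarchy,
        // so the unification is rejected as ambiguous unless the user adds an
        // explicit cast.
        return flag ? Direction.NORTH : java.math.BigInteger.ONE;
    }
}
```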

aionick commented 5 years ago

Finished handling the exception wrappers and array wrappers, which completed the normal non-debug mode scope of the TypeAwareClassWriter changes.

After turning my attention to debug mode, I found some further complications there. User-defined classes do not get renamed in debug mode. To keep the hierarchy accurate, we first have to gather up the set of user classes, construct their class info, and then update their parent classes if any of those parents are not user-defined classes (in which case they must be renamed). In addition to this, there were a few logic changes to the TypeAwareClassWriter to adjust for this new fact, mostly in terms of how we handle the common super class.
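A minimal sketch of that parent fix-up, assuming a hypothetical ClassInfo shape and renaming helper (the 's/' prefix is a placeholder, not the real renaming scheme):

```java
import java.util.Map;
import java.util.Set;

final class DebugModeParentFixupSketch {
    static final class ClassInfo {
        final String name;
        String superClassName;

        ClassInfo(String name, String superClassName) {
            this.name = name;
            this.superClassName = superClassName;
        }
    }

    // User-defined classes keep their names in debug mode, but any parent that
    // is not a user-defined class must still be renamed into the shadow namespace.
    static void fixUpParents(Map<String, ClassInfo> userClasses, Set<String> userClassNames) {
        for (ClassInfo info : userClasses.values()) {
            if (!userClassNames.contains(info.superClassName)) {
                info.superClassName = renameToShadow(info.superClassName);
            }
        }
    }

    // Placeholder for whatever renaming the real pipeline applies.
    private static String renameToShadow(String preRenameName) {
        return "s/" + preRenameName;
    }
}
```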

This is now complete. I manually ran through all the tests in core and verified they were all passing in debug mode and non-debug mode.

This convinces me pretty well that the old hierarchy can completely be replaced by the new one - at least in those places where the hierarchy is being used for its type-mapping capabilities. There is still some functionality in the old hierarchy that must be kept around.

Next step is to completely replace the old hierarchy (there are about a dozen places - SimpleAvm, HashCodeTest, StringConcatenationTest, etc.)

Once this final step is finished and everything looks good I will begin cleaning up these massive commits I've got - lots to be removed and refactored - and begin merging them in as I can.