eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Persistent (on-disk) IL representation in the compiler for dynamic runtimes #701

Status: Open · fjeremic opened 6 years ago

fjeremic commented 6 years ago

Background

Although this is likely to be a mammoth effort, and it may not even be feasible, I think it is worth discussing with a broad audience. One of the more time-consuming aspects of development on the compiler component is the non-deterministic nature of compilations from one application run to the next.

This non-determinism effectively boils down to the JIT compiler behaving slightly differently based on a multitude of time-sensitive factors. The non-determinism makes it very difficult to track down problems, as typically we need a compilation trace log to determine what steps the compiler took to perform an incorrect transformation. Collecting a trace log, however, has a double-slit-experiment-esque effect in which the mere act of collecting the log alters the timings in the JIT, and the compilation can therefore be different. This means problems in large methods, and hence large trace logs, may be impossible to fix because we cannot reproduce the issue and collect a trace log at the same time.

Current Approach

Our current approach to this problem is to throw massive amounts of machine time at it: we run the same test hundreds and sometimes thousands of times to create a reproduction, or more likely to test whether a particular option makes the issue disappear. This is an incredibly expensive and time-consuming exercise, in both the machine and human resources invested in fixing these non-deterministic bugs.

Discussion

I wanted to start this issue to discuss what prototypes have been attempted in the past to tackle this problem, and any other ideas for what else we can try. I know @mstoodle has attempted some form of serialization in the past, so even talking about the successes and failures of such projects can help drive the discussion towards a potential solution.

I understand this issue is also present in OMR, but I opened it for OpenJ9 because the JDK is a dynamic runtime and any serialization effort will need to capture such runtime information in a repeatable way.

fjeremic commented 6 years ago

I'm not sure if we have investigated this before, or what the legal/licensing implications of Boost are. It does have a very nice header-only library for serialization:

http://www.boost.org/doc/libs/1_65_1/libs/serialization/doc/

mstoodle commented 6 years ago

This is fairly closely tied to discussions we've had in the past in the context of creating the TRIL language for Eclipse OMR.

@0xdaryl and I have tried a bit here, with some help from people working for @smlambert. Initially, we tried a brute-force approach: manually write serialize/deserialize routines starting at the top-level data structures and working our way down, i.e. you call methodSymbol->serialize(writerObject) and it walks the IL, using writerObject to store the output. The strategy was to inject such a serialize call, followed immediately by a deserialize call, right after genIL() was called. That way, we could tolerate intermediate states where not all fields were written, or where some process-specific pointers were just written directly into the output. You could build up from there until (the thinking was) all the necessary items could be properly serialized.
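
To make the shape of that concrete, here is a minimal sketch of the kind of writer such an approach hand-rolls; all names (ILWriter, writeU32, etc.) are illustrative, not the actual OpenJ9/OMR API:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical primitive writer; serialize() methods on IL data structures
// would call these to emit each field in a fixed order.
class ILWriter {
public:
   explicit ILWriter(const char *path) : _out(fopen(path, "wb")) {}
   ~ILWriter() { if (_out) fclose(_out); }
   void writeU8(uint8_t v)   { fwrite(&v, sizeof v, 1, _out); }
   void writeU32(uint32_t v) { fwrite(&v, sizeof v, 1, _out); }
   // intermediate states tolerated raw process-specific pointers:
   void writePtr(const void *p) { fwrite(&p, sizeof p, 1, _out); }

private:
   FILE *_out;
};

// The round-trip was injected right after IL generation, e.g.:
//   genIL(...);
//   methodSymbol->serialize(writer);     // walk the IL, write every field
//   methodSymbol->deserialize(reader);   // rebuild it, then keep compiling
```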

Under my possibly foolish notions, we started with an AOT-like approach: define a set of primitive read/write facilities (very like how RelocationTarget is used to read/write things into the binary AOT relocation data). Unfortunately, this approach turned out to be (predictably, I suppose) cumbersome, due to 1) the amount of code needed in the serialize/deserialize functions for each data structure, and 2) its lack of tolerance to changes in those data structures, coupled with nobody else being aware we were trying to do it.
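
For illustration, the per-structure pattern looked roughly like the following (reusing the hypothetical ILWriter sketched above; the real TR::Node carries far more state than this), which also shows where the brittleness comes from:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, heavily reduced IL node.
struct Node {
   uint32_t opCode;
   std::vector<Node> children;

   // Fields are written in a fixed order with no tags or version info:
   // add, remove, or reorder a member without updating both the serialize
   // and deserialize sides, and every previously written stream silently
   // decodes to garbage.
   void serialize(ILWriter &w) const {
      w.writeU32(opCode);
      w.writeU32((uint32_t)children.size());
      for (const Node &c : children)
         c.serialize(w); // recurse down the trees
   }
};
```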

Despite its disadvantages, we had it working all the way down to Symbols, I believe, by the time we reached the "give up" point. Eventually, we would have had to resolve some very AOT/shared-classes-like problems, such as "how does one describe the precise class we want, since a class only really comes into existence inside a JVM process and ceases to exist when the JVM stops?"

I remain somewhat dubious about the use of general tools to help us here, but I will concede that our first effort without them was not super successful. To be fair, some of the data structures have been simplified since this work was done.

You'll probably want to figure out what you want to achieve by persistence. If you want to be able to run generated code from persisted "IL" that is a bigger challenge than just being able to reproduce a log file.

For example, if you "just" want to nail down the dynamic inputs to compilation, then I would suggest persisting IL is not something you need to do: the bytecodes may be enough, coupled with the answers to the questions that the JIT asks during compilation...

JamesKingdon commented 6 years ago

coupled with the answers to the questions that the JIT asks during compilation...

When I heard about JITaaS I was struck by the overlap with previous attempts at replay compilation. Perhaps logging the JITaaS queries and responses is something we could leverage here? That said, I have a feeling that if we had a persistent IL we would find ample opportunities to exploit it.
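
A minimal sketch of what the record side of that might look like, assuming a frontend-style query interface (FrontEnd, isClassFinal, and RecordingFrontEnd are all hypothetical names, not the real OpenJ9 interfaces):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

struct Query { uint32_t kind; uintptr_t arg; };

// The compiler's view of the VM, reduced to one example query.
class FrontEnd {
public:
   virtual ~FrontEnd() {}
   virtual bool isClassFinal(uintptr_t clazz) = 0;
};

// Wraps the real frontend and appends every query/answer pair to a log
// that could be persisted alongside the trace file.
class RecordingFrontEnd : public FrontEnd {
public:
   explicit RecordingFrontEnd(FrontEnd &real) : _real(real) {}
   bool isClassFinal(uintptr_t clazz) override {
      bool answer = _real.isClassFinal(clazz);
      _log.push_back({Query{/*kind*/ 1, clazz}, answer});
      return answer;
   }

private:
   FrontEnd &_real;
   std::vector<std::pair<Query, bool> > _log;
};
```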

0xdaryl commented 6 years ago

This has been a long-time thorn in our side for diagnosing JIT problems. We've taken several runs at it over the years. Nothing ever reached production status. One of the earliest solutions was something @JamesKingdon referred to as "replay compilation". The idea there was to use a core dump from the problematic JVM process to recreate the failing environment in a live secondary JVM where compilations could be restarted. In effect, the environment of the failing JVM (now out-of-process) would supply the answers to the context questions asked by the secondary (now in-process) JVM. This approach worked better in certain contexts (e.g., a crash in the JIT) than others (a crash in JITed code). Maintaining symmetry between a problematic VM and the secondary VM was challenging (if data structures were different then things would go off the rails quickly). Also, mapping the address space from the core file into the live process was challenging to get to work on most OS's. It was fragile and we abandoned it.

@mstoodle gave a fairly good description of our attempt at IL serialization. One of the real problems we faced there was how fragile the serialization was: each field was manually serialized, and it was difficult to track additions, deletions, and re-orderings of members. When we left it we were thinking about exploring Google's Protocol Buffers (https://developers.google.com/protocol-buffers/) or Apache Thrift (https://thrift.apache.org/) to make the serialization more rigorous. Serializing JIT data structures, however, will only be useful for a certain class of problems. For the kinds of issues you're referring to, you will need to be able to capture the context in which the compilation occurred.
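
The property those frameworks buy over a fixed-order encoding is tag-based fields: every field carries a numeric tag, so readers can skip unknown fields and tolerate reordering. A hand-rolled illustration of the idea (not the Protocol Buffers API itself):

```cpp
#include <cstdint>
#include <map>

// Stable numeric tags; new fields get new tags appended at the end.
enum NodeField : uint32_t { kOpCode = 1, kNumChildren = 2 };

// A real encoder writes tag + wire-type + payload to a byte stream; a map
// keyed by tag is the simplest stand-in for that here.
void writeField(std::map<uint32_t, uint64_t> &out, uint32_t tag, uint64_t v) {
   out[tag] = v;
}

uint64_t readField(const std::map<uint32_t, uint64_t> &in, uint32_t tag,
                   uint64_t deflt) {
   auto it = in.find(tag);
   // A missing or unknown tag degrades to a default instead of corrupting
   // every subsequent field, which is what the fixed-order scheme did.
   return it == in.end() ? deflt : it->second;
}
```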

There have also been some prototype attempts to record the questions asked by the JIT to the VM and the answers it received (i.e., through the FrontEnd interface) to capture that context. However, the overhead of the prototypes I'm aware of was so high (due to the volume of requests and the compile-time overhead of processing/storing them) that this solution was only useful in limited contexts. I don't think this approach should be abandoned, however, as there are opportunities for batching requests and caching that were not explored. JITaaS will run into some of these problems and will need solutions.
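
One of those unexplored mitigations could be as simple as memoizing answers so each distinct query is asked (and logged) at most once per compilation. A rough sketch, with illustrative names:

```cpp
#include <cstdint>
#include <unordered_map>

class CachingRecorder {
public:
   // 'key' identifies (queryKind, argument); 'askVM' performs the real query.
   template <typename F>
   bool query(uint64_t key, F &&askVM) {
      auto it = _cache.find(key);
      if (it != _cache.end())
         return it->second;        // hit: no VM round-trip, no new log record
      bool answer = askVM();
      _cache.emplace(key, answer); // miss: pay the cost exactly once
      return answer;
   }

private:
   std::unordered_map<uint64_t, bool> _cache;
};
```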

One of the longer-term goals for Tril is to be able to express the environment in which a compilation occurred, with the intent of injecting it back in for either testing or problem determination. This includes identifying the minimum set of information that the compiler needs to export in order to recreate a compilation context faithfully. We've given this some initial thought, but the needle hasn't moved far yet. I have some Tril epics to create in the Eclipse OMR project, of which this will be one.

fjeremic commented 6 years ago

You'll probably want to figure out what you want to achieve by persistence. If you want to be able to run generated code from persisted "IL" that is a bigger challenge than just being able to reproduce a log file.

In the context of Java the bytecode input is deterministic from run to run; only outside factors are non-deterministic, such as interpreter profiling information, block frequencies, value profiling, etc. I suppose what I'm looking for is deterministic recompilation of an arbitrary method across different invocations of the JVM process.

For example, if you "just" want to nail down the dynamic inputs to compilation, then I would suggest persisting IL is not something you need to do: the bytecodes may be enough, coupled with the answers to the questions that the JIT asks during compilation...

This is likely more along the lines of what I was envisioning, i.e. identifying the "answers to the questions that the JIT asks during compilation..." subset of the JIT and somehow persisting it from one JVM invocation to the next.
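
In that model, JVM1 would dump its recorded query/answer map to disk and JVM2 would answer from the dump instead of the live VM. A hypothetical sketch of the replay side (not existing OpenJ9 code):

```cpp
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

class ReplayingFrontEnd {
public:
   // _answers would be loaded from the file JVM1 persisted.
   uint64_t query(uint64_t key) const {
      auto it = _answers.find(key);
      if (it == _answers.end())
         // JVM2 asked something JVM1 never did: the compile has diverged,
         // so failing loudly beats silently asking the live VM.
         throw std::runtime_error("unrecorded query; replay diverged");
      return it->second;
   }

private:
   std::unordered_map<uint64_t, uint64_t> _answers;
};
```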

The goal would be for an arbitrary method A compiled by JVM1 during one invocation to be compiled identically in JVM2 to produce A', where A == A' (modulo any static address differences such as J9Method pointers encoded in the JIT body, etc.).
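
One hypothetical way to make "A == A' modulo static addresses" mechanically checkable is to mask the bytes covered by relocation records in both bodies before comparing; Relocation and bodiesMatch here are illustrative names, not existing OpenJ9 API:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

struct Relocation { size_t offset, length; }; // a site patched with a runtime address

// Copies are taken by value so the masking doesn't disturb the originals.
bool bodiesMatch(std::vector<uint8_t> a, std::vector<uint8_t> b,
                 const std::vector<Relocation> &relocs) {
   if (a.size() != b.size())
      return false;
   for (const Relocation &r : relocs) {
      memset(&a[r.offset], 0, r.length); // zero out J9Method pointers,
      memset(&b[r.offset], 0, r.length); // call targets, data addresses...
   }
   return memcmp(a.data(), b.data(), a.size()) == 0;
}
```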

@JamesKingdon referred to as "replay compilation". The idea there was to use a core dump from the problematic JVM process to recreate the failing environment in a live secondary JVM where compilations could be restarted. In effect, the environment of the failing JVM (now out-of-process) would supply the answers to the context questions asked by the secondary (now in-process) JVM.

Interesting. This very much predates me. I did a quick search and found the following whitepaper: https://www.researchgate.net/publication/221321785_Replay_compilation_Improving_debuggability_of_a_just-in-time_compiler

This seems like a very interesting approach. Do we still support any of this currently?

This approach worked better in certain contexts (e.g., a crash in the JIT) than others (a crash in JITed code). Maintaining symmetry between a problematic VM and the secondary VM was challenging (if data structures were different then things would go off the rails quickly). Also, mapping the address space from the core file into the live process was challenging to get to work on most OS's. It was fragile and we abandoned it.

One obvious pitfall here is that the core file is generated at a point in time, whereas methods are continuously being compiled by the JIT, so a method X compiled at time Y by the JVM may not be reproducible when the same method is compiled at time Z, where Y < Z.

JITaaS will run into some of these problems and will need solutions.

This seems quite relevant here. @mpirvu are the queries captured by the frontend the only non-deterministic sources of information for the JIT? Do we have a definitive list of non-deterministic information sources which affect the compilation of an arbitrary method?

For example:

  • IProfiler (interpreter profiling) data
  • inlining decisions

If it is even possible to come up with such an exhaustive list and properly isolate the components in the JIT, then this is the only piece that would need to be persisted from one VM to the next to have deterministic compilations. Any thoughts on this?

smlambert commented 6 years ago

With respect to the IL serialization efforts mentioned by @mstoodle and @0xdaryl, I recall there was the extra challenge of creating that prototype while the JIT code was being actively refactored, which made it nearly impossible to keep up with the changes (given how the first cut of it was implemented). I think the story would be different now that the JIT code is mostly refactored and more cleanly divided into its different layers (OMR & Java).

I will follow this discussion, as it is a very interesting one to tackle, and it's related to some recent discussions we have been having about methods of gathering implicit data / environmental factors / a JIT 'mood map' (to create a statistical model / application fingerprint) to help better understand and test the JIT.

mpirvu commented 6 years ago

Do we have a definitive list of non-deterministic information sources which affect the compilation of an arbitrary method?

I don't think such a list exists currently. From my own experience, other factors that can affect compilation (besides IProfiler and inlining) are:

  • sampling mechanism
  • state of the class hierarchy table
  • environmental factors that we exploit with our heuristics (CPU, memory)
  • history of other methods (are these methods compiled or interpreted? If interpreted what is their invocation count?)

andrewcraik commented 6 years ago

To add one more to the list: the size of a compiled method body and its optimization level can influence inlining decisions when that method is considered for inlining into other bodies.
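
Pulling together the factors listed by @mpirvu and @andrewcraik, one could imagine the persistable "compilation context" looking roughly like this; every name here is illustrative, not an existing OpenJ9 structure:

```cpp
#include <cstdint>
#include <vector>

// Per-method history that feeds inlining heuristics.
struct MethodState {
   bool     isCompiled;      // compiled, or still interpreted?
   int32_t  invocationCount; // if interpreted
   uint32_t bodySize;        // compiled body size influences inlining
   int32_t  optLevel;        // as does its optimization level
};

// Snapshot of the non-deterministic inputs to one compilation.
struct CompilationContext {
   std::vector<uint8_t>     iprofilerData;   // interpreter profiling (IProfiler)
   uint64_t                 samplingState;   // sampling mechanism
   std::vector<uint8_t>     chTableSnapshot; // class hierarchy table
   uint32_t                 numCpus;         // environmental heuristics (CPU)
   uint64_t                 physicalMemory;  // environmental heuristics (memory)
   std::vector<MethodState> otherMethods;    // history of other methods
};
```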

mstoodle commented 6 years ago

Once you're talking about two different JVM instances, you also open yourselves up to different command-line options, different classes on disk or generated, etc., i.e. all the craziness that shared classes and AOT have to deal with. (Oops, I forgot I ran javac on that file...)

You could run it on a different machine with a different CPU or a different amount of memory, which may influence the default heap size, which may in turn influence how many bits we shift addresses by; it can be quite the rabbit hole.
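
A toy illustration of that last link in the chain, assuming zero-based compressed references where a shifted address must fit in 32 bits (real JVM policies differ, and this is not OpenJ9's actual code):

```cpp
#include <cstdint>

// Returns how many bits object addresses must be shifted so the compressed
// form of the highest heap address still fits in 32 bits. Assumes heap > 0.
int compressedRefShift(uint64_t maxHeapBytes) {
   int shift = 0;
   while (((maxHeapBytes - 1) >> shift) >= (1ULL << 32))
      ++shift; // each extra shift bit doubles the addressable heap
   return shift;
}
// e.g. compressedRefShift(4ULL << 30) == 0, compressedRefShift(8ULL << 30) == 1
```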

That's one reason I asked whether you want to actually run the resulting code, because that's a very different ball game than just being able to reproduce a log for the method.

It also depends on how protective and complete you want the JVM to be about environmental differences. In some scenarios, you want the JVM to tell you "nope, this doesn't smell right, I'm bailing". In others, you want it to just look the other way about that supercalifragilistic JIT option you're using to diagnose a problem... :) .

The AOT cache header encodes some of these kinds of things, but it represents a particular trade-off between implementation complexity/brittleness, user scenario impact, and developer scenario impact. That may not be the perfect trade-off for other (even similar) use cases.

The class hierarchy is one of the most challenging ones, because you have to pick a time in that second JVM when the set of resolved classes matches the set that was around when the method was compiled in the earlier JVM. It's even possible that such a time may never exist, due to timing differences. Just pick three classes and say your method gets compiled with A and B resolved but not C: you may not be able to reproduce the scenario where A is resolved before C (because it's some wicked intermittent problem, of course), or maybe A is only rarely loaded because of some rare event in the application being run.

Anyway, I'm sure there is a reasonably common class of problems where we could do a lot better, and JITaaS, whenever it arrives :), should help provide some of the fundamental pieces.