GaloisInc / saw-script

The SAW scripting language.
BSD 3-Clause "New" or "Revised" License
438 stars 63 forks source link

Modern Java support #861

Closed robdockins closed 3 years ago

robdockins commented 4 years ago

I discovered, while working on standing up a new computer, that the way the SAW infrastructure interacts with the Java ecosystem is by now quite outdated. Starting with Java 9, the JRE no longer distributes its base libraries in a monolithic rt.jar, but instead distributes these classes in a new 'JMOD' file format. This is part of an overall infrastructure philosophy change that introduces a new link phase and utility jlink, which gathers together all the transitive dependencies of an executable into a "linked" executable.

The following is the best explanation of this system I've found https://stackoverflow.com/questions/44732915/why-did-java-9-introduce-the-jmod-file-format

For our purposes, I think it would actually makes our life simpler. As we do with with LLVM (via clang and llvm-link), we can rely on the JDK utilities like jlink and jimage to collect together all the pieces we need into one place, rather than having to do complicated classpath handling, discovering where the system has installed various bits we need, etc.

RyanGlScott commented 3 years ago

I've been looking into this recently, and I thought I'd record some of my findings that others may find interesting:

atomb commented 3 years ago

One middle ground would be to use the jimage extract command above, but then to still run the Haskell class file parser lazily. The Haskell code is the slowest part of the system, and the most memory intensive, so it's the part we want to be careful about running when it's not necessary.

RyanGlScott commented 3 years ago

You raise a good point. In my mind, I was lumping together the processes of (1) uncompressing the archive and (2) parsing the classes themselves, but they're really separate things. If the cost of (2) outweighs the cost of (1), then it might not be as vital to optimize (1) (which is what https://github.com/GaloisInc/saw-script/issues/861#issuecomment-756889807 benchmarks) as I originally thought.

robdockins commented 3 years ago

I suspect the key thing is to control the amount of memory usage, which is more important than the time required.

My inclination would be to use jlink to do all the dependency resolution/gathering from within the Java ecosystem, and then extract the result.

RyanGlScott commented 3 years ago

Upon further thought, the jdeps-followed-by-jlink approach is trickier than I imagined it would be. This is because in order to fully specify the modules that jdeps needs for dependency resolution, it needs to know all the .class files that are used in a program. SAW scripts, however, don't specify .class files ahead of time, as they can be specified at runtime using java_load_class. This means that if we wanted to use jdeps followed by jlink to create a minimal JRE from which to extract classes to load, then in the worst case, we'd have to recreate this JRE each time a script invokes java_load_class. After all, each class could introduce new dependencies that previously encountered classes did not use!

robdockins commented 3 years ago

I don't think we should take the current SAW interface as given. If we need to force changes to the way JVM code is loaded into SAW to make things work better, I think that should be on the table.

RyanGlScott commented 3 years ago

OK, here's one idea. As SAW is currently designed, we already have to specify the JARs and classpaths that a script needs ahead of time using the --jars and --classpath flags, respectively. When a script starts, no meaningful work is done with the classpaths (besides storing it for later use). Later, when SAW attempts to load a class it hasn't seen before, only then will it search through the classpaths to see if there is a match.

Instead of doing this, when SAW starts a script using modern Java, SAW can take all of the specified JARs and classpaths, pass them jdeps, and then use that information to jlink together a suitable JRE. This JRE will be extracted to a temporary directory and then can be consulted whenever a new class needs to be loaded.

The upside is that we only need to traverse the classpaths once with this approach. The downside is that this process may take longer than is strictly required if the user specifies more classpaths than what are actually needed to run the classes they care about, but I don't think much can be done about that, save for advising users not to over-approximate their dependencies.

atomb commented 3 years ago

I like the approach you're suggesting, @RyanGlScott. I think at the very least it's worth trying and benchmarking to see if it has any problematic overhead in common cases.

RyanGlScott commented 3 years ago

Some further investigation reveals that jlink is not the silver bullet I was hoping it would be, for two reasons:

  1. In the comments above, I gave the impression that jlink would produce a JIMAGE file that contained all of the classes used in an application. Unfortunately, that is not the case. jlink will bundle all of the classes that live in named modules, but any classes that do not live in named modules will be excluded. As a result, one will still need to rely on traditional classpath-based resolution in order look up any classes that aren't included in the JIMAGE file. Since most Java classes in the wild don't yet use modules (including most of the examples in the saw-script repo itself), that means the old ways of resolving classpaths aren't going away time soon.
  2. OK, so we'll still have to look up things in classpaths. Bummer. But looking up classes in the jlink-produced JIMAGE file should be quick, right? Well, perhaps not as quick as I would have expected. Recall from https://github.com/GaloisInc/saw-script/issues/861#issuecomment-756889807 that when you jimage extract a JIMAGE file, the resulting directory structure will look like this template:

    <extracted-jimage-path>/<module-name>/<class-name>.class

    I was originally hopeful that when we look up a class that lives in a module, we could do so quickly by exploiting the uniquely determined directory structure above. But here's the catch: a ClassName, to the best of my knowledge, only contains the <class-name> part and not the <module-name> part of the template above. Moreover, it's not clear to me how to determine the name of a class's module by simply parsing the corresponding .class file, which is what SAW does. (I've tried reading the JVM specification for the .class file format, but I wasn't able to see an obvious way to do this.)

    As a result, extracting a JIMAGE file is effectively tantamount to adding some extra directories onto the classpath, as we must search through each <module-name> directory in order to look up any classes that are a part of the JIMAGE file. (There's some unresolved questions about the best way to faithfully emulate what Java does here. For example, should classes from JIMAGE files on the module path be looked up before classes from the classpath? After? I wasn't able to find any documentation on this point.)

Although jlink isn't a silver bullet, the silver lining is that we might not need to change SAW's Codebase type (responsible for loading .class files) at all. In light of the observations above, loading class files from a JIMAGE file is basically a slightly more convoluted version of what Codebase already does: add some directories to the classpath and consult those whenever classes are loaded for the first time. The only new thing that SAW would have to do is use jdeps/jlink/jimage to produce the thing that gets added to the classpath.

RyanGlScott commented 3 years ago

I have good news and bad news. tl;dr I have a working prototype, but there are some serious performance issues that we should consider carefully.

The good news is that I've pushed a proof-of-concept implementation of the "use jdeps/jlink/jimage to produce the thing that gets added to the classpath" idea from https://github.com/GaloisInc/saw-script/issues/861#issuecomment-762332870 to the wip/T861 branch. The key change is the introduction of a new --java-bin-dirs flag, which allows specifying the location(s) where jdeps, jlink, and jimage live. With this, I am able to run SAW on some examples in the SAW repo with Java 11.

The bad news is that the startup overhead imposed by this approach is quite severe. I was worried about jimage extract taking a long time, but the time that jimage takes is minuscule compared to jlink. Running jlink on even a tiny example that only uses java.base takes at least 3 seconds! To be fair, SAW is only using a small portion of the resulting JIMAGE file (which is about 500 megabytes total), but even passing flags such as --no-header-files --no-man-pages to reduce the size a bit doesn't make much of a difference.

Another annoying problem is that we extract the contents of the JIMAGE file to a temporary directory which persists after running SAW. This leaves behind about 90 megabytes of cruft each time SAW is invoked. For a single invocation, this isn't really noticeable, but if you were to invoke SAW many times consecutively, I could foresee this adding up quickly. In fact, I'm too afraid to run the test suite at the moment, since I'm worried that I'll chew through too much disk space...

The temporary directory size problem is likely solvable if we make sure to clean up the temporary directory that SAW extracts JIMAGE files to after SAW stops execution. (It's not yet clear to me what the best way to do this is, since SAW has various directions the control can flow in, but I'm sure there's a way we could accomplish this with something bracket-like.) The jlink overhead problem, however, is something I don't have an answer for. Ultimately, we don't have much control over jlink's performance, and since the JIMAGE format is internal to the JDK, we don't have many other alternatives for creating them.

RyanGlScott commented 3 years ago

If the bad news from https://github.com/GaloisInc/saw-script/issues/861#issuecomment-762464241 has you frightened, perhaps we should consider some alternatives. Some ideas that come to mind:

  1. Don't bother with jlink at all, and just use jimage extract on the monolithic JIMAGE file shipped with Java that includes all of the standard library. This is still slow—about 1.5 seconds on my machine—but that's still quicker than using jlink beforehand. On the other hand, it would exacerbate the space requirements, as the extracted contents would now take up about 460 megabytyes of space.
  2. Although the JIMAGE file format is internal to the JDK (and subject to change in the future), it turns out that the JDK does include a standalone library called libjimage. (This is surprisingly hard to find, and I only learned about this library's existence through some painfully specific Google searches.) libjimage is to JIMAGE as libzip is to ZIP, and to my surprise, the API that libjimage provides actually includes documentation in the form of comments (although I had to trawl through the OpenJDK source code to find it).

    If my reading of the source code for libjimage is accurate, then you can use a combination of the JIMAGE_FindResource and JIMAGE_GetResource functions to retrieve individual classes from a JIMAGE file. I have no idea how fast that would be, but given that the JDK itself uses this API, it's likely to be reasonably quick.

    If we wanted to, SAW could link against this library and use it to retrieve classes instead of relying on the jimage utility. This, of course, would involve more work (making GHC link against C code is uniquely challenging), but a more fundamental question is: how stable is the libjimage API? Honestly, I have no idea. It's possible that it could change in the future, but one could hope that the API—which is fairly high-level as far as C goes—changes less frequently than the underlying file format. If it's any reassurance, there exists a Rust library that provides bindings to libjimage, so at least one other person thought this was a good idea.

atomb commented 3 years ago

Based on the above, I'm somewhat inclined to skip jlink entirely and depend mostly on jimage. It's somewhat slow, but perhaps not prohibitively so. And there is a flag to limit which classes it exports. For example, this works for me:

jimage extract --include "regex:.*/java/lang/Object.class,regex:.*/java/lang/String.class" $JAVA_HOME/lib/modules

The exact syntax of the string passed to --include doesn't seem to be documented, so I figured out the above by trial and error. At the very least, the above could be used on a class-by-class basis. Extracting a single class on my machine takes about 0.3s, which isn't ideal but perhaps not the end of the world.

One detail of the above is that it requires knowing where the modules file is, just like we need to with rt.jar right now. However, I think that there are some more reliable ways to do that than what we do right now. I wonder if the following will work:

Does that sound reasonable to you? Am I missing anything?

RyanGlScott commented 3 years ago

Ah, I really should have listed your suggestion—use jimage to extract one class at a time—as alternative (3). I suspect that that approach will still be noticeably slower, since even simple Java programs depend on many classes transitively from the standard library, so even 0.3s of overhead per class is likely to add up fast. But I don't yet have exact numbers to back up this claim, so I should try implementing alternative (3) first.

A few refinements to your suggestions:

One detail of the above is that it requires knowing where the modules file is, just like we need to with rt.jar right now. However, I think that there are some more reliable ways to do that than what we do right now.

I think that if we want to do this The Right Way™, we should emulate Java's module path. As the name suggests, the module path is where Java looks for modules (in the form of JIMAGE files), in the exact same way that the classpath is where Java looks for classes (in the form of JAR and .class files). Most Java tools allow specifying the module path with the -p/--module-path flags, and since SAW already tracks its own paths for .class and JAR files via the -c and -j flags, respectively, it seems like a natural addition for SAW to have its own -p flag as well.

If SAW were to gain a -p flag, then we could indicate where the standard modules file is as a command-line argument, much in the same way that we indicate where rt.jar lives as an argument to -j. (Granted, we will also need to indicate to SAW where jimage is located, so we could consider taking advantage of this to automatically detect where the standard modules file lives, but it's not strictly required.) More importantly, this would allow SAW to support other Java libraries that are distributed as JIMAGE files.

  • If we're working with newer JDKs, create an empty temporary directory on startup to store class files in that we delete on exit. Every call to jimage extract will use this directory as the argument to jimage extract --dir=....

If we only ever extract a single class at a time, then I don't think we need to share a single temporary directory across all jimage extract invocations. After all, we only need to extract a class when we load it for the first time, and once we load it, SAW caches the resulting Class file in its CodebaseState. As a result, we can extract a class to a temporary directory, load it into the CodebaseState cache, and delete the temporary directory all in one fell swoop. This avoids the need to ever track temporary directory locations throughout the duration of a single SAW invocation.

  • Our treatment of this new directory will be different than a normal class path entry. Every subdirectory of this directory becomes a class path entry. (And new subdirectories may show up at any point, so we'll need to check every time we look there.)

As mentioned above, we can avoid ever adding the results of jimage extract to the classpath if we simply load one class at a time. There is an interesting question, however, of what order we should look up classes in. Currently, we first looks in the JAR paths (specified by -j) followed by the classpaths (specified by -c). Where would the module paths (specified by -p) fit into this picture? Honestly, I'm not sure, and I couldn't find any documentation on the order in which Java itself does this. (There's this, but I couldn't find a modern version of this that covers Java 9+.) If loading classes from JIMAGE files is the slow path, we might have to consult JIMAGE files last out of sheer necessity.

  • If using jimage once per class is too slow, jdeps -v <classfile or jarfile> will identify all classes that class or JAR depends on once and for all, so we can extract them all at once.

One correction here: if you don't use jlink to combine all of the modules from your module path into a single JIMAGE file, then in general, it's not possible to extract all classes you need in one go. That is, you'd need to invoke jimage extract on every module in your module path in order to be sure that you've extracted all classes. (For most programs, the module path will only consist of the standard modules file, but this is a distinction worth making nonetheless.)

This is basically the same approach that I tried in https://github.com/GaloisInc/saw-script/issues/861#issuecomment-762464241, but without the intermediate jlink step. As a result, there is still the issue of leaving behind a lot of jimage extract files lingering on a temporary directory after SAW has finished execution. This is likely solvable with enough clever engineering, but I wanted to mention this to contrast with alternative (3), which can solve this issue in a more direct way (see my comments above after "[...] create an empty temporary directory on startup [...]").

RyanGlScott commented 3 years ago

Also, thanks for teaching me about jdeps -v—I was under the impression that jdeps only listed module dependencies, but it can list class dependencies too! Does this actually list all of the classes that an application transitively depends on, however? For example, if I run jdeps -v on ecdsa.jar from SAW's examples directory, I get:

``` $ jdeps -v ecdsa.jar ecdsa.jar -> java.base com.galois.ecc.AffinePoint -> com.galois.ecc.ECCProvider ecdsa.jar com.galois.ecc.AffinePoint -> java.lang.Object java.base com.galois.ecc.ECC -> com.galois.ecc.ECCProvider ecdsa.jar com.galois.ecc.ECC -> com.galois.ecc.NISTCurveFactory ecdsa.jar com.galois.ecc.ECC -> com.galois.ecc.PublicKey ecdsa.jar com.galois.ecc.ECC -> com.galois.ecc.Signature ecdsa.jar com.galois.ecc.ECC -> java.io.PrintStream java.base com.galois.ecc.ECC -> java.lang.Integer java.base com.galois.ecc.ECC -> java.lang.Long java.base com.galois.ecc.ECC -> java.lang.Object java.base com.galois.ecc.ECC -> java.lang.String java.base com.galois.ecc.ECC -> java.lang.System java.base com.galois.ecc.ECC -> java.util.Random java.base com.galois.ecc.ECCProvider -> com.galois.ecc.AffinePoint ecdsa.jar com.galois.ecc.ECCProvider -> com.galois.ecc.JacobianPoint ecdsa.jar com.galois.ecc.ECCProvider -> com.galois.ecc.PublicKey ecdsa.jar com.galois.ecc.ECCProvider -> com.galois.ecc.Signature ecdsa.jar com.galois.ecc.ECCProvider -> com.galois.ecc.TwinMulAux2Rslt ecdsa.jar com.galois.ecc.ECCProvider -> java.lang.IllegalArgumentException java.base com.galois.ecc.ECCProvider -> java.lang.NullPointerException java.base com.galois.ecc.ECCProvider -> java.lang.Object java.base com.galois.ecc.ECCProvider -> java.lang.String java.base com.galois.ecc.JacobianPoint -> com.galois.ecc.ECCProvider ecdsa.jar com.galois.ecc.JacobianPoint -> java.lang.Object java.base com.galois.ecc.NIST32 -> com.galois.ecc.AffinePoint ecdsa.jar com.galois.ecc.NIST32 -> com.galois.ecc.ECCProvider ecdsa.jar com.galois.ecc.NIST32 -> java.io.PrintStream java.base com.galois.ecc.NIST32 -> java.lang.String java.base com.galois.ecc.NIST32 -> java.lang.System java.base com.galois.ecc.NIST64 -> com.galois.ecc.AffinePoint ecdsa.jar com.galois.ecc.NIST64 -> com.galois.ecc.ECCProvider ecdsa.jar com.galois.ecc.NISTCurveFactory -> com.galois.ecc.ECCProvider ecdsa.jar com.galois.ecc.NISTCurveFactory -> com.galois.ecc.P384ECC32 ecdsa.jar com.galois.ecc.NISTCurveFactory -> com.galois.ecc.P384ECC64 ecdsa.jar com.galois.ecc.NISTCurveFactory -> java.lang.Object java.base com.galois.ecc.P384Constants -> com.galois.ecc.AffinePoint ecdsa.jar com.galois.ecc.P384Constants -> java.lang.Object java.base com.galois.ecc.P384ECC32 -> com.galois.ecc.AffinePoint ecdsa.jar com.galois.ecc.P384ECC32 -> com.galois.ecc.NIST32 ecdsa.jar com.galois.ecc.P384ECC32 -> com.galois.ecc.P384Constants ecdsa.jar com.galois.ecc.P384ECC64 -> com.galois.ecc.AffinePoint ecdsa.jar com.galois.ecc.P384ECC64 -> com.galois.ecc.NIST64 ecdsa.jar com.galois.ecc.P384ECC64 -> com.galois.ecc.P384Constants ecdsa.jar com.galois.ecc.PublicKey -> java.lang.Object java.base com.galois.ecc.Signature -> java.lang.Object java.base com.galois.ecc.TwinMulAux2Rslt -> java.lang.Object java.base ```

I strongly suspect that this omits many classes that this would transitively depend on (e.g., java.lang.Class).

atomb commented 3 years ago

Yeah, I think I agree with you about the Right Way of adding a notion of module path in addition to the existing class path. One thing I'd really like, however, would be for it to be possible for it to "just work" in many cases without user input. The need to specify the location of rt.jar in the current version tends to be quite painful. Respecting JAVA_HOME and attempting to use java to find the home if it's not set seems like a significant usability improvement.

You make a really good point about not needing to keep .class files on disk if we're loading them on-demand. Making "extract + load + delete" a self-contained step will probably make it less likely that we leave stray class files sitting around.

And you may be right that the output of jdeps -v only includes direct dependencies. It might be possible to then feed those dependencies into jdeps again until we reach a fixpoint, though that could be slow. Maybe worth trying, though? If we can get a list of all dependencies, my initial test indicated that extracting two class files from a JIMAGE file took about the same time as extracting one, so extracting 20, for example, might not take much longer.

One final point is that, although I think it's worth having support for multiple JAR or JIMAGE dependency files, the vast, vast majority of the cases are likely to involve only the single modules file included with the JDK, so optimizing for that case seems to make sense. Although older JDKs included JAR files beyond rt.jar, we've never done a verification that would depend on them, and I expect we never will. Given that the modules JIMAGE file contains even more stuff, I'd say that probably will be even more true for JDK versions >=9.

My inclination for search order would be to first search the module path, then JAR files, then classpath directories, though I admit I haven't given it that much thought.

RyanGlScott commented 3 years ago

One thing I'd really like, however, would be for it to be possible for it to "just work" in many cases without user input. The need to specify the location of rt.jar in the current version tends to be quite painful. Respecting JAVA_HOME and attempting to use java to find the home if it's not set seems like a significant usability improvement.

I agree. It may be worth adding JAVA_HOME support independently of the effort to support modern Java, since I think that's mostly a matter of setting up some environment variable–related circuitry in the right places. Once the circuitry is installed for SAW's classpath, wiring it up for the module path should be straightforward.

And you may be right that the output of jdeps -v only includes direct dependencies. It might be possible to then feed those dependencies into jdeps again until we reach a fixpoint, though that could be slow. Maybe worth trying, though? If we can get a list of all dependencies, my initial test indicated that extracting two class files from a JIMAGE file took about the same time as extracting one, so extracting 20, for example, might not take much longer.

Iterating jdeps -v to a fixpoint might be trickier than it sounds. As an experiment, I added an extra putStrLn in SAW to indicate whenever a class is loaded for the first time. With this, I was able to see how many classes end up getting loaded when running the relatively small ffs_java.saw example:

[17:45:53.659] Loading file "/home/rscott/Documents/Hacking/Haskell/saw-script/doc/tutorial/code/ffs_java.saw"
[17:45:53.659] Extracting reference term
RGS tryLookupClass: Loading FFS
extracting ffs_ref
RGS tryLookupClass: Loading java/lang/Class
RGS tryLookupClass: Loading java/lang/String
RGS tryLookupClass: Loading java/io/BufferedOutputStream
RGS tryLookupClass: Loading java/io/FilterOutputStream
RGS tryLookupClass: Loading java/io/OutputStream
RGS tryLookupClass: Loading java/io/PrintStream
RGS tryLookupClass: Loading java/lang/Object
RGS tryLookupClass: Loading java/lang/System
RGS tryLookupClass: Loading java/lang/StringIndexOutOfBoundsException
RGS tryLookupClass: Loading java/lang/Exception
[17:45:53.705] Extracting implementation term
extracting ffs_imp
[17:45:53.711] Proving equivalence
[17:45:53.760] Done.

On the other hand, jdeps -v is very stubborn and only reports one of these classes:

$ jdeps -v FFS.class 
FFS.class -> java.base
   FFS                                                -> java.lang.Object                                   java.base

I hunted around through jdeps' command-line options, but I couldn't find anything that made it print more information than that. The question now becomes: how do we feed java.lang.Object back into jdeps? The only way I can think of would be to jimage extract it out, point jdeps at the location where it was extracted, and repeat. But at that point, we're basically extracting classes more-or-less one at a time—or at the very least, at a much finer granularity than "all at once".

One final point is that, although I think it's worth having support for multiple JAR or JIMAGE dependency files, the vast, vast majority of the cases are likely to involve only the single modules file included with the JDK, so optimizing for that case seems to make sense. Although older JDKs included JAR files beyond rt.jar, we've never done a verification that would depend on them, and I expect we never will. Given that the modules JIMAGE file contains even more stuff, I'd say that probably will be even more true for JDK versions >=9.

Even if we never do end up using a JAR or JIMAGE file beyond the ones shipped with Java itself, I'll sleep a little easier at night knowing that we could do so in the future if the need arose :)

RyanGlScott commented 3 years ago

I spoke with @atomb about this recently. Here are some takeaways:

  1. The libjimage approach, while tempting, is a non-starter due to the fact that it would require linking against C++ code. I'm still curious about the idea of investigating precisely what goes wrong (and if there are perhaps GHC bugs in our way), so I've opened GaloisInc/crucible#621 for this task.
  2. If libjimage is out of the question, then we don't have many alternatives besides using the utilities that the JDK ships with. Unfortunately, just about every combination of jdeps/jlink/jimage that I've tried imposes some non-trivial performance overhead. However, that doesn't necessarily mean that using these tools is infeasible—we would just need to figure out a way to amortize the cost of using these tools across multiple SAW invocations.

How, indeed, should we amortize the cost? The most expensive part of using jdeps/jlink/jimage is the initial creation/extraction of the .class files that SAW needs. Once that is done, however, subsequent invocations of SAW could, in theory, re-use the resulting .class files without needing to go through jdeps/jlink/jimage again. This means that the first time you invoke SAW with a newly encountered JIMAGE file will be slow, but invoking SAW afterwards with the same JIMAGE file should reasonably fast.

Some questions:

RyanGlScott commented 3 years ago

One correction to my musings above: the module path actually isn't where one specifies the locations of JIMAGE files. Rather, it's the location where Java searches for JARs and .class files for modularized code. In fact, I'm not exactly sure how you specify the location of JIMAGE files to Java! The answer may very well be "you don't", as the Java developers likely intended for jlink to completely hide away the details of JIMAGE files from end users.

This is arguably a good thing from SAW's point of view, as it means that we can avoid having a separate command-line flag for JIMAGE files at all. Instead, you point SAW at the directory where java lives and it will take care of the rest. This does mean that #1030 will now be a requirement in order to make this work, rather than just a quality-of-life improvement that makes it easier to invoke SAW.

RyanGlScott commented 3 years ago

I've implemented the caching idea from https://github.com/GaloisInc/saw-script/issues/861#issuecomment-763696924 in #1046, operating at the granularity of extracting one class at a time from modules. This works well enough for the examples I've tried it on, and once the classes are cached, running SAW with JDK 11 is just as fast as with JDK 8.

The PR isn't ready to be merged yet, as there are some tasks that it is blocked on (#993 and #1030), but I'd welcome any feedback you could provide.

RyanGlScott commented 3 years ago

See GaloisInc/crucible#638 for the crucible-jvm–related changes.

RyanGlScott commented 3 years ago

Once again, I have good news and bad news.

The good news is that I've factored out basically all of the necessary functionality for reading from JIMAGE files into crucible-jvm as part of GaloisInc/crucible#638. This is nice because the crucible-jvm executable in crucible (which perhaps ought to be called crux-jvm instead) can also benefit from caching JIMAGE files—in fact, crucible-jvm and saw can quite literally share the same cache on the same machine. Moreover, I've demonstrated in #1046 that the SAW test suite passes with JDK 15, the latest release at the time of writing.

The bad news is that despite all this, GaloisInc/crucible#638 is actually quite broken still. I only realized this after trying to run the crucible-jvm test suite, which consists of a single .java file that imports String. As it turns out, crucible-jvm chokes really badly on String with modern JDKs. What follows is a tale of woe:


In light of this, I'm forced to admit that there's still quite a bit of investigation that needs to happen to diagnose why crucible-jvm topples over here. Unfortunately, I don't know if I'll have the time to do so myself, which raises a question of what to do with the current states of GaloisInc/crucible#638 and #1046.

Here is one proposal: given that this works well enough with the sorts of Java programs that SAW includes in its test suite, we could merge GaloisInc/crucible#638 and #1046 now, but explicitly label support for JDK 9+ as experimental in the SAW/Crucible documentation. (We may even consider printing a warning when a user launches these tools with JDK 9+.) We could then open a separate issue in the crucible repo to track the remaining issues, including the ones described above.

Does this sound reasonable? Or do you think we should hold off until more effort has been spent debugging these issues?

atomb commented 3 years ago

Thanks for the detailed analysis!

My inclination would be to go ahead and merge this (with appropriate warnings about JDK 9+). This is at least a strict improvement over what's in there right now. Most of the sort of verification we normally do with SAW will then work with newer JDKs, but stuff using more of the standard library still won't. That's better than not being able to do anything with modern JDKs.

RyanGlScott commented 3 years ago

In that case, I've:

RyanGlScott commented 3 years ago

For the sake of posterity, it's worth recording that my earlier claim that the test suite passes with JDK 15 is not quite true. The test_crucible_jvm test case fails thusly:

[16:18:15.313] Loading file "/home/rscott/Documents/Hacking/Haskell/saw-script/intTests/test_crucible_jvm/teststr_crucible.saw"
[16:18:15.314] **loading TestStr
[16:18:15.326] **Extracting methods
extracting main
[16:18:15.365] resolveField: Field FieldId {fieldIdClass = ClassName "java/lang/Object", fieldIdName = "value", fieldIdType = char[]} not found
CallStack (from HasCallStack):
  error, called at src/Lang/Crucible/JVM/Translation/Monad.hs:61:15 in crucible-jvm-0.2-inplace:Lang.Crucible.JVM.Translation.Monad
  jvmFail, called at src/Lang/Crucible/JVM/Translation/Class.hs:836:23 in crucible-jvm-0.2-inplace:Lang.Crucible.JVM.Translation.Class