Optimize JVM options to minimize memory use, etc.

alda-lang / alda

A music programming language for musicians. :notes:

https://alda.io

Eclipse Public License 2.0


Optimize JVM options to minimize memory use, etc. #269

Closed daveyarwood closed 8 years ago

daveyarwood commented 8 years ago

Because the server and workers are all separate processes with their own JVMs, it would be good if we took steps to make sure that the processes are as lightweight as possible while still having the resources they need.

Currently we are not using any JVM options, except for Clojure direct linking. We could add in some options to set a cap on memory use, heap space, etc.

Alda's JVM options are set in build.boot and are baked into the executable when it is built. They are shared between the client, worker and server processes, since all three run from the same jar file / executable.

I could use some help with this -- I don't have a great deal of experience with tuning JVM applications.

feldoh commented 8 years ago

I'm up for having a go. Given that this is a music app, I'm assuming you want to prioritise minimal pause times, as GC in the middle of a melody isn't ideal. In terms of heap size, I'm assuming it's going to be relatively small, but I can't choose a collector or heap limits appropriately without a fairly worst-case sample. Can you give an example of a particularly heavy workload? I'm guessing either CMS or G1 will be best (although if you really want low resource usage the throughput collector would actually be best, but I'd never advise that for an interactive system). I'd prefer to benchmark a worst case rather than just guess though ;)

One thing to consider: to reduce pauses, it is generally advised to set the minimum heap size to your estimated maximum, which avoids heap resizing, an expensive operation. But that is in direct opposition to keeping resource usage down, especially if you consider something like using the CLI options to play a single note ending up taking 4G or something silly. Given that, I'd be tempted to start low and take the resizing hits. With such a wide variety of loads, I'm not sure which situation you want to optimise for. Give me some pointers as to your ideal trade-offs and I'll have a crack.
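The heap-sizing trade-off can be sketched with the standard HotSpot flags `-Xms` (initial heap) and `-Xmx` (maximum heap). The helper name `heap_opts` and the sizes below are assumptions for illustration only; they are not values taken from Alda's build.

```shell
#!/bin/sh
# Hypothetical helper: print heap flags for a given initial and maximum size.
heap_opts() {
  printf '%s\n' "-Xms$1 -Xmx$2"
}

# Strategy 1: pin the initial heap to the maximum so the heap never resizes
# (fewer resize pauses, but each JVM commits the full heap at startup).
FIXED_HEAP=$(heap_opts 512m 512m)

# Strategy 2: start small and let the JVM grow the heap on demand
# (lower resting footprint, at the cost of occasional resizing).
GROWING_HEAP=$(heap_opts 64m 512m)

# Only print the commands; nothing is launched here.
echo "fixed:   java $FIXED_HEAP -jar alda.jar"
echo "growing: java $GROWING_HEAP -jar alda.jar"
```

Which strategy wins depends on the workload, which is why a worst-case benchmark is worth having before picking numbers.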

daveyarwood commented 8 years ago

Awesome, thanks for offering to help! I'll be curious to see what you find is most resource-efficient.

I also feel like starting low might yield the best results. Basically, I think right now, each worker process is probably given more resources than it needs, so it would be interesting to see what are the lowest caps we can set without performance being negatively impacted. It seems like an Alda worker process should be able to be pretty lightweight, so we could allow the user to scale up "horizontally" by adding more workers.

Signs of performance impact to watch for would be things like:

- audio stuttering or cutting out
- notes delayed / rhythms getting skewed
- a score otherwise "not sounding like it should"
- lack of immediate response from server/workers

I think a good way to stress test this might be to have multiple workers play a large score. The largest example we have in the repo is Bach's Cello Suite No. 1, so perhaps you could test by starting a server with 2-4 workers, and tell each one to play that score one after the other:

```
# assuming you've built Alda locally to /tmp/alda
$ /tmp/alda --workers 4 up
$ /tmp/alda play -f examples/bach_cello_suite_no_1.alda
$ /tmp/alda play -f examples/bach_cello_suite_no_1.alda
$ /tmp/alda play -f examples/bach_cello_suite_no_1.alda
$ /tmp/alda play -f examples/bach_cello_suite_no_1.alda
```
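For repeated runs, the same test could be wrapped in a small script. This is only a sketch: it loops over the `alda play` command from the thread, and the `ALDA` and `DRY_RUN` variables and the `play_score_n_times` function are hypothetical names introduced here.

```shell
#!/bin/sh
# Path to a locally built Alda executable (assumed to be /tmp/alda).
ALDA=${ALDA:-/tmp/alda}
SCORE=examples/bach_cello_suite_no_1.alda

# Hypothetical helper: issue the same play command N times in a row.
# With DRY_RUN=1 (the default here) it only prints the commands.
play_score_n_times() {
  count=$1
  i=1
  while [ "$i" -le "$count" ]; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$ALDA play -f $SCORE"
    else
      "$ALDA" play -f "$SCORE"
    fi
    i=$((i + 1))
  done
}

# One play request per worker, assuming a server started with 4 workers.
play_score_n_times 4
```

Set `DRY_RUN=0` to actually send the requests once a server is up, then watch each process's memory and CPU while the scores play.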

feldoh commented 8 years ago

How do you actually build this project locally? I've never dealt with a boot project before. I looked through the list of tasks and tried package, which builds a jar, but the jar is not valid for a simple java -jar: the manifest refers to alda.Main, which isn't in the jar (I assume Clojure should generate it).

```
18:17 $ java -Dclojure.compiler.direct-linking=true -jar target/alda.jar --workers 4 up
Error: Could not find or load main class alda.Main
```

If I try the build task on Unix (CentOS) it fails because it seems to use the same directory twice:

```
18:20 $ boot build -o /tmp
Compiling 12 Java source files...
Fatal Error: Unable to find package java.lang in classpath or bootclasspath
Writing pom.xml and pom.properties...
Adding uberjar entries...
Writing alda.jar...
Creating alda binary...
Writing /tmp/alda...
              clojure.lang.ExceptionInfo: java.util.concurrent.ExecutionException: java.nio.file.DirectoryNotEmptyException: target/alda
    data: {:file "/tmp/boot.user2212806555876298310.clj", :line 31}
 java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.nio.file.DirectoryNotEmptyException: target/alda
 java.util.concurrent.ExecutionException: java.nio.file.DirectoryNotEmptyException: target/alda
java.nio.file.DirectoryNotEmptyException: target/alda
```

That is after clearing the target directory one line earlier, i.e. rm -rf target. I get NPEs if I try using a pre-built jar:

```
18:24 $ boot build -o /tmp -f $(pwd)
         clojure.lang.ExceptionInfo: java.lang.NullPointerException
data: {:file "/tmp/boot.user2626859823623853740.clj", :line 31}
java.util.concurrent.ExecutionException: java.lang.NullPointerException
```

Running from the REPL doesn't seem to actually run anything; it just returns an object:

```
boot.user=> (build "-o" "/tmp")
#object[clojure.core$comp$fn__4727 0x66c67ec0 "clojure.core$comp$fn__4727@66c67ec0"]
```

On Windows it failed trying to make a Unix binary. Any pointers?

daveyarwood commented 8 years ago

I apologize for the confusion! It seems like there are a few different issues going on here...

The Alda .jar file is interesting because it contains both the Java (client) and Clojure (server, workers) source code. The entry point to the app is alda.Main, which is Main.java. Then we call the Clojure code from Java.

For our Unix releases, we take this jar file and thinly wrap it in an executable that is really just a script for running java -jar with the JVM options specified in build.boot.

The boot build command will build executables (Unix and Windows) wrapping the jar file. Running boot build -o /tmp followed by /tmp/alda --workers 4 up should be equivalent to creating the jar file and running it with java -jar, although of course you lose the ability to set your own JVM options at the command line.

If you look at the definition of the package task in build.boot:
```
(deftask package
  "Builds an uberjar."
  []
  (comp (assert-jdk7-bootclasspath)
        (javac)
        (pom)
        (uber)
        (jar)))
```
This is the composition of 5 subtasks. We can leave out the assert-jdk7-bootclasspath task for now, for the sake of example. Running a task like this in boot (i.e. boot package) is equivalent to running each subtask individually: boot javac pom uber jar.

So, if you run boot javac pom uber jar, that will create the jar file. It will not write it to the target directory by default, though, so we need to add the target task at the end: boot javac pom uber jar target:
```
$ boot javac pom uber jar target
Compiling 12 Java source files...
Writing pom.xml and pom.properties...
Adding uberjar entries...
Writing alda.jar...
Writing target dir(s)...

$ java -jar target/alda.jar --workers 4 up
```
Another way you could do this would be to add a (target) subtask to the build task in build.boot, right before the bin and exe subtasks. The way we have it set up right now is it only outputs the executables, skipping writing the uberjar contents and jar file to the target directory, since we only release the executables.

Windows: I'm not sure why it would fail trying to create the Unix binary -- it's basically just writing a file with a small header and then the contents of the jar file. Was there an error message?
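The "small header plus jar contents" idea can be sketched as below. This is not Alda's actual build code, only an illustration of the general technique: `java -jar` locates the archive directory at the end of the file, so a script header in front of the jar bytes is tolerated. `make_launcher` and its arguments are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: write a launcher header followed by the jar bytes.
#   $1 = input jar, $2 = output executable, $3 = JVM options to bake in
make_launcher() {
  jar=$1
  out=$2
  jvm_opts=$3
  {
    echo '#!/bin/sh'
    # "$0" is the executable itself, which doubles as the jar file.
    echo "exec java $jvm_opts -jar \"\$0\" \"\$@\""
    cat "$jar"
  } > "$out"
  chmod +x "$out"
}

# Example (not run here):
#   make_launcher target/alda.jar /tmp/alda "-Dclojure.compiler.direct-linking=true"
```

Because the JVM options live in the header line, any memory or GC flags chosen later would apply to every process started from the executable.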

daveyarwood commented 8 years ago

Hmm... actually, looking at this closer, I'd say this is the bigger issue you're seeing:

```
18:20 $ boot build -o /tmp
Compiling 12 Java source files...
Fatal Error: Unable to find package java.lang in classpath or bootclasspath
```

For some reason, the javac step is unable to compile the Java source code because of something about your classpath/bootclasspath. That's something I haven't seen before.

This would explain why it was unable to find Main.java:

```
18:17 $ java -Dclojure.compiler.direct-linking=true -jar target/alda.jar --workers 4 up
Error: Could not find or load main class alda.Main
```

daveyarwood commented 8 years ago

I'll bet this has something to do with our javac options in build.boot: https://github.com/alda-lang/alda/blob/master/build.boot#L59-L65

```
(task-options!
  javac   {:options (concat
                      ["-source" "1.7"
                       "-target" "1.7"]
                      (when-let [jdk7-bootclasspath
                                 (System/getenv "JDK7_BOOTCLASSPATH")]
                        ["-bootclasspath" jdk7-bootclasspath]))}
```

For compatibility with systems that have Java 7 installed, we have this in place to make sure that you can't compile the Java source code unless you have a JDK7 bootclasspath specified by the JDK7_BOOTCLASSPATH environment variable. Then we pass that path as the -bootclasspath option.

Ideally, what you should do is make sure you have JDK7 installed and JDK7_BOOTCLASSPATH is set to the correct bootclasspath for JDK7. With that in place, boot javac should work, and so should boot build.
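A sketch of that setup, assuming a typical JDK 7 layout where the bootstrap classes live in `jre/lib/rt.jar`. The install location and the `JDK7_HOME` variable are assumptions and will differ between systems.

```shell
#!/bin/sh
# Assumed JDK 7 install location; adjust for your system.
JDK7_HOME=${JDK7_HOME:-/usr/lib/jvm/java-7-openjdk}

# Point the variable at the runtime jar itself, not at the JDK folder.
JDK7_BOOTCLASSPATH="$JDK7_HOME/jre/lib/rt.jar"
export JDK7_BOOTCLASSPATH

# Warn early if the path is wrong, since the javac error is easy to miss
# among the rest of the build output.
if [ ! -f "$JDK7_BOOTCLASSPATH" ]; then
  echo "warning: $JDK7_BOOTCLASSPATH not found" >&2
fi

# Then build as usual:
#   boot build -o /tmp
```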

If that's too much of a hassle, I think it would be fine to temporarily comment out the javac options:

```
(task-options!
;  javac   {:options (concat
;                      ["-source" "1.7"
;                       "-target" "1.7"]
;                      (when-let [jdk7-bootclasspath
;                                 (System/getenv "JDK7_BOOTCLASSPATH")]
;                        ["-bootclasspath" jdk7-bootclasspath]))}
```

feldoh commented 8 years ago

Interesting; I had missed that CP error. I'd assumed that if it proceeded, everything was fine. I did export JDK7_BOOTCLASSPATH, though I set it to a JDK8 folder because that was a controlled machine that won't let me install anything except the newest version of Java. I hadn't set it directly to rt.jar, though. So now it builds and I have some workers, woo :)

daveyarwood commented 8 years ago

Sweet!

feldoh commented 8 years ago

I've done some initial checks and found that at present each process starts with around 700M of memory. The amount actually reserved is around 250M minimum and 4G maximum; this will vary by Java version and environment. Around 300-600M seems to be the amount used per process in my environment. So I could slim it down a bit. This was with somewhat impractical settings just aiming to scavenge as much memory as humanly possible. As it stands, GC is relatively infrequent, and turning on the GC logs shows that in real terms each GC lasts about 0.01s, so over-tuning would be a mistake. The GC is never really being strained, so GC tuning will be of limited benefit.
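The thread doesn't say which logging flags were used. For anyone wanting to repeat the measurement, the following are the standard HotSpot GC logging options on Java 7/8 (Java 9 and later replaced them with unified logging, `-Xlog:gc`). The `GC_LOG_OPTS` variable name is an assumption.

```shell
#!/bin/sh
# Standard HotSpot GC logging flags for Java 7/8 (assumed; not taken from
# the thread). Output goes to the JVM's stdout.
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

# Only print the command; nothing is launched here.
echo "java $GC_LOG_OPTS -jar target/alda.jar --workers 4 up"
```

Each logged collection reports how long it took, which is where a per-GC figure like 0.01s would come from.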

As I see it we have 2 options. I can either just put in some basics, let's say 300M starting and 1G max, and leave it at that for now. Let's be honest, with large processes that's your bottleneck more than anything else; I'd say most people wouldn't be able to exceed 20 without memory issues. Or I can go slightly further and add some basic GC options, just to keep it consistent more than anything else. I'd advise one of:

a) Explicitly select the concurrent collector (default pre-Java 9) to try and reduce pause times, with the extra flag to let the app have more interleaving. I don't really want to go further than that with CMS tuning because GC is not a major bottleneck (at least not in my tests). Also, given that master and slave are somewhat different in terms of load, over-tuning one would likely damage the other.

b) Go for the garbage-first collector (usable as of Java 7). This one is meant to replace the concurrent collector in time. It's mostly intended for larger heaps or apps which make a lot of garbage, which immutable things like Clojure tend to do. It also has the rather nice feature of a max pause time goal, so we can set that to, say, 1/10th of a second, which would hopefully not be noticeable, and leave the rest to it, as it is meant to self-tune to hit that goal. I found G1 to have the shortest pauses and the least resting CPU load for alda processes. Turning on continuous collection saved some CPU (rather surprisingly) but made no real difference to pause times.
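As a rough sketch, the two options map onto standard HotSpot flags as follows. The heap bounds are the 300M / 1G figures suggested above and the pause goal is the 1/10th of a second mentioned for G1. The "extra flag" for CMS is not named in the thread, so it is left out here, and `collector_opts` is a hypothetical helper name.

```shell
#!/bin/sh
# Heap bounds from the "basics" suggestion: 300M initial, 1G maximum.
HEAP_OPTS="-Xms300m -Xmx1g"

# Hypothetical helper: print the flag set for a given collector choice.
collector_opts() {
  case "$1" in
    cms)
      # Option (a): the concurrent mark-sweep collector.
      echo "$HEAP_OPTS -XX:+UseConcMarkSweepGC"
      ;;
    g1)
      # Option (b): G1 with a pause-time goal of 100 ms.
      echo "$HEAP_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=100"
      ;;
    *)
      echo "unknown collector: $1" >&2
      return 1
      ;;
  esac
}

collector_opts g1
```

The pause-time value is a goal the collector tries to meet, not a hard guarantee, so the audible symptoms listed earlier in the thread remain the real test.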

In general though, I'd say don't go too heavy on the GC tuning, because GC isn't the bottleneck. It's not spending much time in GC even at default settings. Tell me what you're thinking and I'll make a PR.

daveyarwood commented 8 years ago

Wow, thanks for all this info -- I'm learning a lot!

I agree that we probably don't need to do much about the GC, but given that the Alda processes are meant to be left running in the background, the idea of decreasing resting CPU load is appealing to me. I'm interested in trying the G1 collector.

daveyarwood commented 8 years ago

Closed by #276.