ftsrg / trainbenchmark-ttc

(Deprecated.) Framework for the TTC 2015 Train Benchmark case
Eclipse Public License 1.0

Add some warmup phase #4

Closed. tsdh closed this issue 9 years ago.

tsdh commented 9 years ago

When I run my solution in its standalone project, it runs much faster than when it's run by this benchmark framework. I'm not sure what's causing this, but one culprit could be that the benchmark framework runs every single test in its own new JVM instance, so the benchmark results may be distorted by just-in-time compilation. For that reason, I'd suggest adding a warmup phase, e.g., applying the selected rule to the smallest model 10 times before measuring the actual execution time on the currently selected model.
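
A minimal sketch of what such a warmup could look like, assuming a hypothetical `BenchmarkCase` API (`loadModel`, `performTransformation`, and the model paths are illustrative, not the framework's real interfaces):

```java
/** Illustrative stand-in; the real framework's interfaces differ. */
interface BenchmarkCase {
    Object loadModel(String path) throws Exception;
    void performTransformation(Object model) throws Exception;
    String smallestModelPath();
    String selectedModelPath();
}

public final class WarmupRunner {
    private static final int WARMUP_ITERATIONS = 10;

    /** Returns the measured time (ns) of one run after warming up the JIT. */
    public static long runWithWarmup(BenchmarkCase benchmark) throws Exception {
        // Warmup: apply the rule to the smallest model; results are discarded.
        Object smallModel = benchmark.loadModel(benchmark.smallestModelPath());
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            benchmark.performTransformation(smallModel);
        }
        // Measurement: only the run on the currently selected model is timed.
        Object model = benchmark.loadModel(benchmark.selectedModelPath());
        long start = System.nanoTime();
        benchmark.performTransformation(model);
        return System.nanoTime() - start;
    }
}
```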

I suspect this is not the only factor, though. I had never seen the reference-clearing queue deadlock that the README now mentions before this project. It doesn't happen with any Java version/variant (7 or 8, Oracle or OpenJDK) for my standalone solution (or any other project I've worked on), but I get it pretty consistently with the benchmark project as soon as I use anything other than the Oracle JDK in version 7. And the hangs/deadlocks don't occur only when running my solution; at least the java solution causes them, too.

szarnyasg commented 9 years ago

We think adding a warmup phase is worthwhile, but it's pretty complicated to do properly and is still future work. Some thoughts:

I still wasn't able to reproduce the deadlock issue on Ubuntu -- I'll look into it on the Arch VM on SHARE.

tsdh commented 9 years ago

It may be complicated to get exactly right, but anything is better than spinning up a new JVM and starting measurement on completely cold bytecode right away.

Concerning jmh, nope, I've never used it so far.

I don't understand how the proportion of elements to be fixed in one iteration has anything to do with the topic of the issue. Can you please explain what you had in mind?

szarnyasg commented 9 years ago

> I don't understand how the proportion of elements to be fixed in one iteration has anything to do with the topic of the issue. Can you please explain what you had in mind?

What I had in mind is: take a large model and run 100 iterations, each modifying 1 element. This should take long enough for the JVM to warm up.

However, I've been thinking and came up with a better idea: instead of running a query-changeset-size benchmark n times from the Python script, pass n as an argument to the JVM and let the Java framework run the benchmark n times (and throw away the first k runs).

Another advantage of this approach is that it does not cause problems for existing Java implementations and only minor hassle for other implementations (e.g. .NET-based ones).
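
A minimal sketch of this scheme, with hypothetical names (`RepeatedBenchmark`, `measure`); the real framework wires `n` and `k` in its own way:

```java
import java.util.ArrayList;
import java.util.List;

public final class RepeatedBenchmark {
    /** Runs the benchmark n times and keeps only the last n - k timings. */
    public static List<Long> measure(Runnable benchmark, int n, int k) {
        List<Long> times = new ArrayList<>();
        for (int run = 0; run < n; run++) {
            long start = System.nanoTime();
            benchmark.run();
            long elapsed = System.nanoTime() - start;
            if (run >= k) { // throw away the first k warmup runs
                times.add(elapsed);
            }
        }
        return times;
    }

    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]); // total runs, passed in by the script
        int k = Integer.parseInt(args[1]); // warmup runs to discard
        List<Long> times = measure(() -> {
            // one full query-changeset-size benchmark iteration goes here
        }, n, k);
        for (long t : times) {
            System.out.println(t / 1_000_000 + " ms");
        }
    }
}
```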

tsdh commented 9 years ago

Yes, that's fine, too.

szarnyasg commented 9 years ago

Okay, I implemented this. I just ran some quick tests that demonstrate the warmup effect; e.g., the check phase for the EMF-IncQuery tool (query SwitchSensor, fixed change set, size 128) produced the following times (ms):

  1. 20994
  2. 2232
  3. 1754
  4. 2183
  5. 1834

tsdh commented 9 years ago

Great, thanks!

tsdh commented 9 years ago

@szarnyasg I have two things to note with this change:

  1. Is it possible that the 300 sec timeout now applies to executing a test on all models in the min/max interval? At least I've had timeouts now for models that have been transformed fine previously.
  2. I've tried executing all queries with java, eiq, and funnyqt (in that order) on all models up to size 8192, with three runs each. At the third run of the eiq SemaphoreNeighbor query, it died with an OutOfMemoryError. There are two things to note:

    (a) The remaining tests were not executed anymore, i.e., the funnyqt solution didn't get started. I would expect that when tool X crashes with query Y, either tool X gets executed with query Y+1, or, if Y was the last query, tool X+1 should have its turn (see the sketch after this list).

    (b) Since eiq died at the third run of SemaphoreNeighbor while the first and second runs succeeded, chances are high that there's a memory leak somewhere. Maybe you keep references to the models of the previous runs somewhere, or something like that?
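
A minimal sketch of the isolation asked for in (a), with stand-in tool and query lists and a placeholder `runBenchmark`; in the real framework each test runs in its own JVM, so the driver would launch and await that process instead:

```java
import java.util.Arrays;
import java.util.List;

public final class FaultTolerantDriver {
    public static void main(String[] args) {
        List<String> tools = Arrays.asList("java", "eiq", "funnyqt");
        List<String> queries = Arrays.asList("PosLength", "SwitchSensor", "SemaphoreNeighbor");
        for (String tool : tools) {
            for (String query : queries) {
                try {
                    runBenchmark(tool, query);
                } catch (Exception e) {
                    // Log the failure and move on to the next query/tool
                    // instead of aborting the whole benchmark session.
                    System.err.printf("%s failed on %s: %s%n", tool, query, e);
                }
            }
        }
    }

    private static void runBenchmark(String tool, String query) throws Exception {
        // Placeholder: launch the per-test JVM here and check its exit code.
    }
}
```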

tsdh commented 9 years ago

PING?

dwagelaar commented 9 years ago

I think the 300 sec timeout used to apply per run of a given task/size, and is now cumulative over all runs. So more runs = less time to complete each run.
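
A minimal sketch of the difference, assuming a hypothetical driver; here each run gets the full budget, whereas the cumulative variant would share a single deadline across all runs:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class PerRunTimeout {
    private static final long TIMEOUT_SECONDS = 300;

    /** Applies the full timeout budget to each run individually. */
    static void runAll(Runnable benchmark, int runs)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < runs; i++) {
                // Each run gets its own 300 s; a cumulative scheme would
                // instead enforce one shared deadline, so more runs would
                // mean less time per run.
                Future<?> f = executor.submit(benchmark);
                f.get(TIMEOUT_SECONDS, TimeUnit.SECONDS);
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```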

tsdh commented 9 years ago

@dwagelaar Thanks for confirming.

szarnyasg commented 9 years ago

Hi, I'll get back to this on Saturday.

tsdh commented 9 years ago

@szarnyasg No! No need to ruin your weekend. I just wanted to make sure this hasn't been forgotten.

szarnyasg commented 9 years ago

> @szarnyasg I have two things to note with this change:
>
> 1. Is it possible that the 300 sec timeout now applies to executing a test on all models in the min/max interval? At least I've had timeouts now for models that have been transformed fine previously.

I increased the timeout to 600 seconds. Taking the effect of the warmup into account, I think 10 minutes should be enough for 5 runs.

> 2. I've tried executing all queries with java, eiq, and funnyqt (in that order) on all models up to size 8192, with three runs each. At the third run of the eiq SemaphoreNeighbor query, it died with an OutOfMemoryError. There are two things to note:

I was not able to reproduce this error. Which model size and Xmx setting did you use? Can you please provide some more details?

> (a) The remaining tests were not executed anymore, i.e., the funnyqt solution didn't get started. I would expect that when tool X crashes with query Y, either tool X gets executed with query Y+1, or, if Y was the last query, tool X+1 should have its turn.

Thanks, I fixed this.

> (b) Since eiq died at the third run of SemaphoreNeighbor while the first and second runs succeeded, chances are high that there's a memory leak somewhere. Maybe you keep references to the models of the previous runs somewhere, or something like that?

As I mentioned before, I was not able to reproduce the error; however, I implemented the destroy() method so that it calls the engine's dispose() method.
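
A minimal sketch of that fix, assuming a hypothetical `QueryEngine` type standing in for the real EMF-IncQuery engine; the concrete names in the code base may differ:

```java
public class EiqBenchmarkCase {
    /** Illustrative stand-in for the real query engine's disposal API. */
    interface QueryEngine {
        void dispose();
    }

    private QueryEngine engine;

    /** Called by the framework after each run. */
    public void destroy() {
        if (engine != null) {
            // Disposing the engine releases its hold on the previous run's
            // model; otherwise those references could accumulate across runs
            // and eventually exhaust the heap.
            engine.dispose();
            engine = null;
        }
    }
}
```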

tsdh commented 9 years ago

@szarnyasg I'll try re-running the benchmarks and report back.

tsdh commented 9 years ago

@szarnyasg Seems to work now.

But I've noticed another bug. I've run all queries with the java, eiq, and funnyqt solutions on all models in the range from 1 to 8192, with two runs each. When generating the diagrams with scripts/run -v afterwards, the x-axis of the diagrams is missing the 1024 entry, while the 512 entry appears twice (with different y-values).

szarnyasg commented 9 years ago

Thanks, this was a bug in the R script. Fixed in https://github.com/FTSRG/trainbenchmark-ttc/commit/2fa9879658481adf9034744f2541161431abf9ba.