Closed tsdh closed 9 years ago
We think that adding a warmup phase is still future work but it's pretty complicated to do properly. Some thoughts:
I still wasn't able to reproduce the deadlock issue on Ubuntu -- I'll look into it on the Arch VM on SHARE.
Maybe it's complicated to get exactly right, but anything is better than spinning off a new JVM and starting measurements on completely cold bytecode right away.
Concerning JMH: nope, I've never used it so far.
I don't understand how the proportion of elements to be fixed in one iteration has anything to do with the topic of the issue. Can you please explain what you had in mind?
> I don't understand how the proportion of elements to be fixed in one iteration has anything to do with the topic of the issue. Can you please explain what you had in mind?

What I had in mind is: take a large model, run 100 iterations each modifying 1 element. This should take long enough for the JVM to warm up.
However, I've been thinking and came up with a better idea: instead of running a query-changeset-size benchmark n times from the Python script, pass n as an argument to the JVM and let the Java framework run the benchmark n times (and throw away the first k runs).
Another advantage of this approach is that it does not cause problems for existing Java implementations and only minor hassle for other implementations (e.g. .NET-based ones).
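The "run n times in one JVM, throw away the first k runs" idea could be sketched roughly as follows. This is a minimal illustration, not the framework's actual code; the `BenchmarkCase` interface and the method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class WarmupRunner {

    // Hypothetical interface for one benchmark iteration; the real framework's
    // API may look different.
    public interface BenchmarkCase {
        long run(); // executes one full benchmark iteration, returns elapsed ms
    }

    // Runs the case 'runs' times inside the same JVM and discards the first
    // 'warmupRuns' measurements, so the JIT has a chance to compile the hot
    // paths before any time is recorded.
    public static List<Long> measure(BenchmarkCase benchCase, int runs, int warmupRuns) {
        List<Long> times = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long elapsed = benchCase.run();
            if (i >= warmupRuns) { // warmup runs are executed but not recorded
                times.add(elapsed);
            }
        }
        return times;
    }
}
```

With `runs = 5` and `warmupRuns = 2`, only the last three measurements would be reported, which matches the "throw away the first k runs" proposal above.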
Yes, that's fine, too.
Okay, I implemented this. Just ran some quick tests which demonstrated the warmup effect, e.g. the check phase for the tool EMF-IncQuery, query SwitchSensor, change set fixed, size 128 benchmark resulted in the following times (ms):
Great, thanks!
@szarnyasg I have two things to note with this change:
I've tried executing all queries with java, eiq, and funnyqt (in that order) with all models up to the size of 8192 and three runs each. At the third run of the eiq `SemaphoreNeighbor` query it died with an `OutOfMemoryError`. There are two things to note:
(a) The remaining tests were not executed anymore, i.e., the funnyqt solution didn't get started. I would expect that when tool X crashes with query Y, either tool X gets executed with query Y+1, or if Y was the last query, then tool X+1 should have its turn.
(b) Since eiq died with the third run of `SemaphoreNeighbor` but the first and second runs succeeded, chances are high that there's some memory leak somewhere. Maybe you keep references to the models of the previous runs somewhere or something like that?
PING?
I think the 300 sec timeout used to be per run of a given task/size, and is now cumulative for all runs. So more runs = less time to complete each run.
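One way to keep the timeout per run rather than cumulative over all runs is to wrap each run in its own timed task. The sketch below is purely illustrative (the `TimeoutRunner` name and structure are my own, not the framework's actual code):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutRunner {

    // Applies the timeout to a single run: each call gets the full budget,
    // instead of all runs sharing one cumulative 300-second limit.
    public static boolean runWithTimeout(Runnable task, long timeoutSec) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        Future<?> f = ex.submit(task);
        try {
            f.get(timeoutSec, TimeUnit.SECONDS); // wait at most timeoutSec
            return true;                         // this run finished in time
        } catch (TimeoutException | ExecutionException | InterruptedException e) {
            f.cancel(true);                      // interrupt the overrunning run
            return false;
        } finally {
            ex.shutdownNow();
        }
    }
}
```

Calling `runWithTimeout(...)` once per run gives "more runs = same time budget per run", which is the behavior the comment above says was lost.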
@dwagelaar Thanks for confirming.
Hi, I'll get back to this on Saturday.
@szarnyasg No! No need to ruin your weekend. I just wanted to make sure this hasn't been forgotten.
> @szarnyasg I have two things to note with this change:
> - Is it possible that now the 300 sec timeout applies to executing a test on all models in the min/max interval? At least I've had timeouts now for models that have been transformed fine previously.
I increased the timeout to 600 seconds. Taking the effect of the warmup into account, I think 10 minutes should be enough for 5 runs.
> - I've tried executing all queries with java, eiq, and funnyqt (in that order) with all models up to the size of 8192 and three runs each. At the third run of the eiq `SemaphoreNeighbor` query it died with an `OutOfMemoryError`. There are two things to note:
I was not able to reproduce this error. Which size/Xmx setting did you use? Can you please provide some more details?
> (a) The remaining tests were not executed anymore, i.e., the funnyqt solution didn't get started. I would expect that when tool X crashes with query Y, either tool X gets executed with query Y+1, or if Y was the last query, then tool X+1 should have its turn.
Thanks, I fixed this.
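The expected behavior — a crash of tool X on query Y must not abort the remaining tool/query combinations — could be sketched like this. The `BiConsumer`-based runner and all names here are illustrative, not the framework's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class BenchmarkDriver {

    // Runs every (tool, query) combination; a failure in one combination is
    // logged and the loops continue with the next combination instead of
    // aborting the whole benchmark. Returns the combinations that succeeded.
    public static List<String> runAll(List<String> tools, List<String> queries,
                                      BiConsumer<String, String> runner) {
        List<String> succeeded = new ArrayList<>();
        for (String tool : tools) {
            for (String query : queries) {
                try {
                    runner.accept(tool, query);
                    succeeded.add(tool + "/" + query);
                } catch (RuntimeException e) {
                    // e.g. an OutOfMemoryError-like failure in a forked run:
                    // report it, then move on to query Y+1 (or tool X+1)
                    System.err.println(tool + "/" + query + " failed: " + e.getMessage());
                }
            }
        }
        return succeeded;
    }
}
```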
> (b) Since eiq died with the third run of `SemaphoreNeighbor` but the first and second runs succeeded, chances are high that there's some memory leak somewhere. Maybe you keep references to the models of the previous runs somewhere or something like that?
As I mentioned before, I was not able to reproduce the error; however, I implemented the `destroy()` method to call the `dispose()` method.
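A rough sketch of that idea — dropping references to the previous run's model and engine so they can be garbage collected. The class and field names are placeholders, since the actual implementation isn't shown in this thread:

```java
public class EiqBenchmarkCase {

    // Placeholder types; in the real solution these would be the EMF model
    // resource and the IncQuery engine.
    private Object model;
    private Object queryEngine;

    // Simulates loading the model and setting up the engine for one run.
    public void load() {
        model = new Object();
        queryEngine = new Object();
    }

    // Called by the framework after each run; delegates to dispose() so the
    // previous run's objects don't survive into the next run.
    public void destroy() {
        dispose();
    }

    // Drops the references so the GC can reclaim the previous run's model,
    // which is the suspected leak in the OutOfMemoryError report above.
    private void dispose() {
        queryEngine = null;
        model = null;
    }

    public boolean isDisposed() {
        return model == null && queryEngine == null;
    }
}
```

If `destroy()` is not called (or doesn't clear these fields), each of the n in-JVM runs keeps its model alive, which would explain a failure on the third run but not the first two.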
@szarnyasg I'll try re-running the benchmarks and report back.
@szarnyasg Seems to work now.
But I've noticed one other bug. I've run all queries with the java, eiq and funnyqt solutions on all models in the range from 1 to 8192 with two runs each. When generating the diagrams using `scripts/run -v` afterwards, the x-axis in the diagrams is missing the 1024 entry, but the 512 entry is there twice (with different y-values).
Thanks, this was a bug in the R script. Fixed in https://github.com/FTSRG/trainbenchmark-ttc/commit/2fa9879658481adf9034744f2541161431abf9ba.
When I run my solution in its standalone project, it runs much faster than when it's run by this benchmark framework. I'm not sure what's causing this, but one culprit could be that the benchmark framework runs every single test in its own new JVM instance, so it's possible that the benchmark results are distorted by just-in-time compilation. For that reason, I'd suggest adding some warmup phase, e.g., applying the selected rule to the smallest model 10 times before measuring the actual execution time with the currently selected model.
I suspect this is not the only factor, though. Before this project, I had never seen the reference-clearing-queue deadlock that the README now mentions. It doesn't happen with any Java version/variant (7 or 8, Oracle or OpenJDK) for my standalone solution (or any other project I've worked on), but I get it pretty consistently with the benchmark project as soon as I use anything other than the Oracle JDK in version 7. And the hangs/deadlocks don't occur only when running my solution; at least the java solution causes them, too.