andre15silva opened this issue 3 years ago
Hi @andre15silva
Nice issue to tackle. The problematic code to be refactored should be here: https://github.com/SpoonLabs/flacoco/blob/master/src/main/java/fr/spoonlabs/flacoco/core/test/TestDetector.java
Hi @martinezmatias
Yes.
Based on pure observation though, I'd say the bottleneck is larger on test-runner's side.
I'll try to confirm this with some profiling.
Hot spots of the flacoco process running on math_70 from astor's examples:
Self times (profiler screenshot)
Total times (profiler screenshot)
Optimizations that jump to mind:
1. SpoonTestMethod's fields can be computed just once, so we don't need to call the model getters all the time. (Yields a ~50% CPU-time reduction in the flacoco process.)
2. Loading the serialized file produced by the test-runner process itself takes ~10% of the flacoco CPU time after optimization 1.
3. Test detection takes ~80% of the CPU time of the flacoco process after optimization 1 (70s out of 90s on my machine).
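Optimization 1 above is essentially memoization. As a rough sketch (the class and field names here are illustrative, not flacoco's actual SpoonTestMethod API), the idea is to query the Spoon model once in the constructor and serve every later access from cached fields:

```java
// Hypothetical sketch — names are illustrative, not flacoco's actual API.
// Optimization 1 is plain memoization: query the (expensive) Spoon model once
// in the constructor, and serve all later accesses from cached fields.
public class CachedTestMethod {
    private final String className;   // computed once, e.g. from the model's type getter
    private final String methodName;  // computed once, e.g. from the model's name getter
    private final String signature;   // derived once, reused everywhere

    public CachedTestMethod(String className, String methodName) {
        this.className = className;
        this.methodName = methodName;
        this.signature = className + "#" + methodName;
    }

    // cheap field reads replace repeated model traversals
    public String getClassName()  { return className; }
    public String getMethodName() { return methodName; }
    public String getSignature()  { return signature; }

    public static void main(String[] args) {
        CachedTestMethod m = new CachedTestMethod("org.example.FooTest", "testBar");
        System.out.println(m.getSignature()); // prints org.example.FooTest#testBar
    }
}
```

The ~50% reduction reported above is consistent with getters going from model traversals to plain field reads.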
Hot spots of the test-runner process running on the same example as before:
Self times and total times (profiler screenshots). Expanding the top self-times result highlights that the most time-consuming operation is analyzing the instrumented classes after execution.
Where to optimize?
Update: I think implementing an org.jacoco.core.internal.analysis.ClassAnalyzer that takes in several ExecutionDataStores is the best way.
Currently we do a full analysis of the binaries each time a method finishes. What we want to do is store the ExecutionDataStores and do just one pass through the binaries at the end of the test run. Doing this analysis is costly, and even more so when you do it several thousand times.
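The one-pass idea can be sketched with stand-in types (StubExecutionData below is a toy placeholder for jacoco's ExecutionDataStore; the real change would go through jacoco's analyzer API): instead of sweeping all binaries after every test, accumulate the per-test execution data and sweep the binaries once at the end.

```java
import java.util.*;

// Toy sketch of the one-pass idea. StubExecutionData is a stand-in for
// jacoco's ExecutionDataStore; the real implementation would go through
// jacoco's analyzer API. Instead of re-analyzing every binary after each
// test (roughly tests x classes work), we accumulate per-test execution
// data and sweep the binaries once at the end of the test run.
public class BatchedAnalysis {
    static class StubExecutionData {
        final String testName;
        final Set<String> executedClasses;
        StubExecutionData(String testName, Set<String> executedClasses) {
            this.testName = testName;
            this.executedClasses = executedClasses;
        }
    }

    // single pass over the binaries, matched against all stored results
    static Map<String, List<String>> analyzeOnce(List<String> allClasses,
                                                 List<StubExecutionData> stores) {
        Map<String, List<String>> coveredBy = new LinkedHashMap<>();
        for (String clazz : allClasses) {
            for (StubExecutionData store : stores) {
                if (store.executedClasses.contains(clazz)) {
                    coveredBy.computeIfAbsent(clazz, k -> new ArrayList<>())
                             .add(store.testName);
                }
            }
        }
        return coveredBy;
    }

    public static void main(String[] args) {
        List<StubExecutionData> stores = List.of(
                new StubExecutionData("t1", Set.of("A", "B")),
                new StubExecutionData("t2", Set.of("B")));
        System.out.println(analyzeOnce(List.of("A", "B", "C"), stores));
        // prints {A=[t1], B=[t1, t2]}
    }
}
```

The matching work per class is still proportional to the number of stores, but the expensive step (reading and analyzing the class file) happens once per class instead of once per test.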
Edit: As far as I have come to understand, doing this in a single thread with just one pass over the class files requires an almost entirely new analyzer package for jacoco. WIP on this front.
Using jacoco, I think the best we can do is https://github.com/STAMP-project/test-runner/pull/112
The only way I see we could achieve performance similar to GZoltar's is to adapt jacoco so that the analyzer takes several ExecutionDataStores at once and does just one sweep (I'm not sure how feasible that is, nor whether it would even reach GZoltar's performance). I'm experimenting with that, but it's still too soon to tell.
Hi @andre15silva
Thanks for the update.
Hi @martinezmatias,
Another update. I've come to realize that the problem might be that jacoco analyzes every single class, not just the ones that were executed; GZoltar only analyzes the executed ones. It is the difference between being quadratic and "quasi-linear". I'm working on a PR for jacoco that would allow for this.
I have opened a PR https://github.com/jacoco/jacoco/pull/1212, that introduces an option for skipping non-executed classes in the analysis.
This does improve performance, but we are still a bit far from GZoltar.
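The option in that PR boils down to restricting the analysis input. A minimal illustration of the filtering step (plain Java, independent of jacoco's actual API): only classes that appear in the execution data are passed on to the costly per-class analysis.

```java
import java.util.*;
import java.util.stream.*;

// Illustration only, independent of jacoco's real API: the idea behind the
// PR is to restrict the analysis pass to classes that actually appear in
// the execution data, instead of every class found on the classpath.
public class SkipNonExecuted {
    static List<String> classesToAnalyze(List<String> allClasses,
                                         Set<String> executedClasses) {
        return allClasses.stream()
                .filter(executedClasses::contains) // skip never-executed classes
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> all = List.of("org.ex.A", "org.ex.B", "org.ex.C");
        Set<String> executed = Set.of("org.ex.A", "org.ex.C");
        System.out.println(classesToAnalyze(all, executed)); // [org.ex.A, org.ex.C]
    }
}
```

For a project where each test touches only a small fraction of the classes, this turns the per-test analysis cost from "all classes" into "executed classes", which is where the quadratic-vs-quasi-linear difference comes from.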
The next important steps are:
- Removing serialization and de-serialization from test-runner. Right now, on my PC on math70, we spend more than 20-25 seconds just saving and loading, out of 1m30s.
- Test detection might be feasible by using surefire:
https://github.com/apache/maven-surefire/tree/surefire-3.0.0-M5_vote-1/surefire-providers https://maven.apache.org/surefire/maven-surefire-plugin/api.html
Hi @andre15silva
> I have opened a PR jacoco/jacoco#1212, that introduces an option for skipping non-executed classes in the analysis.
That's a great contribution.
> Optimizing test detection. GZoltar, as far as I know and according to the implementation in Astor, doesn't have this step. Still, it is one of the most time-consuming steps of our pipeline, so we should aim at optimizing it.
Do you mean to a) retrieve the list with the names of the test methods/classes to execute or b) the test framework from each test?
> Removing serialization and de-serialization from test-runner. Right now, on my PC on math70, we spend more than 20-25 seconds just saving and loading, out of 1m30s.
Agree, it's a big portion.
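For context on what is being removed, here is a toy round trip through Java serialization. This is not flacoco's code; it only makes concrete the kind of save/load step between the test-runner and flacoco processes that the optimization would eliminate.

```java
import java.io.*;
import java.util.*;

// Toy demo (not flacoco's code) of the save/load step being discussed:
// coverage results are serialized by the test-runner process and
// de-serialized by flacoco. Removing or shrinking this round trip is the
// optimization; this sketch just makes the step itself concrete.
public class SerializationRoundTrip {
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static <T> T fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // a stand-in for per-test line-coverage data
        HashMap<String, int[]> coverage = new HashMap<>();
        coverage.put("org.ex.FooTest#testBar", new int[]{1, 0, 1});

        HashMap<String, int[]> back = fromBytes(toBytes(coverage));
        System.out.println(Arrays.toString(back.get("org.ex.FooTest#testBar")));
        // prints [1, 0, 1]
    }
}
```

When this happens once per test over thousands of tests, with disk I/O in between, the 20-25 seconds reported above is plausible; keeping the data in memory avoids both the encode/decode work and the file round trip.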
Hi @martinezmatias
> Do you mean to a) retrieve the list with the names of the test methods/classes to execute or b) the test framework from each test?
Both. Building the Spoon model itself already takes around half the time, while checking the frameworks takes the other half.
See the graph:
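As a purely hypothetical illustration of a lighter-weight detection pass (flacoco builds a Spoon source model; this sketch instead scans compiled classes via reflection, and the @Test annotation is a local stand-in for a real test framework's), the framework check can reduce to an annotation lookup per method:

```java
import java.lang.annotation.*;
import java.lang.reflect.*;
import java.util.*;

// Purely hypothetical sketch: flacoco's test detection builds a Spoon source
// model; this illustration instead scans compiled classes via reflection.
// The @Test annotation below is a local stand-in for a real test framework's.
public class LightweightTestDetector {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Test {}

    static class SampleTest {
        @Test public void testAdd() {}
        public void helperMethod() {}
    }

    // collect "Class#method" identifiers for annotated methods only
    static List<String> detectTests(Class<?> clazz) {
        List<String> tests = new ArrayList<>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Test.class)) {
                tests.add(clazz.getSimpleName() + "#" + m.getName());
            }
        }
        return tests;
    }

    public static void main(String[] args) {
        System.out.println(detectTests(SampleTest.class)); // [SampleTest#testAdd]
    }
}
```

This avoids the source-model construction entirely, which is the trade-off surefire-style providers exploit: they work from compiled classes rather than source.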
test-runner inter-process communication optimization PR opened in https://github.com/STAMP-project/test-runner/pull/116
Running flacoco on math70 with this optimization reduces writing/reading time from ~25s to ~3s (~88% reduction).
Hi @andre15silva
> Running flacoco on math70 with this optimization reduces writing/reading time from ~25s to ~3s (~88% reduction).
Nice work!
> Running flacoco on math70 with this optimization reduces writing/reading time from ~25s to ~3s (~88% reduction).
Impressive!
Running flacoco on math70 with #82 reduces test detection time from ~60s to ~1s (~99% reduction), and also fixes #80.
On the basis of pure observation, flacoco is considerably slower than GZoltar right now.
Some ideas for optimization:
I will also do some profiling and update this issue.