sasa1977 opened this issue 7 years ago
Haha, I thought this was @sebastian posting, because you reproduced almost verbatim what I talked about with him some time ago on slack :D
One thing that I think would be very useful is graphing/tracking these numbers over time on master. That way you don't need to worry about performance with every change, but you can track a performance degradation back to a certain change when needed.
Yeah, graphing would be very cool, and also helpful for eliminating false positives (an accidental perf degradation caused by unrelated activity on the test machine).
Excellent suggestions.
Do we really need the ability to compare multiple versions with a single call? It sounds like that is going to add a boatload of complexity? How would you actually want to solve it – checkout and rebuild in the background (without affecting the test running machinery)? Although maybe it could be achieved through pointing the test at another folder on disk containing the version against which it should be compared?
All the same, it would be useful to have the ability to only run the test against the current version as well. If you are comparing against some baseline numbers, then these are likely to change while you make tweaks to some experimental code. Running repeated tests on the baseline version would therefore be a waste?
Memory readings could be collected from inside BEAM? That would then discount the memory used by other system processes, which in fact seems desirable.
Also I am not quite clear on where you would want to collect the memory stats for historical analysis? You mean we check in past runtimes to the repository so we have historical comparison data to check against? Or that we store times in some external system for fine grained historic graphs? (Both?)
Do we really need the ability to compare multiple versions with a single call?
It's true that if we run tests once on some commit, we don't need to repeat that run anymore, as long as the results are saved. However, with repeated runs, we can expand our tests and get additional info for the past version(s). Meaning, I can add some extra queries in the future, and then get the perf difference between the current and a past version. Perhaps we could do that in phase 2, though.
How would you actually want to solve it – checkout and rebuild in the background (without affecting the test running machinery)? Although maybe it could be achieved through pointing the test at another folder on disk containing the version against which it should be compared?
Yes, I was thinking about the latter approach. We git clone into another folder (say /tmp/xyz), check out the commit there, build a release, start it, and run some queries.
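A minimal sketch of that flow, as a hypothetical Elixir helper (the repository URL, scratch directory, and the make release build target are placeholders that would need to match our actual setup):

```elixir
defmodule PerfTest.Checkout do
  @moduledoc """
  Hypothetical helper: prepares a reference version in a scratch directory,
  without touching the working copy we are benchmarking.
  """

  # Placeholders: adjust to the real repository URL and build target.
  @repo_url "git@github.com:example/project.git"
  @scratch_dir "/tmp/perf_reference"

  # Clones the repo, checks out the given branch/commit, and builds a release.
  def prepare(ref) do
    File.rm_rf!(@scratch_dir)
    run!("git", ["clone", @repo_url, @scratch_dir])
    run!("git", ["checkout", ref], cd: @scratch_dir)
    run!("make", ["release"], cd: @scratch_dir)
    @scratch_dir
  end

  defp run!(cmd, args, opts \\ []) do
    {output, status} = System.cmd(cmd, args, [stderr_to_stdout: true] ++ opts)
    if status != 0, do: raise("#{cmd} #{Enum.join(args, " ")} failed:\n#{output}")
    output
  end
end
```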
All the same, it would be useful to have the ability to only run the test against the current version as well. If you are comparing against some baseline numbers, then these are likely to change while you make tweaks to some experimental code. Running repeated tests on the baseline version would therefore be a waste?
When I talked about comparing, I mostly meant comparing the state of the HEAD to the current release. The goal is to have some more reliable idea about our performance trends, so that when we release the next version, we can report if there are radical improvements (or degradation).
Memory readings could be collected from inside BEAM? That would then discount the memory used by other system processes, which in fact seems desirable.
I concur :-)
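For reference, the VM exposes per-category allocation counters directly via :erlang.memory/0, so a reading from inside the node could be as small as this sketch (when and how the test harness samples it is left open):

```elixir
defmodule PerfTest.Memory do
  # :erlang.memory/0 reports bytes allocated by the VM itself, broken down
  # by category, so memory used by other OS processes is not counted.
  def snapshot do
    mem = :erlang.memory()

    %{
      total_mib: div(mem[:total], 1024 * 1024),
      processes_mib: div(mem[:processes], 1024 * 1024),
      ets_mib: div(mem[:ets], 1024 * 1024)
    }
  end
end
```

Taking a snapshot before and after a query run (or polling a maximum during it) would give per-query memory numbers without any OS-level tooling.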
Also I am not quite clear on where you would want to collect the memory stats for historical analysis? You mean we check in past runtimes to the repository so we have historical comparison data to check against? Or that we store times in some external system for fine grained historic graphs? (Both?)
I didn't think much about such operational details :-) I'd suggest having something very lightweight in the beginning, say a folder with files (one per measurement). And maybe a script which produces a graph (or graphs) from these files, and maybe mails it to everyone, say once a week. Later on, if we're happy with this data, we can reach for something more mature to hold our time series.
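As a very rough sketch of that lightweight variant (the directory, the file format, and the shape of the measurement map are all invented for illustration):

```elixir
defmodule PerfTest.Store do
  # Hypothetical location and format: one Erlang-term file per measurement.
  @dir "perf_results"

  # `result` is assumed to be a map such as
  # %{commit: "abc123", query: "join_mongodb", time_ms: 42, memory_mib: 310}
  def save(result) do
    File.mkdir_p!(@dir)
    name = DateTime.utc_now() |> DateTime.to_iso8601() |> String.replace(":", "-")
    path = Path.join(@dir, name <> ".etf")
    File.write!(path, :erlang.term_to_binary(result))
    path
  end

  # Loads every stored measurement, e.g. as input for a weekly graphing script.
  def load_all do
    @dir
    |> Path.join("*.etf")
    |> Path.wildcard()
    |> Enum.map(fn path -> path |> File.read!() |> :erlang.binary_to_term() end)
  end
end
```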
I didn't think much about such operational details :-) I'd suggest having something very lightweight in the beginning, say a folder with files (one per measurement). And maybe a script which produces a graph (or graphs) from these files, and maybe mails it to everyone, say once a week. Later on, if we're happy with this data, we can reach for something more mature to hold our time series.
Ok, the crucial bit here being that you want this to be automatically run, say as part of the nightly build?
So there are two separate uses here then: an on-demand comparison of HEAD against a reference version, and an automated (say nightly) run whose results get tracked over time.
acatlas1 has been set aside for these performance tests.
I think I'm not going to be doing this for the near future, as I'm going to be working on anonymization.
I think I'm not going to be doing this for the near future, as I'm going to be working on anonymization.
What do you mean by that? We have a month now of tuning and testing and doing exactly that kind of work!?
Right! But anyway - I'm doing bugs now. I'll reassign to myself if and when I start working on this. Shouldn't this be in M4 if you think it's worth doing now?
Shouldn't this be in M4 if you think it's worth doing now?
I guess it could be in the sense that it has priority. However it's not something I see as needing to be completed for M4 to be considered complete. That's the justification I am giving for not adding it.
Bugs definitely take priority.
And I suppose this task can be done in a piecemeal fashion over time.
@sebastian @cristianberneanu @obrok
Inspired by #1488, I'm starting a discussion on how to assess performance trends in our system. Ideally, it should be simple to compare query times and memory usage for various types of queries and data sources between the current HEAD and some reference point (e.g. the previous release). The current make perftest is a good starting point, but I think we need to extend it to get more detailed and reliable numbers. Here are some ideas based on a cursory scan of the perf test code.

Preferably, we should be able to run something like ./compare_perf release_170200. The script would compare the performance of the current local branch to the desired branch (or commit). The output would be performance numbers (times and memory usage, before/after and relative difference) for each data source (e.g. MongoDB, PostgreSQL, emulated queries) and each feature (simple aggregates, splitters, joins, subqueries). We might also output the numbers for each feature per data source (e.g. joins in a MongoDB data source).

Obviously, it might happen that some queries are not testable with the previous version (for example, if the test query uses some newly supported feature), but this can easily be handled by the test script, which can output something like N/A or error for such cases.

Feel free to add your thoughts or other ideas.
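To make the intended output a bit more concrete, here is a rough sketch of the comparison/reporting step, assuming both versions have already produced measurements as a map keyed by {data_source, feature} (the structure and field names are invented for illustration):

```elixir
defmodule PerfTest.Compare do
  @moduledoc """
  Sketch: diffs two measurement sets of the form
  %{{data_source, feature} => %{time_ms: ..., memory_mib: ...}}.
  Queries missing from the reference run are reported as N/A.
  """

  def report(current, reference) do
    Enum.each(current, fn {{source, feature}, now} ->
      case Map.fetch(reference, {source, feature}) do
        {:ok, base} ->
          IO.puts(
            "#{source}/#{feature}: " <>
              "time #{base.time_ms}ms -> #{now.time_ms}ms (#{diff(base.time_ms, now.time_ms)}), " <>
              "memory #{base.memory_mib}MiB -> #{now.memory_mib}MiB (#{diff(base.memory_mib, now.memory_mib)})"
          )

        :error ->
          # The reference version cannot run this query (e.g. a new feature).
          IO.puts("#{source}/#{feature}: N/A (not testable on the reference version)")
      end
    end)
  end

  defp diff(base, now) when base > 0 do
    change = Float.round((now - base) / base * 100, 1)
    if change >= 0, do: "+#{change}%", else: "#{change}%"
  end

  defp diff(_base, _now), do: "N/A"
end
```

Calling PerfTest.Compare.report(current, reference) would then print one line per data source/feature combination, which is roughly the table described above.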