VJalili opened this issue 5 years ago
Thanks for opening the follow-up issue to move this along. I'll think about this more, but just to respond to the following while it's on my mind:
To the best of my knowledge, we unfortunately do not have such a formalized load/stress testing platform. Therefore, our performance evaluations are subjective to some extent, which, while it all merits discussion, IMHO should not be on the critical path.
I just really don't think it flies to say "Well, X is a concern with regard to this new feature, but since we don't have tests for that yet, we'll give it a pass." Dataset resolution/wrangling is a pretty critical bit of Galaxy, and any significant impact on performance here could have wide-reaching effects. It'd be nice to know that things aren't going to be impacted, and including these benchmarks as part of this effort would be ideal.
Indeed, performance concerns have merit; in fact, the first two changes I suggest address performance concerns. However, the point of the text you quoted is that our performance assessments are subjective.
I got that. My point was that it'd be great to see objective benchmarks and tests along with this.
Let me see if I can elaborate.
We all agree performance is everyone's concern. However, do we know for certain whether changeX introduces a performance penalty? If so, how much? What percentage of performance degradation has changeX introduced? Is it 80%? Is it 0.01%? What is the acceptable trade-off? Are we good with 10% degradation?
Generally there exist well-established methods for performance evaluation:
So, IMHO, given that neither of these methods is feasible for us, at least in the short term, I am assuming our performance assessments are subjective. We'll do our best to minimize the possibility of introducing a significant performance penalty (again, subjectively), but we can only have concrete evaluations if we follow one of the aforementioned methods.
@VJalili There was no need at all to elaborate; I don't know how we're crossing wires here. My request, again, was:
This is the case I think best fits Galaxy, which unfortunately we have not set up yet.
Let's set up a few simple tests as a part of this.
If you mean we implement load/stress tests, then I guess the basis for those would better fit in a separate PR, and once that is ready, this PR can leverage it and implement some specific tests if necessary.
@dannon @jmchilton do we want to implement load/stress tests for Galaxy to evaluate possible performance penalties of the UBOS PR, or do we agree the first two points on the aforementioned list would suffice?
ping @jmchilton
We 100% need performance tests for Galaxy. I can't/won't do it all, but I will definitely help us get there. Last weekend I generalized a lot of the tracking I had done manually for a series of performance-related PRs in the past, so we can publish it to statsd for more formal tracking, and PR'd it as https://github.com/galaxyproject/galaxy/pull/8800. I did that work in part in response to this discussion.
That said, we routinely make performance assessments based on domain experience (see for instance https://github.com/galaxyproject/galaxy/pull/7791) and this will continue.
I think some folks from @bgruening's team have developed an experimental benchmarking setup.
@jmchilton I can help extend your work to cover the concerns related to the user-based object store. I would need your help to understand how it works so that I can extend it.
As suggested in the list of actionable items, I implemented the logic for enabling/disabling the feature introduced by the user-based object store PR; see this commit. When enabled, Galaxy persists user data in their plugged media and inserts an entry into the dataset-media association table.
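For illustration, a minimal sketch of that enable/disable logic follows; the flag name `enable_user_based_object_store`, the accessor `active_storage_media`, and the association class are assumptions for this sketch, not the commit's actual code.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the association model; the real class lives in
# Galaxy's model layer and differs in detail.
@dataclass
class StorageMediaDatasetAssociation:
    dataset: object
    media: object


def persist_dataset(dataset, user, config, sa_session):
    """Sketch of the enable/disable logic described above (names assumed)."""
    if not getattr(config, "enable_user_based_object_store", False):
        return None  # feature disabled: the default object store handles data
    media = getattr(user, "active_storage_media", None)  # user's plugged media
    if not media:
        return None
    association = StorageMediaDatasetAssociation(dataset=dataset, media=media)
    sa_session.add(association)  # entry in the dataset-media association table
    sa_session.flush()
    return association
```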
My point was that it'd be great to see objective benchmarks and tests along with this.
As suggested by @dannon, I set up a benchmark comparing the performance of file upload and tool execution across three branches: the current latest dev (with no changes), and the user-based object store branch with the introduced feature enabled and with it disabled. The input is 64 fastqsanger files (the same files copied multiple times) of size 1.43 GB, which are first uploaded to Galaxy and then quality-controlled (see the first part of this tutorial), producing 192 datasets in total. Performance is captured via StatsD and Graphite running on Docker. When the user-based object store is enabled, Galaxy persists user data in a different local path.
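For reference, capturing these timings in such a StatsD/Graphite setup only takes a few lines. Here is a sketch assuming the `statsd` Python client and the dockerized StatsD on its default UDP port; the metric names are made up:

```python
import time

from statsd import StatsClient  # pip install statsd

# StatsD from the docker setup described above, on its default UDP port.
statsd = StatsClient(host="localhost", port=8125, prefix="galaxy.benchmark")


def timed(metric_name, fn, *args, **kwargs):
    """Run fn and report its wall-clock runtime to StatsD/Graphite."""
    start = time.time()
    result = fn(*args, **kwargs)
    statsd.timing(metric_name, (time.time() - start) * 1000)  # milliseconds
    return result

# e.g. timed("upload.runtime", upload_file, "sample.fastqsanger")
```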
**Local job runner**
The three branches are shown in blue (left-most), red (middle), and green (right-most) for user-based object store disabled, user-based object store enabled, and current latest dev, respectively. The runtimes of upload and tool execution are marked by the letters A and B, respectively. The tasks were executed twice on each branch.
As shown in the figure above, job execution performance is on the same scale across the three branches, for both uploading data and tool execution.
**Job Wrapping**
This plot compares the median job wrapping runtime between the three branches. As shown here, the median performance of the current latest dev and of the user-based object store branch with the introduced feature disabled are roughly equal.
When the user-based object store is enabled, an association entry is added while wrapping a job, associating a `Dataset` with a user's `Media`. Accordingly, the time required to insert this entry into the database is added to the total job wrapping time, making job wrapping slightly slower with the user-based object store enabled than on the current latest dev.
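In other words, the measured overhead comes down to one extra INSERT per job. A hypothetical SQLAlchemy sketch of the association table's shape (table and column names assumed, not Galaxy's actual schema):

```python
from sqlalchemy import Column, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class DatasetMediaAssociation(Base):
    """Assumed shape of the association row inserted during job wrapping."""
    __tablename__ = "dataset_media_association"
    id = Column(Integer, primary_key=True)
    dataset_id = Column(Integer, nullable=False)  # FK to dataset.id in practice
    media_id = Column(Integer, nullable=False)    # FK to the user's media
```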
@dannon @jmchilton:
@dannon do you think this benchmark covers the areas you had in mind?
I am also planning on setting up a benchmark using https://github.com/usegalaxy-eu/GalaxyBenchmarker
@VJalili I guess it's on the right track, but the unit being evaluated here is likely too long to reveal anything relevant when trying to assess the impact of this across thousands of points of use in the application -- that is, the actual code path whose performance change we're concerned with is a very, very small portion of the total execution time seen here. Can you find a more minimal example (maybe just resolving the dataset location?) and then do that in a test loop?
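Concretely, the kind of minimal test loop being asked for here might look like the following sketch; it assumes the object store exposes `get_filename`, and the fixture names are placeholders:

```python
import timeit


def resolve_once(object_store, dataset):
    # The narrow code path of interest: just resolving the dataset location.
    return object_store.get_filename(dataset)


def benchmark_resolution(object_store, dataset, iterations=10_000):
    seconds = timeit.timeit(lambda: resolve_once(object_store, dataset),
                            number=iterations)
    print(f"{iterations} resolutions in {seconds:.3f}s "
          f"({seconds / iterations * 1e6:.1f} µs each)")
```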
At first glance, it would seem ca7e71609b5b08b7ee870e0452dd663b4e399372 gets the extra joins out of the critical path for executing many jobs simultaneously; this is what I was asking for. As long as it also isn't called unconditionally from the tool/actions code. It should probably be done without the extra config option (doesn't the object store know if it is enabled and a user-based object store? Can you ask it instead?).
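The "ask the object store" idea could be as simple as the following sketch; `is_user_based` is an invented attribute standing in for however the object store would actually expose that knowledge:

```python
def uses_user_media(object_store, user):
    """Capability check on the object store itself, with no separate
    config option (attribute name invented for illustration)."""
    return getattr(object_store, "is_user_based", False) and user is not None
```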
Following the discussion on the PR, IMHO the list of major actionable items to implement on the User-Based ObjectStore (UBOS) PR (https://github.com/galaxyproject/galaxy/pull/4840) before we can merge it is as follows:
[ ] refreshing credentials should happen only for backends that require credentials, and only when credentials are expired or are about to expire within a configurable window (regarding a possible performance penalty);
[ ] replace `len(hda.dataset.active_storage_media_associations) == 0` with a property (see the sketch below);
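For the second item, the property could look roughly like this; the property name is invented and the class stands in for Galaxy's HDA:

```python
class HistoryDatasetAssociation:  # sketch only; stands in for Galaxy's HDA
    @property
    def has_active_storage_media(self):
        """Invented property replacing scattered len(...) == 0 checks."""
        return len(self.dataset.active_storage_media_associations) > 0
```

Callers would then test `hda.has_active_storage_media` instead of comparing lengths inline.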
Regarding performance:
We can try to avoid any extra calls to methods that may introduce a performance penalty (as listed above). However, an ultimate solution would be to have (a) methods/tools for benchmarking (e.g., within the context of load/stress testing), (b) a gold standard for each metric to compare against, and (c) a range of acceptable performance degradation.
To the best of my knowledge, we unfortunately do not have such a formalized load/stress testing platform. Therefore, our performance evaluations are subjective to some extent, which, while it all merits discussion, IMHO should not be on the critical path.
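To make (b) and (c) concrete, such an evaluation could gate on a stored gold standard plus an agreed tolerance. A minimal sketch with placeholder numbers and names:

```python
GOLD_STANDARD_SECONDS = {"job_wrapping": 0.050}  # placeholder baselines (b)


def check_regression(metric, measured_seconds, tolerance=0.10):  # (c): 10%
    """Fail when degradation versus the gold standard exceeds the tolerance."""
    baseline = GOLD_STANDARD_SECONDS[metric]
    degradation = (measured_seconds - baseline) / baseline
    assert degradation <= tolerance, (
        f"{metric} is {degradation:.1%} slower than the gold standard "
        f"(allowed: {tolerance:.0%})"
    )
```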
Can calls to credential refresh be pushed to lower levels?
I did not find a way of doing this with a reasonable amount of changes. That is because refreshing credentials requires properties that are available at the `app` and `trans` level (e.g., PSA needs a Strategy, or needs to call this method); hence, those calls are implemented at the deepest level where `trans` or `app` are available.
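Wherever the call ends up living, the expiry-window check from the first actionable item is itself small. A sketch assuming the credentials carry an expiration timestamp in epoch seconds (the field name `expires_at` is an assumption):

```python
import time


def needs_refresh(credentials, window_seconds=300):
    """Refresh only when credentials are expired or will expire within a
    configurable window (field name assumed for illustration)."""
    expires_at = getattr(credentials, "expires_at", None)
    if expires_at is None:
        return False  # this backend does not use expiring credentials
    return expires_at - time.time() <= window_seconds
```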
Regarding abstractions:
There might be different (and better) ways of incorporating user info into `ObjectStore` and `Dataset`. We warmly welcome any PRs against https://github.com/galaxyproject/galaxy/pull/4840 that improve the abstractions. The rationale behind the current implementation is to minimize the impact on `Dataset` and `ObjectStore`, since introducing an abstraction over either of these may have significant implications. Therefore, we agreed to pursue an incremental-improvements approach, where we incorporate user info into `Dataset` and `ObjectStore` with minimal changes to these models, and improve them incrementally.
Would implementing the listed actionable items help merge the PR in an "incremental improvements" fashion?
@jmchilton @jgoecks @natefoo @dannon @nsoranzo @erasche @bgruening (and everyone else :) ) please feel free to chime in.