galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Actionable items for the User-Based ObjectStore PR #8790

Open VJalili opened 5 years ago

VJalili commented 5 years ago

Following the discussion on the PR, IMHO the list of major actionable items to be implemented on the User-Based ObjectStore (UBOS) PR (https://github.com/galaxyproject/galaxy/pull/4840) before we can merge it is as follows:

Regarding performance:

We can try to avoid any extra calls to methods that may introduce a performance penalty (as listed above). However, an ultimate solution would be to have (a) methods/tools for benchmarking (e.g., within the context of load/stress testing), (b) a gold standard for each metric to compare against, and (c) a range of acceptable performance degradation.
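
A minimal sketch of what points (b) and (c) could look like in practice, assuming a placeholder operation, baseline value, and tolerance (none of these numbers have been agreed on):

```python
import time


def assert_no_regression(operation, baseline_seconds, tolerance=0.10, repeats=5):
    """Fail if `operation` runs more than `tolerance` slower than the baseline.

    `baseline_seconds` plays the role of the "gold standard" and `tolerance`
    the range of acceptable degradation; both are placeholders here.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    average = sum(timings) / len(timings)
    assert average <= baseline_seconds * (1 + tolerance), (
        f"average {average:.4f}s exceeds baseline {baseline_seconds:.4f}s "
        f"by more than {tolerance:.0%}"
    )
```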

To the best of my knowledge, unfortunately we do not have such a formalized load/stress testing platform. Therefore, our performance evaluations are subjective to some extent; while this all merits discussion, IMHO it should not be on the critical path.

Can calls to credential refresh be pushed to lower levels?

I did not find a way of doing this with a reasonable amount of changes. That is because refreshing credentials requires properties that are available at the app and trans levels (e.g., PSA needs Strategy, or we need to call this method); hence, those calls are implemented at the deepest level where trans or app are available.
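
As a purely illustrative sketch of why the call ends up at that level (the `authnz_manager` attribute and `refresh` signature below are assumptions, not Galaxy's actual interface):

```python
# Illustrative only: the manager attribute and method signature are assumed.
def maybe_refresh_credentials(trans):
    # PSA's refresh flow needs its Strategy object, which is only reachable
    # through the application/transaction context, so the refresh has to be
    # triggered somewhere `trans` (or `app`) is in scope.
    user = trans.user
    if user is None:
        return
    trans.app.authnz_manager.refresh(trans, user)
```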

Regarding abstractions:

There might be different (and better) ways of incorporating user info into ObjectStore and Dataset. We warmly welcome any PRs to https://github.com/galaxyproject/galaxy/pull/4840 that improve the abstractions. The rationale behind implementing it as it is now is to minimize the impact on Dataset and ObjectStore, since introducing an abstraction over either of these may have significant implications. Therefore, we agreed to pursue an incremental-improvements approach, where we incorporate user info into Dataset and ObjectStore with minimal changes to these models and improve them incrementally.

Would implementing the listed actionable items help merge the PR in an "incremental improvements" fashion?

@jmchilton @jgoecks @natefoo @dannon @nsoranzo @erasche @bgruening (and everyone else :) ) please feel free to chime in.

dannon commented 5 years ago

Thanks for opening the follow-up issue to move this along. I'll think about this more, but just to respond to the following while it's on my mind:

To the best of my knowledge, unfortunately we do not have such a formalized load/stress testing platform. Therefore, our performance evaluations are subjective to some extent; while this all merits discussion, IMHO it should not be on the critical path.

I just really don't think it flies to say "Well, X is a concern in regards to this new feature, but since we don't have tests for that yet, we'll give it a pass". Dataset resolution/wrangling is a pretty critical bit of Galaxy, and any significant performance impact here could have wide-reaching effects. It'd be nice to know that things aren't going to be impacted, and including these benchmarks as a part of this effort would be ideal.

VJalili commented 5 years ago

Indeed, performance concerns have merit, which is why the first two changes I suggest address performance. However, the point of the text you quoted is that our performance assessments are subjective.

dannon commented 5 years ago

I got that. My point was that it'd be great to see objective benchmarks and tests along with this.

On Thu, Oct 10, 2019, 10:07 PM Vahid notifications@github.com wrote:

Actually the first two changes I suggest are regarding performance concerns. The point of your quoted text is that our performance assessments are subjective.


VJalili commented 5 years ago

Let me see if I can elaborate.

We all agree performance is everyone's concern. However, do we know for certain whether changeX introduces a performance penalty? If so, how much? What percentage of performance degradation has changeX introduced? Is it 80%? Is it 0.01%? What is the acceptable trade-off? Are we fine with 10% degradation?

Generally there exist well-established methods for performance evaluation:

  1. Study the asymptotic behavior of an algorithm, which can be anywhere from constant (O(1)) and linear w.r.t. input size (O(n)) to polynomial, factorial, and exponential (e.g., O(n^2) or O(n!)). Analyzing the asymptotic behavior of a Galaxy operation using such algorithm-analysis methods requires considering all the operations from the moment a user/system issues a request until its result is computed. However, given the complexity of Galaxy's operations (e.g., cyclomatic complexity), such a computation could be quite challenging. If we had this option, we could say the complexity of OperationX is O(log n) and that with the introduced change the complexity has become, e.g., O((log n) + 2), which asymptotically is still O(log n); in other words, the performance degradation is negligible.
  2. Run a performance benchmark (e.g., a load/stress test; see this Wikipedia page). Generally, for such tests you define an operation under test, a benchmark data collection, a number of retries (so you can average results over retries), and the target vs. source branches whose relative performance degradation you want to compute (see the sketch after this list). This is the case I think fits Galaxy best, which unfortunately we have not set up yet.
  3. Modern debuggers and IDEs offer various profiling tools, among them CPU and memory (including heap) profilers. These can help identify hotspots in the code w.r.t. CPU or memory usage, and can also help identify memory leaks, Garbage Collector (GC) overhead, etc. I have used this option quite extensively with Visual Studio on .NET code, but I have not managed to set it up for Galaxy.
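
For option 2, a minimal sketch (referenced above) of the kind of harness meant here; `operation_under_test` is a placeholder for whatever Galaxy operation is being benchmarked, and the same harness would be run on both branches:

```python
import statistics
import timeit


def benchmark(operation_under_test, retries=10, number=5):
    """Average the runtime of `operation_under_test` over several retries.

    Running the same harness on the source and target branches and comparing
    the two averages gives the relative degradation discussed above.
    """
    totals = timeit.repeat(operation_under_test, repeat=retries, number=number)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)
```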

So, IMHO, given that none of these methods is feasible for us, at least in the short term, I am assuming our performance assessments are subjective. We'll do our best to minimize the possibility of introducing a significant performance penalty (again, subjectively), but we can only have concrete evaluations if we follow one of the aforementioned methods.

dannon commented 5 years ago

@VJalili There was no need at all to elaborate, I don't know how we're crossing wires here. My request, again, was:

This is the case I think fits Galaxy best, which unfortunately we have not set up yet.

Let's set up a few simple tests as a part of this.

VJalili commented 5 years ago

If you mean we should implement load/stress tests, then I guess the basis of those would better fit in a separate PR; once that is ready, this PR can leverage it and implement some specific tests if necessary.

@dannon @jmchilton do we want to implement load/stress tests for Galaxy to evaluate possible performance penalties of the UBOS PR, or do we agree that the first two points on the aforementioned list would suffice?

VJalili commented 5 years ago

ping @jmchilton

jmchilton commented 5 years ago

We 100% need performance tests for Galaxy. I can't/won't do it all - but I will definitely help us get there. Last weekend I generalized a lot of the tracking I did manually for a series of past performance-related PRs so we can publish it to statsd for more formal tracking, and PR'd it as https://github.com/galaxyproject/galaxy/pull/8800. I did that work in part in response to this discussion.
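
For readers unfamiliar with that setup, a rough sketch of publishing a timing metric with the `statsd` Python client; the host, port, prefix, and metric name are placeholders, and this is not the code in #8800:

```python
import time

import statsd  # the StatsD client package


def operation_under_test():
    """Placeholder for whatever Galaxy operation is being timed."""
    time.sleep(0.01)


# Placeholder host/port/prefix; a StatsD/Graphite stack would listen here.
client = statsd.StatsClient("localhost", 8125, prefix="galaxy")

start = time.perf_counter()
operation_under_test()
elapsed_ms = (time.perf_counter() - start) * 1000
client.timing("internals.operation_under_test", elapsed_ms)
```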

That said, we routinely make performance assessments based on domain experience (see for instance https://github.com/galaxyproject/galaxy/pull/7791) and this will continue.

VJalili commented 5 years ago

I think some folks from @bgruening's team have developed an experimental benchmarking setup.

@jmchilton I can help extend your work to cover user-based objectstore related concerns. I would need your help to understand how it works so that I can extend it.

VJalili commented 4 years ago

As suggested in the list of actionable items, I implemented the logic for enabling/disabling the feature introduced with the user-based object store PR; see this commit. When enabled, Galaxy persists user data on their plugged media and inserts an entry into the dataset-media association table.
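
A hedged sketch of what such an association and the extra insert could look like with SQLAlchemy; the table and column names below are illustrative, not the actual schema in the PR:

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, Table, create_engine

metadata = MetaData()

# Minimal stand-ins for the real Dataset and Media tables.
dataset = Table("dataset", metadata, Column("id", Integer, primary_key=True))
media = Table("media", metadata, Column("id", Integer, primary_key=True))

# Illustrative association table; the PR defines its own schema.
dataset_media_association = Table(
    "dataset_media_association",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("dataset_id", Integer, ForeignKey("dataset.id")),
    Column("media_id", Integer, ForeignKey("media.id")),
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

# When the feature is enabled, persisting a dataset on a user's plugged
# media also records the association; this is the extra database insert
# discussed in the job-wrapping benchmark below.
with engine.begin() as conn:
    conn.execute(dataset.insert().values(id=1))
    conn.execute(media.insert().values(id=1))
    conn.execute(
        dataset_media_association.insert().values(dataset_id=1, media_id=1)
    )
```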

My point was that it'd be great to see objective benchmarks and tests along with this.

As suggested by @dannon, I set up a benchmark comparing the performance of file upload and tool execution between three branches: the current latest dev (with no changes), and the user-based object store branch with the introduced feature enabled and with it disabled. The input is 64 fastqsanger files (the same files copied multiple times) totaling 1.43 GB, which are first uploaded to Galaxy and then quality-controlled (see the first part of this tutorial), producing 192 datasets in total. Performance is captured via StatsD and Graphite running on Docker. When the user-based object store is enabled, Galaxy persists user data in a different local path.
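
For reproducibility, a rough sketch of how an upload benchmark like this can be driven from outside Galaxy with BioBlend; the URL, API key, and file paths are placeholders:

```python
import time

from bioblend.galaxy import GalaxyInstance

# Placeholders: point these at the Galaxy instance built from each branch.
gi = GalaxyInstance(url="http://localhost:8080", key="YOUR_API_KEY")
history = gi.histories.create_history(name="ubos-benchmark")

# Placeholder paths for the 64 fastqsanger input files.
files = [f"/data/benchmark/sample_{i}.fastqsanger" for i in range(64)]

start = time.perf_counter()
for path in files:
    # upload_file only submits the upload job; for end-to-end timing,
    # also poll the history/dataset states until they reach "ok".
    gi.tools.upload_file(path, history["id"], file_type="fastqsanger")
print(f"upload submission wall time: {time.perf_counter() - start:.1f}s")
```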

[Figure: local_job_runner]

Local job runner: The three branches are shown in blue (left-most), red (middle), and green (right-most) for the user-based object store disabled, the user-based object store enabled, and the current latest dev, respectively. The runtimes of upload and tool execution are marked by the letters A and B, respectively. The tasks were executed twice on each branch.

As shown in the above figure, job execution performance is on the same scale across the three branches, for both data upload and tool execution.

[Figure: job_wrapper]

Job wrapping: This plot compares the median job wrapping run time across the three branches. As shown here, the median performance of the current latest dev and of the user-based object store branch with the introduced feature disabled are roughly equal.

When the user-based object store is enabled, an association entry is added while wrapping a job, associating a Dataset with a user's Media. Accordingly, the time required to insert an entry into the database is added to the total job wrapping time, making job wrapping slightly slower when the user-based object store is enabled than on the current latest dev.

@dannon @jmchilton:

VJalili commented 4 years ago

@dannon do you think this benchmark covers the areas you had in mind?

I am also planning on setting up a benchmark using https://github.com/usegalaxy-eu/GalaxyBenchmarker

dannon commented 4 years ago

@VJalili I guess it's on the right track, but the unit being evaluated here is likely too long to see anything relevant when trying to consider the impact of this across thousands of points of use in the application -- that is, the code path whose performance change we actually care about is a very, very small portion of the total execution time seen here. Can you find a more minimal example (maybe just resolving the dataset location?) and then run that in a test loop?
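
A minimal sketch of the kind of loop suggested here, assuming an `object_store` and a persisted `dataset` obtained from a Galaxy test fixture (that setup is omitted):

```python
import statistics
import time


def time_path_resolution(object_store, dataset, iterations=10_000):
    """Median time to resolve a dataset's location, measured in a tight loop.

    `object_store` and `dataset` are assumed to come from a Galaxy test
    context; only the path-resolution call itself is timed here.
    """
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        object_store.get_filename(dataset)  # just resolve the dataset location
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```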

jmchilton commented 4 years ago

At first glance, it would seem ca7e71609b5b08b7ee870e0452dd663b4e399372 gets the extra joins out of the critical path for executing many jobs simultaneously - this is what I was asking for. As long as it also isn't called unconditionally from tool/actions code. It should probably be done without the extra config option (doesn't the object store know whether it is enabled and is a user-based object store - can you just ask it instead?).