cyberark / conjur

CyberArk Conjur automatically secures secrets used by privileged users and machine identities
https://conjur.org
Other

Test for Rails upgrade performance regressions #1419

Closed alexkalish closed 4 years ago

alexkalish commented 4 years ago

There is some concern that upgrading Rails and disabling MD5 could have an adverse effect on Conjur performance. While I'm not terribly worried, verification is absolutely warranted. The IL team has created a set of performance tests that we should be able to leverage easily. Details to come soon.

The load test should execute the following scenario:

Setup

  1. Authenticate as the admin user
  2. Load the following policies (from conjurdemos/dap-intro):
    • policy/users.yml (into the root namespace, using replace)
    • policy/policy.yml (into the root namespace using append)
    • policy/apps/myapp.yml (into the staging namespace using append)
    • policy/apps/myapp.yml (into the production namespace using append)
    • policy/application_grants.yml (into the root namespace using append)
    • policy/hosts.yml (into the root namespace using append)
  3. Record the test-host-1 API key for future use
  4. Set the following variables with random strings:
    • production/myapp/database/username
    • production/myapp/database/password
    • production/myapp/database/url
    • production/myapp/database/port
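
The setup steps above can be sketched as a request plan. This is a minimal sketch, not the team's actual script: it assumes the Conjur REST API conventions (PUT on the policy endpoint replaces, POST appends, and POST on the secrets endpoint sets a variable), and the base URL and account name passed in are hypothetical. It only builds the plan, so the requests still have to be sent with an admin token.

```python
import secrets
from urllib.parse import quote

# Assumption: Conjur maps policy load modes to HTTP methods
# (PUT = replace, POST = append) on /policies/{account}/policy/{namespace}.
MODE_TO_METHOD = {"replace": "PUT", "append": "POST"}

# (policy file, target namespace, load mode), in the order listed above.
POLICY_PLAN = [
    ("policy/users.yml", "root", "replace"),
    ("policy/policy.yml", "root", "append"),
    ("policy/apps/myapp.yml", "staging", "append"),
    ("policy/apps/myapp.yml", "production", "append"),
    ("policy/application_grants.yml", "root", "append"),
    ("policy/hosts.yml", "root", "append"),
]

VARIABLES = [
    "production/myapp/database/username",
    "production/myapp/database/password",
    "production/myapp/database/url",
    "production/myapp/database/port",
]


def policy_request(base_url, account, namespace, mode):
    """Return the (method, url) pair for loading a policy into a namespace."""
    method = MODE_TO_METHOD[mode]
    url = f"{base_url}/policies/{quote(account, safe='')}/policy/{quote(namespace, safe='')}"
    return method, url


def secret_url(base_url, account, variable_id):
    """Return the URL for setting a variable; the ID is URL-encoded, slashes included."""
    return f"{base_url}/secrets/{quote(account, safe='')}/variable/{quote(variable_id, safe='')}"


def build_setup_plan(base_url, account):
    """List every setup request: six policy loads, then four random secret values."""
    plan = []
    for path, namespace, mode in POLICY_PLAN:
        method, url = policy_request(base_url, account, namespace, mode)
        plan.append({"method": method, "url": url, "body_file": path})
    for variable_id in VARIABLES:
        plan.append({
            "method": "POST",
            "url": secret_url(base_url, account, variable_id),
            "body": secrets.token_hex(16),  # random 32-character hex string
        })
    return plan
```

Keeping the plan as data makes it easy to review the order of the policy loads before anything touches the appliance.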

Load Test

Perform the following actions:

  1. Authenticate using the test-host-1 API key and retrieve a token
  2. Retrieve the production/myapp/database credentials using batch retrieval

Please run the test against the 11.4 DAP appliance, then again on the latest DAP build, posting results back to this issue.

We'll run this test in the future, so please make sure the JMeter script is checked into a repository.

jvanderhoof commented 4 years ago

An overview of the test is located in Confluence.

The original test was conducted using JMeter. The pre-FIPS JMeter test is here: JMeter Script

rrefael commented 4 years ago

When we discussed the performance testing needed for the OpenSSL change, there were two main things we were asked to focus on:

This is indeed reflected in the tests specified on the DAP performance Confluence page.

In addition:

alexkalish commented 4 years ago

@rrefael: To be clear, we will only be testing Conjur OSS, so followers, UI and Synchronizer are all out of our current scope.

  1. Regarding the possible impacts on the UI, I would be surprised if there were any, but I would defer that question to @jvanderhoof.
  2. The description above doesn't actually mention the time dimension of the tests. @jvanderhoof: What did you have in mind?

h-artzi commented 4 years ago

Note: These tests were run locally, so it is possible that other applications running in the background interfered with the stats.

dap-intro tagged with 11.4 (image: 11.4.0.png)

dap-intro tagged with 5.0-stable (image: 5.0-stable.png)

micahlee commented 4 years ago

@h-artzi What are Set BaseDate and Set DateNow?

h-artzi commented 4 years ago

@micahlee, they are both timestamps. When the time difference between the two grows too large, the hosts reauthenticate with the DAP instance. I decided to keep this feature from the JMeter script attached to this ticket; however, it is most likely overkill for the current test.
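
The timestamp comparison described above amounts to an age-based token refresh. Here is a minimal sketch of that idea; the class name, the `authenticate` callback, and the age threshold are all hypothetical and are not taken from the JMeter script.

```python
import time


class TokenCache:
    """Reuse a token until it is older than max_age_seconds, then reauthenticate."""

    def __init__(self, authenticate, max_age_seconds, clock=time.monotonic):
        self._authenticate = authenticate  # callable returning a fresh token
        self._max_age = max_age_seconds
        self._clock = clock
        self._token = None
        self._base = None  # plays the role of BaseDate: when the token was issued

    def get(self):
        now = self._clock()  # plays the role of DateNow
        if self._token is None or now - self._base >= self._max_age:
            self._token = self._authenticate()
            self._base = now
        return self._token
```

A load-test thread would call `cache.get()` before every batch retrieval, so most iterations skip authentication and only the occasional one pays for it.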

hilagross commented 4 years ago

Hi @alexkalish, as @rrefael was saying, the requirements are specified in the Confluence page and should be followed, as we discussed them a while ago. It doesn't matter whether we are testing Conjur or DAP; we should confirm that the performance is the same and that the load can be handled. Running the same tests will also help us with OpenSSL: if we find a gap in OpenSSL performance, we will easily be able to tell whether the degradation came from the Rails upgrade or from the OpenSSL changes.

In general, we should always aspire to run performance tests before release, especially after a big change like Rails.

alexkalish commented 4 years ago

@hilagross: Agreed that performance testing is needed! What I'm hearing is that you do not have a strong preference for testing OSS vs DAP, as long as we confirm there are no performance regressions. Additionally, I'm assuming that you have no objections to or concerns about the test details in the description. Are those statements correct? Thanks.

rrefael commented 4 years ago

@alexkalish The Rails upgrade is a change that may introduce performance degradations, which may occur in both OSS and DAP. Since DAP enables more features and use cases than OSS, I think that the verification in DAP should at least cover this delta. Not verifying DAP at all seems like a risk to me.

jvanderhoof commented 4 years ago

@rrefael, I understand your concern for introducing performance issues to DAP with the FIPS compliance work. When I took a close look at the Daily average build time for cyberark/conjur, I noticed a slowdown when the Rails 5 code was merged. Below is the stage level view:

Conjur Pipeline Stages

I'm going to open an issue to address the slowdown (which appears to impact nearly all the tests). We saw a very small slowdown in early runs of the JMeter load test (workflow captured above), but nothing like the slowdown we're seeing above.

As a note, we used the DAP appliance (11.4 vs latest stable). We also run a multi-day load test as part of the release process to find issues that result from long tasks. We'll

We're planning to continue expanding our load testing scope, but have very limited capacity (just a single engineer focused on finishing the Rails 5 work).

As a short term plan, let's use the data from Jenkins, in addition to the various load tests, to see if we're introducing any major performance issues with the FIPS work.

jvanderhoof commented 4 years ago

@h-artzi, to wrap this up, can you please create a SharePoint spreadsheet with the Rails 4 and Rails 5 test results? Please calculate the differences between the metrics generated in each run. We want to understand the percentage change.
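
The percentage change requested above can be computed per metric as follows. This is a minimal sketch; the metric names and any numbers passed in are placeholders for illustration, not measured results.

```python
def percent_change(before, after):
    """Percentage change from before to after; None when the baseline is zero."""
    if before == 0:
        return None  # change relative to a zero baseline is undefined
    return (after - before) * 100.0 / before


def compare_runs(rails4, rails5):
    """Map each metric present in both runs to its percentage change, rounded to 2 places."""
    result = {}
    for metric, before in rails4.items():
        if metric not in rails5:
            continue
        change = percent_change(before, rails5[metric])
        result[metric] = None if change is None else round(change, 2)
    return result
```

A positive value means the Rails 5 run was slower for that metric, and a negative value means it was faster.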

alexkalish commented 4 years ago

@jvanderhoof @h-artzi: Was this run just on Hadar's laptop? Do we think that is a controlled enough environment? Also, our customers will be running on Linux. Could that OS difference have any impact on the results?

h-artzi commented 4 years ago

@alexkalish it is currently being run on my laptop, and it is possible that this is introducing some error. For example, there is a significant jump while loading one of the policies.

jvanderhoof commented 4 years ago

An overview of our load test comparison:

Test case: Batch retrieval of four variables, executed 9200 times via JMeter. This was run on a developer laptop.

Results (comparison between pre-upgrade and post-upgrade DAP master):

| Test | Average | Min | Max | 90th pct | 96 pct | 99 pct |
| --- | --- | --- | --- | --- | --- | --- |
| Batch Retrieve Secrets | 1.03% | 0.00% | -19.15% | 0.00% | -1.33% | 0.00% |
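
For reference, the statistics reported above can be derived from raw response times like this. This is a sketch that assumes nearest-rank percentiles; JMeter's own percentile calculation may differ slightly, and the function names are hypothetical.

```python
import math
from statistics import mean


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def summarize(samples_ms):
    """Compute the same columns as the results above from response times in ms."""
    ordered = sorted(samples_ms)
    return {
        "Average": mean(ordered),
        "Min": ordered[0],
        "Max": ordered[-1],
        "90th pct": percentile(ordered, 90),
        "96 pct": percentile(ordered, 96),
        "99 pct": percentile(ordered, 99),
    }
```

Summarizing the pre-upgrade and post-upgrade samples separately and then taking the percentage change per column gives a row in the same shape as the one above.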

Conclusion: No noticeable performance change between Rails 4 and Rails 5.