Performance decline of the REST API for creating and updating work items

pmacik commented 7 years ago

Expected behavior

The throughput and response time not to decline over time.

Actual behavior

Performance tests show a rapid decline in response time over time.

Steps to reproduce the problem

Clone https://github.com/pmacik/ldimaggi-perfcake
Configure test parameters in ./_setenv.sh.
Execute ./devtools-performance-core-crud-soak.sh. !!! DOCKER ALERT: THE SCRIPT REMOVES ALL DOCKER CONTAINERS AND VOLUMES FOR A CLEAN ENVIRONMENT !!!
When test is done, open ./devtools-performance-results/0/ directory to see the test results. Open ./devtools-performance-results/0/perfcake-chart/index.html to see the result charts.

Specifications

Which version of almighty-core are you running?

$ curl -X GET -H "Accept: application/json" http://localhost/api/status
{"buildTime":"2017-04-07T13:29:28Z","commit":"49e3018e295131c8f56fd2edab5543503d1427ac","startTime":"2017-04-07T13:30:41Z"}

Which operating system do you use? Fedora 25 (4.9.14-200.fc25.x86_64)

pmacik commented 7 years ago

I'm going to soon attach the reports that I get to show the performance degradation.

hectorj2f commented 7 years ago

@pmacik Could you also detail a bit more the scenario ? I mean:

How many core containers are running?
What is the time in between perf tests ?
How many Work items were you using ?
Could you also get some information about the resource usage while doing the test ? e.g. dstats ...

Is it a linear increment ? Did you have any other programs running while doing the test :) ?

pmacik commented 7 years ago

@hectorj2f The scenario is described here.

The test uses 100,000 ITERATIONS for each phase, it has 30 THREADS and 300 USERS.

I run only a single core container, no other container is running (other than core and db), no other application is running on the machine itself (it is a dedicated physical performance machine in our lab).

All the traffic from clients to the REST API goes through localhost

pmacik commented 7 years ago

report.zip

aslakknutsen commented 7 years ago

Most likely related to https://github.com/almighty/almighty-core/commit/6b1137315f75c2184fcea79ebed6a8011ff896ed

Updated: Unlikely to be related to that.

I can reach about 1500 r/s locally against a single WI based on code from the 6, but only ~1300 based on master. Not too bad considering the additional sync db connection that is happening in the call.

But one I noticed was that Core fail relativly bad atm. When it gets enough connections it seems it might get stuck in the DB pool and then starts complaining about too many open files and doesn't seem to really recover very well. Unsure tho if that is related to changes in core or the local upgrade to Fedora 25 since last I ran perf tests. This is running 1500 + r/s with 90 connections in the pool. We should look at as a general thing to attempt to time out certain operations before we die of natural causes.

aslakknutsen commented 7 years ago

One thing is sure tho, debug mode is a lot less performant than not debug mode. Could only get about 600 r/s in debug mode.

ldimaggi commented 7 years ago

Hi Aslak,

Thanks for raising this issue - I had asked about this weeks ago - I thought that there would be no significant performance issues running the server in debug mode. Is this not the case?

Thanks!, Len

On Mon, Apr 10, 2017 at 5:54 PM, Aslak Knutsen notifications@github.com wrote:

One thing is sure tho, debug mode is a lot less performant than not debug mode. Could only get about 600 r/s in debug mode.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/almighty/almighty-core/issues/1149#issuecomment-293090467, or mute the thread https://github.com/notifications/unsubscribe-auth/AAnOPSCyGeJxyGQjZMATdHFkmZHB8Nj8ks5ruqUigaJpZM4M268j .

-- Len DiMaggio (ldimaggi@redhat.com) JBoss by Red Hat 314 Littleton Road Westford, MA 01886 USA tel: 978.392.3179 cell: 781.472.9912 http://www.redhat.com http://community.jboss.org/people/ldimaggio

hectorj2f commented 7 years ago

@aslakknutsen I like that We should look at as a general thing to attempt to time out certain operations before we die of natural causes.

IMHO It is a bit scary that the CREATION operation shows a linear performance. It might be interesting to see how (in time, resource usage --> graphically) it recovers once the test is done.

@pmacik DELETE: any reason why do we see a spike at the beginning of the DELETE operation ? Did the tests run sequentially without cooling down time ? Just to know.

pmacik commented 7 years ago

@aslakknutsen Yes, debug or logging in general is always a performance killer. How can we disable the debug mode, then? I hope it is not the same as the dev mode?

pmacik commented 7 years ago

@hectorj2f The spike at the beginning is the very first request in the respective cycle of the test so it is a bit higher than the rest. The test runs in cycles and in each cycle it has the 4 CRUD phases and in each phase it does exactly 100k of the respective operations. And yes, it does it sequentially so that we can observe how each operation affects the system.

hectorj2f commented 7 years ago

@pmacik the debug mode can be enabled/disabled by setting this ENV varALMIGHTY_DEVELOPER_MODE_ENABLED=1. The logging is affected by this flag, otherwise the default is INFO level for developer_mode=0.

pmacik commented 7 years ago

@hectorj2f OK, that's what I was afraid of, the DEV mode and debug are the same. Unfortunatelly, I need the dev mode enabled for the /api/login/generate to generate auth tokens for the test clients.

hectorj2f commented 7 years ago

aslakknutsen commented 7 years ago

Thanks for raising this issue - I had asked about this weeks ago - I thought that there would be no significant performance issues running the server in debug mode. Is this not the case?

And I thought you had stopped using Debug mode and were using a token from KC directly. @alexeykazakov ?

aslakknutsen commented 7 years ago

But the complete drop in requests can't be to debug mode if you've been running that all along as I don't think there's been any drastic changes related to that between those two dates.

alexeykazakov commented 7 years ago

And I thought you had stopped using Debug mode and were using a token from KC directly.

Yes, it's possible to get test user tokens directly from our KC (test realm) even if the core is not running in the debug mode.

@ldimaggi ping me and I will explain how to do it.

fabric8-services / fabric8-wit