Open momcilo78 opened 4 years ago
I ran the following test locally (docker install method)
test_api_launch_job.sh
for i in {1..100}; do
curl -k -H "Content-Type: application/json" -H "Authorization: Bearer .." -X POST -d '{"extra_vars": { "username": "sbf" }}' https://localhost:8043/api/v2/job_templates/7/launch/;
done
curl -s -k -H "Authorization: Bearer .." -X GET https://localhost:8043/api/v2/job_templates/7/jobs/?status=pending | head -c 14
Note: the final curl command queries the number of jobs in the pending state, which should be close to, but less than, 100.
time ./test_api_launch_job.sh
{"count":92,
real 0m33.538s
user 0m1.538s
sys 0m0.396s
So this queues up 100 jobs in 33 seconds for me.
@fosterseth I will need some time to verify with the setup you have.
Can you share the details of your setup? (Are the client and server on the same host? What sort of machine is it?)
My tests were done in the following environments:
Closed by mistake, I will follow up...
I have just prepared a new test that loops over 100 tickets and launches the workflow in the same way the corresponding Ansible module would:
For each operation, it outputs the minimum, maximum, average, median and total time. The client ran locally on the AWX machine, so network latency should not be an issue.
As seen below, 100 workflow launches take 91 seconds, which means we are nowhere near your performance. The fastest single launch took about 387 ms (the minimum reported below). As for the maximum time, I think it occurs once a significant number of jobs has loaded the machine.
What I typically see before the system gets loaded is response times of 600 ms to 900 ms. Once it gets loaded, a launch takes up to 13.93 seconds.
I am still not sure whether this is an AWX problem, a performance problem of our hypervisor, or a configuration issue (we made almost no customization compared to the installation instructions).
Also note that the maximum times for search operations are considerably lower than those for workflow launches.
search organization: min=0.18305502505972981, max=0.5401128169614822, avg=0.3092910557612777, median=0.3090830044820905, sum=30.929105576127768
search inventory min=0.14787942706607282, max=1.331877565011382, avg=0.284737908183597, median=0.27983733091969043, sum=28.473790818359703
search workflow: min=0.1962727578356862, max=0.5150523320771754, avg=0.3245356236794032, median=0.32885867008008063, sum=32.45356236794032
workflow launch job: min=0.3868273259140551, max=13.936025754082948, avg=0.9151694104983471, median=0.6064609769964591, sum=91.51694104983471
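The per-operation summary above can be reproduced with a few lines of Python. This is a minimal sketch, not the script that produced the output: the function names `summarize` and `format_summary` are made up for illustration, and the sample timings are invented. It also shows why the average launch time sits well above the median: a single slow launch pulls the mean up while the median barely moves.

```python
import statistics


def summarize(durations):
    """Return min, max, avg, median and sum for a list of timings in seconds."""
    if not durations:
        raise ValueError("no timings recorded")
    return {
        "min": min(durations),
        "max": max(durations),
        "avg": statistics.fmean(durations),
        "median": statistics.median(durations),
        "sum": sum(durations),
    }


def format_summary(label, durations):
    """Format the summary in the same 'key=value' style as the output above."""
    stats = summarize(durations)
    return f"{label}: " + ", ".join(f"{key}={value}" for key, value in stats.items())


# Invented sample timings (seconds), NOT the measurements from this issue.
# One outlier is enough to separate the average from the median.
sample = [0.4, 0.6, 0.6, 0.9, 13.9]
print(format_summary("workflow launch job", sample))
```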
ISSUE TYPE
SUMMARY
Launching jobs through the job_template API is slow. I first observed this when using awx.awx.tower_job_launch: the launch rate appears to be ~1 job/second. At first I attributed this to the overhead Ansible incurs when executing single tasks.
However, to confirm this I wrote a small Python program that submits POST requests to the /api/v2/job_templates/{id}/launch/ endpoint. To be absolutely clear, I do not expect the jobs to be executed immediately, but I did expect launching a job to take milliseconds, which would allow queuing tens or even hundreds of jobs per second. Please note that this was observed on an empty, non-clustered environment.
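For reference, a client of this kind could look roughly like the sketch below. It is not the original program: the function names, the optional `context` parameter, and any host, template id or token you pass in are assumptions for illustration. It uses only the standard library and separates the HTTP call from the timing loop, so the loop can be exercised with a stand-in before pointing it at a real server.

```python
import json
import time
import urllib.request


def launch_job(base_url, template_id, token, extra_vars, context=None):
    """POST to the job template launch endpoint and return the HTTP status.

    `context` is an optional ssl.SSLContext. For a self-signed certificate
    (what `curl -k` tolerates) the caller has to supply a context with
    verification disabled; that is left out of this sketch.
    """
    url = f"{base_url}/api/v2/job_templates/{template_id}/launch/"
    body = json.dumps({"extra_vars": extra_vars}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request, context=context) as response:
        return response.status


def time_launches(launch, count):
    """Call launch() `count` times and return each call's duration in seconds."""
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        launch()
        durations.append(time.perf_counter() - start)
    return durations


# Dry run with a stand-in that only sleeps, so nothing is sent over the network.
# Against a real server, pass something like
#   lambda: launch_job(base_url, template_id, token, {"username": "sbf"})
# with your own (here hypothetical) base_url, template_id and token.
durations = time_launches(lambda: time.sleep(0.01), 5)
print(f"{len(durations) / sum(durations):.1f} launches/second")
```

Because requests are sent one after another, this measures per-request latency as seen by a single client, which is what the 1 job/second figure reflects.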
ENVIRONMENT
STEPS TO REPRODUCE
The issue arises on 2 tested environments:
EXPECTED RESULTS
100 jobs get launched (not executed) within at most 10 seconds. Please comment on the throughput you would expect based on your own experience, since we also suspect we are missing something in our setup.
ACTUAL RESULTS
It takes ~100 seconds, at a rate of about 1 job/second.
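To put the figures quoted in this thread side by side, the arithmetic can be sketched as follows. The helper name `launch_rate` is made up; the inputs are the wall-clock time of the curl loop, the sum of the workflow launch times, and the 10-second target from EXPECTED RESULTS. Note that the curl figure includes process start-up for each curl call, so the comparison is only approximate.

```python
def launch_rate(jobs, seconds):
    """Average number of launches per second."""
    return jobs / seconds


# (jobs, seconds) pairs taken from the numbers reported in this thread.
measurements = {
    "curl loop (wall-clock time)": (100, 33.538),
    "workflow launch (sum of launch times)": (100, 91.51694104983471),
    "target from EXPECTED RESULTS": (100, 10.0),
}

for label, (jobs, seconds) in measurements.items():
    rate = launch_rate(jobs, seconds)
    per_launch_ms = 1000 * seconds / jobs
    print(f"{label}: {rate:.2f} jobs/s, {per_launch_ms:.0f} ms per launch")
```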
ADDITIONAL INFORMATION