fecgov / fecfile-web-api

Back-end API for FECfile application

Spike: Research/document application architecture performance issues for FECFile API #755

Closed: lbeaufort closed this issue 6 months ago

lbeaufort commented 7 months ago

Take baseline Locust tests

Consider different committee accounts/breadth of test data

Research/document

Dev notes

    1. Tests
    2. Summary

       By increasing the database instance size and the application memory, we were able to make responses 2-5x faster and get all Locust test queries under 1 second. Additional query tuning is still needed, because the database and application are now vastly over-provisioned for the volume of data and requests; the speedup came from added resources rather than from more efficient queries.

Results are saved here: https://drive.google.com/drive/folders/1ivSksq3hRjYtnkLOlvvtKf7z7h8si2Tq?ths=true (they are HTML files, so it's easiest to download them locally to view them). Locust can also save results as CSV files.
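Since Locust can save CSVs, comparing a baseline run against a tuned run can be scripted. This is a minimal sketch, assuming Locust's stats CSV has "Name" and "Average Response Time" (milliseconds) columns plus an "Aggregated" summary row; column names can vary between Locust versions. The endpoint names and timings below are made-up placeholders, not results from this spike.

```python
import csv
import io

# Assumption: Locust stats CSV layout with "Name" and "Average Response Time" (ms)
# columns and an "Aggregated" row. Endpoints and timings are hypothetical placeholders.
BASELINE_CSV = """Type,Name,Request Count,Average Response Time
GET,/example/reports/,100,2400.0
GET,/example/transactions/,100,3000.0
,Aggregated,200,2700.0
"""

TUNED_CSV = """Type,Name,Request Count,Average Response Time
GET,/example/reports/,100,800.0
GET,/example/transactions/,100,600.0
,Aggregated,200,700.0
"""

THRESHOLD_MS = 1000.0  # target from the summary: every query under 1 second


def load_avg_times(text):
    """Map endpoint name -> average response time (ms), skipping the aggregate row."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["Name"]: float(row["Average Response Time"])
        for row in reader
        if row["Name"] != "Aggregated"
    }


def compare(baseline, tuned, threshold_ms=THRESHOLD_MS):
    """Return (name, speedup, under_threshold) for endpoints present in both runs."""
    results = []
    for name in sorted(baseline.keys() & tuned.keys()):
        speedup = baseline[name] / tuned[name]
        results.append((name, speedup, tuned[name] < threshold_ms))
    return results


if __name__ == "__main__":
    rows = compare(load_avg_times(BASELINE_CSV), load_avg_times(TUNED_CSV))
    for name, speedup, ok in rows:
        print(f"{name}: {speedup:.1f}x faster, under 1 s: {ok}")
```

Averages can hide slow outliers, so the same comparison could be run on a percentile column (for example the 95% column) if the CSV includes it.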

See https://github.com/fecgov/fecfile-web-api/issues/783 and https://github.com/fecgov/fecfile-web-api/issues/771


FECFILE-217

lbeaufort commented 7 months ago

Old research ticket: https://github.com/fecgov/openFEC/issues/4382

lbeaufort commented 7 months ago

Answers from cloud.gov:

Can you share more information about how app memory allocation impacts the virtual machine hardware?

Currently, cloud.gov uses R6i VM instances (r6i.2xlarge) for all of our VMs. These are powered by 3rd Generation Intel Xeon Scalable processors (Ice Lake) with an all-core turbo frequency of 3.5 GHz. Regarding how application memory allocation impacts virtual machine hardware: Cloud Foundry calculates CPU entitlement based on the amount of memory you specify per instance. So to allot more CPU to your application instances, you can increase the memory per instance, or you can increase the number of application instances so that there is less load on each instance. The CPU figure reported by cf app appname is the consumption of system CPU on the VM, while CPU entitlement is the percentage of your entitlement that you're using. If your CPU entitlement is greater than 100%, your app is using spare CPU resources from the host, but there is no guarantee of spare CPU resources in the future. This article does a good job explaining the differences between the two figures: https://www.cloudfoundry.org/blog/better-way-split-cake-cpu-entitlements/.
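The memory-to-CPU relationship described above can be illustrated with a small calculation. This is a minimal sketch under a simplified, assumed model in which an instance's CPU entitlement is its share of host memory times the host's vCPU count; the real Cloud Foundry calculation may differ. The host figures (8 vCPUs and 64 GiB for an r6i.2xlarge) ignore memory the platform reserves for itself, and the function names and 0.25-core usage figure are hypothetical.

```python
# Simplified proportional model of Cloud Foundry CPU entitlement (an assumption for
# illustration; the platform's exact formula may differ). Host figures assume an
# r6i.2xlarge (8 vCPUs, 64 GiB) and ignore memory reserved by the platform.
HOST_VCPUS = 8
HOST_MEMORY_GB = 64


def entitled_cores(app_memory_gb, host_vcpus=HOST_VCPUS, host_memory_gb=HOST_MEMORY_GB):
    """Cores an instance is entitled to: its share of host memory times host cores."""
    return host_vcpus * app_memory_gb / host_memory_gb


def entitlement_percent(cores_used, app_memory_gb):
    """CPU use as a percentage of entitlement; above 100% means borrowing spare host CPU."""
    return 100.0 * cores_used / entitled_cores(app_memory_gb)


if __name__ == "__main__":
    cores_used = 0.25  # hypothetical observed usage, not a measured value
    for mem in (1, 2, 3, 4, 5):
        print(f"{mem} GB -> {entitled_cores(mem):.3f} cores entitled, "
              f"{entitlement_percent(cores_used, mem):.0f}% of entitlement")
```

Under this model, doubling memory doubles the entitlement, so the same CPU usage drops from 200% of entitlement at 1 GB to 100% at 2 GB, even though the underlying VM type stays the same.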

I'm working under the assumption that these are EC2 instances. Can you point me to anything on GitHub showing which instance type you're running at different levels?

Sure thing. Linked here is the code that addresses your question regarding the EC2 instance types we use on the platform for customer application deployments.

Would you expect the machine size to be the same at 1GB/2GB/3GB/4GB/5GB?

All customer applications run on the same type and size of VM, r6i.2xlarge, regardless of the amount of memory allocated to the application or its instances. However, as mentioned above, the more memory allocated to the application or its instances, the more CPU resources on the VM are reserved for them.

Do any other suggestions come to mind if our apps are running slowly?

In particular, average CPU usage for your prod and stage spaces is well above 100%, as noted in the screenshots attached to this response. Based on the average CPU usage I am seeing relative to CPU entitlement, I believe the performance of the fecfile-web-api application in the fecfile-fecfileonline-prototyping org (prod, stage, and dev spaces) would greatly benefit from increasing the application memory quota so that more CPU is allocated to your application. As noted above, if your CPU entitlement is greater than 100%, your app is using spare CPU resources from the host, but there is no guarantee of spare CPU resources in the future, which may account for the poor performance your applications are experiencing.
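The recommendation to raise the memory quota can be turned into a rough sizing estimate. This is a minimal sketch under an assumed proportional model (each instance entitled to 8 vCPUs x memory / 64 GiB on an r6i.2xlarge, platform overhead ignored). The helper min_memory_gb_per_instance and the 1.5-core demand figure are hypothetical; they are neither a cloud.gov or Cloud Foundry API nor a measured value.

```python
import math

# Hypothetical sizing helper under an assumed proportional model: each instance is
# entitled to HOST_VCPUS * memory / HOST_MEMORY_GB cores on an r6i.2xlarge
# (8 vCPUs, 64 GiB). Not a cloud.gov or Cloud Foundry API.
HOST_VCPUS = 8
HOST_MEMORY_GB = 64


def min_memory_gb_per_instance(total_cores_demand, instances, headroom=1.0):
    """Smallest whole-GB memory per instance that keeps average load within entitlement.

    headroom < 1.0 targets utilization below 100% of entitlement (e.g. 0.8 -> 80%).
    """
    cores_per_instance = total_cores_demand / instances
    gb = cores_per_instance * HOST_MEMORY_GB / (HOST_VCPUS * headroom)
    return max(1, math.ceil(gb))


if __name__ == "__main__":
    demand = 1.5  # made-up total CPU demand in cores, not a measured value
    for n in (1, 2, 3):
        print(n, "instance(s):", min_memory_gb_per_instance(demand, n), "GB each")
```

With 1, 2, or 3 instances this gives 12, 6, or 4 GB each, i.e. the same 12 GB in total. Under this model, adding memory and adding instances buy entitlement at the same rate per GB; extra instances mainly spread the load, as cloud.gov notes above.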

MitchellTCG commented 7 months ago

@lbeaufort Since all the check boxes are checked, can we move this ticket to QA Review? From there Shelly will move it to Stage Ready.

mjtravers commented 7 months ago

Moving to QA. This was a spike ticket, so there are no changes to app code to review. The comments and summary above are the results of the spike.

WiseQA commented 7 months ago

Per DEV, no QA needed.

Moved to Stage Ready.