Open dheitzer opened 2 months ago
Sasha Dresden commented: After a lot of digging and research, I found that the issue lies with the create_committee_views.py script, which runs every time the API is started or restarted.
Specifically, connecting to and manipulating the database during the spin-up process causes a CPU spike. I ran a few different tests on my local Docker setup: rebuilding the committee view when there was a single committee with either 0 transactions or 10,000 transactions, and skipping the rebuild if the committee view already existed. All of them led to a spike in CPU usage. The only thing that worked was not initiating the connection at all, either by removing the script from the startup process or by adding a flag that tells it not to run.
When the CPU spike happened, CPU utilization reached 100% before settling back below 10%. When the script was skipped, CPU utilization never went above 10%.
Looking into the script [itself|https://github.com/fecgov/fecfile-web-api/blob/develop/django-backend/fecfiler/committee_accounts/management/commands/create_committee_views.py], I found that it recreates all of the committee views on startup. This seems unnecessary. It makes sense for the local setup, because we create a new committee as part of spinning up our local Docker environment, but dev, stage, and production would already have all of their committee views created and up to date. A future ticket could determine whether this script can simply be removed for non-local environments or, if it is deemed necessary, find a way to run it only conditionally, perhaps only when the API is started but not when it is restarted, which could be accomplished by setting a flag.
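If the script is kept but run only conditionally, one minimal approach is an opt-in environment flag checked before the committee views are created. The sketch below assumes a hypothetical `CREATE_COMMITTEE_VIEWS_ON_STARTUP` variable and hypothetical helper names; it is not how fecfile-web-api currently behaves.

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical flag name: nothing in fecfile-web-api reads this variable today.
FLAG_NAME = "CREATE_COMMITTEE_VIEWS_ON_STARTUP"
TRUTHY_VALUES = {"1", "true", "yes", "on"}


def should_create_committee_views(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the flag is explicitly set to a truthy value.

    Defaulting to False means environments that never set the flag
    (e.g. dev, stage, production) skip the step entirely.
    """
    env = os.environ if environ is None else environ
    return env.get(FLAG_NAME, "").strip().lower() in TRUTHY_VALUES


def run_committee_view_step(
    create_views: Callable[[], None],
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Call create_views() only if the flag is enabled; return whether it ran.

    Skipping the call means no database connection is opened during startup,
    which, per the findings above, is what avoided the CPU spike.
    """
    if should_create_committee_views(environ):
        create_views()
        return True
    return False


if __name__ == "__main__":
    # Stand-in for the real work; in the Django project this might wrap
    # django.core.management.call_command("create_committee_views").
    ran = run_committee_view_step(lambda: print("creating committee views..."))
    print("committee views created" if ran else "skipped committee view creation")
```

The local Docker setup would set the flag to `true`, while dev, stage, and production would leave it unset. An environment flag alone does not distinguish a first start from a restart; that would need some state that survives the restart, which this sketch does not attempt.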
Matt Travers commented: No code to review. Sending directly to QA.
Follow-on ticket created from the findings of this ticket: [FECFILE-1462|https://fecgov.atlassian.net/browse/FECFILE-1462]
Shelly Wise commented: Per DEV, there is no code to review, so no QA review is needed for this ticket.
Moved to Stage Ready.
While investigating ticket #807 for Celery CPU usage, it was noted that CPU usage for both the fecfile-web-api and fecfile-web-services containers spikes on startup/restart. The [Kibana 'App Metrics' dashboard](https://logs.fr.cloud.gov/app/dashboards#/view/App-Metrics) can be used to view resource utilization and can be filtered to specific environments (by default it aggregates all environments).
The following command can be used to see a snapshot of resource utilization on the VM:
cf app fecfile-web-services
The following command can be used to list the top 10 processes sorted by CPU utilization:
cf ssh fecfile-web-services -c "ps -eo pcpu,pid,user,args | tail -n +2 | sort -k1 -r -n | head -10"
The following command can be used to see our CPU entitlement utilization (based on our memory allocation from cloud.gov):
cf cpu-entitlement fecfile-web-services
This story is to investigate the cause of the spikes and to determine whether they are a problem, e.g.:
[Screenshot: image-20240605-174512.png (see image in Jira)|/attachments/11343?name=image-20240605-174512.png]
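To make the `ps` pipeline in the description easier to follow, or to post-process output already captured from `cf ssh`, here is a rough Python equivalent of `tail -n +2 | sort -k1 -r -n | head -10` applied to `ps -eo pcpu,pid,user,args` output. It is a sketch that assumes the standard `ps` header line and whitespace-separated columns; the function name and the sample output are made up for illustration.

```python
from typing import List, Tuple

Row = Tuple[float, int, str, str]  # (%CPU, PID, user, command line)


def top_cpu_processes(ps_output: str, limit: int = 10) -> List[Row]:
    """Rough equivalent of `tail -n +2 | sort -k1 -r -n | head -<limit>`
    applied to the output of `ps -eo pcpu,pid,user,args`."""
    rows: List[Row] = []
    # Drop the header line, as `tail -n +2` does.
    for line in ps_output.splitlines()[1:]:
        if not line.strip():
            continue
        # The args column can contain spaces, so split at most three times.
        pcpu, pid, user, args = line.split(None, 3)
        rows.append((float(pcpu), int(pid), user, args))
    # Highest %CPU first, as `sort -k1 -r -n` does.
    rows.sort(key=lambda row: row[0], reverse=True)
    # Keep only the first `limit` rows, as `head` does.
    return rows[:limit]


if __name__ == "__main__":
    # Made-up sample output, for illustration only.
    sample = """%CPU   PID USER     COMMAND
 0.5     1 vcap     /tmp/lifecycle/init
97.3   123 vcap     python busy_task.py --once
 2.1    45 vcap     gunicorn app:wsgi
"""
    for pcpu, pid, user, args in top_cpu_processes(sample, limit=2):
        print(f"{pcpu:5.1f} {pid:>6} {user} {args}")
```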
DEV NOTES
One possible cause is that creating the committee views on startup is expensive: https://github.com/fecgov/fecfile-web-api/blob/3b3d581f76dc21c632239e5fc7c64d4608bad418/bin/run-api.sh#L5
The result of this ticket should be a list of action items pertaining to the cause of, and potential remedies for, the CPU spike.
FECFILE-172