bcgov / TheOrgBook

A public repository of verifiable claims about organizations. A key component of the Verifiable Organization Network.
http://von.pathfinder.gov.bc.ca
Apache License 2.0

Add resource limits to deployments #771

Closed: esune closed this issue 5 years ago

esune commented 5 years ago

The new OpenShift policies limit the number of pods that can run under the best-effort resource management policy (no requests or limits set). Find reasonable resource limits for the deployments, and apply the same changes to the templates.
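Whether a pod counts as "best-effort" is decided by the quality-of-service (QoS) class that Kubernetes, and therefore OpenShift, derives from each container's requests and limits. The sketch below is a simplified, hypothetical illustration of those rules, assuming a plain-dict pod model (`qos_class` and `Container` are illustrative names, not an OpenShift API). It considers only CPU and memory on regular containers and compares quantities as strings, so `"1"` and `"1000m"` are treated as different.

```python
from typing import Dict, List

# Hypothetical pod model: each container maps "requests"/"limits" to
# {resource_name: quantity_string}, e.g. {"cpu": "500m", "memory": "1Gi"}.
Container = Dict[str, Dict[str, str]]

# Only CPU and memory are considered when deciding the QoS class.
TRACKED = ("cpu", "memory")


def qos_class(containers: List[Container]) -> str:
    """Simplified Kubernetes/OpenShift QoS classification for a pod."""
    any_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        requests = dict(container.get("requests", {}))
        # An omitted request defaults to the limit for that resource.
        for name, value in limits.items():
            requests.setdefault(name, value)
        if any(r in requests or r in limits for r in TRACKED):
            any_set = True
        # Guaranteed needs CPU and memory limits, with requests equal to limits.
        # Quantities are compared as plain strings, so "1" != "1000m" here.
        for r in TRACKED:
            if r not in limits or requests.get(r) != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    if guaranteed:
        return "Guaranteed"
    return "Burstable"


print(qos_class([{}]))                                            # BestEffort
print(qos_class([{"requests": {"cpu": "100m"}}]))                 # Burstable
print(qos_class([{"limits": {"cpu": "500m", "memory": "1Gi"}}]))  # Guaranteed
```

Adding any request or limit moves a pod out of the best-effort class, which is why this issue asks for explicit resources on each deployment.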

esune commented 5 years ago

Only the following deployments need to be tweaked:

WadeBarnes commented 5 years ago

@esune, we have to be careful with these three pods. I've done a significant amount of resource testing with these pods and have found they all function best when set to best-effort.

Solr is very sensitive about its startup timing. If it does not have enough resources to start up quickly, you end up with a lot of startup errors. Once it has started, it is not as sensitive. This means that when you set the resources, you have to set the initial requests higher so Solr can start up quickly and is not delayed by requesting additional resources. This tends to be wasteful, since the resources needed for a quick startup are far greater than what is needed when things are idle. If you try to adjust for low resource use at idle (lower requests), Solr has great difficulty starting. For some reason it does not have these issues when set to best-effort.
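The cost side of this trade-off can be put into numbers. A request is reserved on the node for as long as the pod runs, so a request sized for the startup peak stays reserved through the idle period. The sketch below is a hypothetical illustration: `Phase` and `unused_reservation` are made-up names, and the phase durations and CPU figures are invented for the example, not measurements from these pods. It covers only the reservation cost, not the startup failures seen with lower requests.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Phase:
    name: str
    hours: float      # how long the pod spends in this phase
    cpu_used: float   # cores actually consumed during the phase


def unused_reservation(request_cores: float, phases: List[Phase]) -> Dict[str, float]:
    """Core-hours held by a fixed CPU request but not consumed, per phase.

    The request stays reserved for as long as the pod runs, so a request
    sized for the startup peak remains reserved while the pod is idle.
    """
    return {p.name: max(request_cores - p.cpu_used, 0.0) * p.hours for p in phases}


# Hypothetical figures for illustration only (not measured values):
# a short startup burst at 2 cores, then a long idle period at 0.1 cores.
phases = [Phase("startup", 0.1, 2.0), Phase("idle", 23.9, 0.1)]
print(unused_reservation(2.0, phases))
```

With these made-up figures, almost all of the reserved-but-unused capacity comes from the idle phase, which is the waste described above.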

schema-spy has similar startup issues. It needs all of its resources up front and then, basically, nothing once the schema has been generated.

backup would likely have the fewest issues with explicit resource allocation, although the limits should be set fairly high. CPU and memory use during backups, verification, and restores can be very high; for example, a wallet database verification in test used 1.6 CPUs and 44GB of memory. Things will work with lower limits, but the speed is greatly affected.
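When explicit limits are set, an observed peak has to be written as Kubernetes resource quantities. CPU is expressed in millicores, and memory uses a unit suffix. Going over a CPU limit throttles the container, while going over a memory limit gets it terminated. The helper below is a hypothetical sketch, including its name `limits_from_peak` and the `headroom` multiplier. It assumes "GB" in the measurement means decimal gigabytes (10^9 bytes, suffix `G`), not binary gibibytes (`Gi`).

```python
def limits_from_peak(cpu_cores: float, memory_gb: float, headroom: float = 1.0) -> dict:
    """Turn an observed peak into a Kubernetes-style `resources.limits` block.

    `headroom` is a hypothetical multiplier for extra margin above the peak.
    CPU is expressed in millicores ("m"); memory in decimal megabytes ("M",
    10**6 bytes), assuming "GB" in the measurement means 10**9 bytes.
    """
    cpu_milli = int(round(cpu_cores * headroom * 1000))
    memory_mb = int(round(memory_gb * headroom * 1000))
    return {"limits": {"cpu": f"{cpu_milli}m", "memory": f"{memory_mb}M"}}


# Using the wallet-verification peak quoted above (1.6 CPUs, 44GB):
print(limits_from_peak(1.6, 44))
# {'limits': {'cpu': '1600m', 'memory': '44000M'}}
```

Emitting memory in `M` keeps fractional gigabyte values exact. `44000M` and `44G` denote the same quantity.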

esune commented 5 years ago

Sounds good to me; we'll leave these pods using best-effort resources. I'll close the issue as won't fix.