bcgov / digital-journeys

PSA Forms System
https://bcgov.github.io/digital-journeys/
Apache License 2.0

Components resource allocation (cpu and memory) in deployment files #823

Open iman-jamali-fw opened 1 year ago

iman-jamali-fw commented 1 year ago

Resource allocation (CPU and memory) for each component should be set in that component's deployment file. Currently, these allocations are spread across the deployment files, passed as arguments during deployment (in the GitHub Actions workflow), or set through the OpenShift web console, which makes them hard to manage consistently across environments.

AC:

warrenchristian1telus commented 1 year ago

I'm not sure this is the best approach. We may be able to get away with matching the environments this way, but in theory dev should not see as much traffic as prod, and therefore should not need as many resources. We should review actual usage to determine the scaling requirements for each namespace and avoid overusing the platform.
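To illustrate per-environment sizing kept in one place, here is a minimal Python sketch, assuming each component's requests and limits are defined once per environment and rendered into the container `resources` block of its deployment. The `Resources` class, the `api` component name, the `SIZING` table, and every number are hypothetical placeholders, not values from this repo; CPU is in millicores and memory in MiB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Requests/limits for one container. CPU in millicores, memory in MiB."""
    cpu_request_m: int
    cpu_limit_m: int
    mem_request_mi: int
    mem_limit_mi: int

    def __post_init__(self) -> None:
        # Kubernetes rejects a container whose request exceeds its limit.
        if self.cpu_request_m > self.cpu_limit_m or self.mem_request_mi > self.mem_limit_mi:
            raise ValueError("request must not exceed limit")

    def to_k8s(self) -> dict:
        """Render as a container `resources` stanza using Kubernetes quantity strings."""
        return {
            "requests": {"cpu": f"{self.cpu_request_m}m", "memory": f"{self.mem_request_mi}Mi"},
            "limits": {"cpu": f"{self.cpu_limit_m}m", "memory": f"{self.mem_limit_mi}Mi"},
        }


# Hypothetical component name and placeholder numbers; real values
# should come from reviewing actual usage in each namespace.
SIZING = {
    "api": {
        "dev": Resources(50, 200, 128, 256),
        "test": Resources(100, 500, 256, 512),
        "prod": Resources(250, 1000, 512, 1024),
    },
}


def resources_for(component: str, env: str) -> dict:
    """Look up one component's sizing for one environment."""
    try:
        return SIZING[component][env].to_k8s()
    except KeyError:
        raise KeyError(f"no sizing defined for {component}/{env}") from None
```

Keeping the values in one table lets dev run smaller than prod while all environments can still be reviewed side by side, instead of being split between workflow arguments and console settings.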

We should also be minimizing resource use and scaling horizontally if and when required. This means that rather than allocating all resources up front, we would need to reserve enough room to scale up (even if that means requesting more).
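The trade-off above comes down to simple arithmetic: with horizontal scaling, the namespace quota has to cover the per-pod request times the maximum replica count, not just the replicas running day to day. A small Python sketch, assuming a hypothetical component that requests 250m CPU per pod and scales between 2 and 6 replicas (placeholder numbers), and considering CPU requests only:

```python
def scaling_headroom(request_m: int, min_replicas: int, max_replicas: int) -> dict:
    """CPU requests (millicores) a deployment needs at its minimum and maximum replica counts."""
    if request_m <= 0 or not 1 <= min_replicas <= max_replicas:
        raise ValueError("need request_m > 0 and 1 <= min_replicas <= max_replicas")
    baseline = request_m * min_replicas  # consumed during normal operation
    peak = request_m * max_replicas      # must fit in the quota to scale out fully
    return {"baseline": baseline, "peak": peak, "headroom": peak - baseline}


def replicas_that_fit(quota_m: int, request_m: int, other_usage_m: int = 0) -> int:
    """How many replicas of one pod fit in the quota left over after other workloads."""
    return max(0, (quota_m - other_usage_m) // request_m)
```

With those numbers, `scaling_headroom(250, 2, 6)` gives a 500m baseline and a 1500m peak, so 1000m is the headroom to reserve. If the quota is too small, the extra pods are rejected at quota admission, so a scale-up stalls even when the autoscaler asks for more replicas.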

I recommend that we consult with AOT on the new deployment method, and if/how they integrate resource allocation and/or scaling.

We are looking at implementing Argo CD for Moodle and the other PHP projects. It may be worth considering here as well to help with resource management, especially as we will need to migrate everything for the new version anyway.

Argo CD uses a separate repo to manage deployment configs, which should give us even better control over deployments, hopefully with less effort too.

If we want to go the Argo CD route, we should probably see if AOT and/or Dev Exchange can assist with the implementation to speed things up. I know Kunal hasn't used it yet either, so we're not yet sure what level of effort we would be looking at; we only know that it is the recommended method.

Abuchana commented 2 months ago

@warrenchristian1telus, is this the same work as Epic #1721 and ticket #1728? Could they be consolidated? Are there any other tickets that could be put under this Epic?

warrenchristian1telus commented 2 months ago

@Abuchana They are all related, but I think we may need to break some things out rather than consolidate. My recommendation would be to rename the Epic from "PVC Storage Capacity Issue" to something more like "Cluster Right-Sizing", as the PVC issue is just one of several factors impeding our ability to scale up to meet demand.

I think this issue will be needed to address the associated code changes in the repo, so that deployments are handled with appropriate scaling based on the investigation results from #1728 and the scaling decisions and approvals that come out of it.

We will need to determine a viable way to manage cluster size, including projections and adjustments, and to get those settings from concept (Excel, etc.) into code, using GitHub Actions to deploy and update the cluster. That pipeline should let us manage cluster resources and scaling strategies as well as dependencies and code deployments.
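One way to get sizing from a spreadsheet into code is to export the sheet as CSV, validate it, and generate the per-environment `resources` values that the GitHub Actions workflow then applies. The Python sketch below assumes a hypothetical column layout (`component`, `env`, CPU in millicores, memory in MiB); the column names and sample rows are placeholders, not an existing file in the repo.

```python
import csv
import io

# Hypothetical column layout for a sizing sheet exported from Excel as CSV.
EXPECTED = ["component", "env", "cpu_request_m", "cpu_limit_m", "mem_request_mi", "mem_limit_mi"]


def load_sizing(csv_text: str) -> dict:
    """Parse sizing rows into {component: {env: resources stanza}}, failing fast on bad data."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    sizing: dict = {}
    for row in reader:
        cpu_req, cpu_lim = int(row["cpu_request_m"]), int(row["cpu_limit_m"])
        mem_req, mem_lim = int(row["mem_request_mi"]), int(row["mem_limit_mi"])
        key = f"{row['component']}/{row['env']}"
        if cpu_req > cpu_lim or mem_req > mem_lim:
            raise ValueError(f"request exceeds limit for {key}")
        envs = sizing.setdefault(row["component"], {})
        if row["env"] in envs:
            raise ValueError(f"duplicate row for {key}")
        envs[row["env"]] = {
            "requests": {"cpu": f"{cpu_req}m", "memory": f"{mem_req}Mi"},
            "limits": {"cpu": f"{cpu_lim}m", "memory": f"{mem_lim}Mi"},
        }
    return sizing


# Placeholder rows for illustration only.
SAMPLE = """component,env,cpu_request_m,cpu_limit_m,mem_request_mi,mem_limit_mi
api,dev,50,200,128,256
api,prod,250,1000,512,1024
"""
```

Failing fast on unexpected columns, duplicate rows, or a request above its limit keeps a spreadsheet typo from reaching the cluster. If Excel's "CSV UTF-8" export is used, the file starts with a byte-order mark, so read it with `encoding="utf-8-sig"` before passing the text in.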