IDR / deployment

Deployment infrastructure for the Image Data Resource
https://idr.openmicroscopy.org/about/deployment.html
BSD 2-Clause "Simplified" License
10 stars 14 forks source link

Set value for effective cache size and use PR #41 for ome.postgresql #424

Open khaledk2 opened 3 months ago

khaledk2 commented 3 months ago

This PR sets a value for the effective_cache_size attribute (75% of the physical machine memory ) and uses https://github.com/ome/ansible-role-postgresql/pull/41

khaledk2 commented 3 months ago

I think the tests have failed because of the requests version 2.32.0 bug. I have modified the workflow to install "requests < 2.32.0"

sbesson commented 3 months ago

Deployed as test122b for testing. Note the configuration file has multiple duplicate configurations

[sbesson@test122b-database ~]$ sudo tail -n 15 /var/lib/pgsql/16/data/postgresql.conf
#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------

# Add settings for extensions here

listen_addresses = '*'

shared_buffers = 8002MB

listen_addresses = '*'

effective_cache_size = 23880MB
shared_buffers = 7960MB
khaledk2 commented 2 months ago

I have updated the Postgres Ansible role (https://github.com/ome/ansible-role-postgresql/pull/41#issuecomment-2195491374), so it will account for this situation and the added variables will not be duplicated by the end of the configuration file.

sbesson commented 2 months ago

Proposing to deploy this as part of test123 /cc @jburel

sbesson commented 2 months ago

Note this has been deployed on test123 and is currently evaluated for inclusion, quite possibly in prod124. As discussed on Monday, the proposed testing plan is for @khaledk2 to run the search engine indexer and compare the total time using a previous release (prod121 or prod122) as the baseline.

Below is the diff between the PostgreSQL configuration on prod122 and on test123

sbesson@Sebastiens-MacBook-Pro-3 Downloads % diff postgresql.conf.prod122 postgresql.conf.test122 
0a1
> #Ansible managed
822a824
> 
825,827c827,828
< shared_buffers = 8002MB
< 
< max_connections = 150
---
> effective_cache_size = 23880MB
> shared_buffers = 7960MB
khaledk2 commented 2 months ago

The effective_cache_size value on prod122 is 24GB

sudo cat /var/lib/pgsql/16/data/postgresql.conf | grep effective_cache_size
effective_cache_size = 24GB

We have changed this value manullay on prod121after upgrading to PostgreSQL 16. https://github.com/ome/ansible-role-postgresql/pull/41#issuecomment-2144877965

I could not find the container which was used to index the data on the idr-prod122.

So, to have the required comparison, I have run the indexer on the idr-testing123 with the effective_cache_size=23880MB. The indexing time is: 7hrs and 31 minutes

I have set the effective_cache_size value to the default one, i.e. 4GBand restarted the database server then run the indexer again. The indexing time is: 12hrs and 51 minutes

sbesson commented 1 month ago

FYI I have cherry-picked 77b1653 and pushed it to the HEAD of origin/master. While we are waiting for this work to be completed, this should prevent regressions when running the deployment playbooks against production environment and ensure the PSQL database has a properly set effective_cache_size across the board