bcgov / MFIN-Data-Catalogue

The Finance Data Catalogue enables users to discover data holdings at the BC Ministry of Finance and offers information and functionality that benefits consumers of data for business purposes. The product is built using Drupal and adheres to the Government of BC's Core Administrative and Descriptive Metadata Standard.

Crashlooping pods in Silver/ea352d-test and ea352d-dev #527

Closed NicoledeGreef closed 2 months ago

NicoledeGreef commented 3 months ago

Namespace contacts are receiving emails from Platform Services re: "Your action required: Crashlooping pods in Silver/ea352d-test" (for dev as well).

This ticket is for taking a deeper dive into the relationship between Zookeeper and Solr. When the issue was looked into previously, Zookeeper had already cleaned up any errors, so nothing was evident.

kardamk commented 2 months ago

Zookeeper does not have any memory requests or limits defined in the Solr or MFIN Data Catalogue Helm chart values. As seen in the capture, Zookeeper keeps restarting after reaching the default memory limits. Because Apache Solr depends on Zookeeper, it keeps restarting as well.

image

A possible remediation would be to define memory requests and limits for Zookeeper in each of the affected namespaces.
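As a rough illustration of the check behind this remediation, the sketch below scans a pod spec for containers that have no memory request or limit. It assumes the spec is already available as a plain dictionary shaped like a Kubernetes pod spec; the function name `containers_missing_memory` and the example spec are hypothetical and are not taken from the actual Helm charts.

```python
from typing import Any


def containers_missing_memory(pod_spec: dict[str, Any]) -> list[str]:
    """Return the names of containers lacking a memory request or a memory limit."""
    missing = []
    for container in pod_spec.get("containers", []):
        # "resources" may be absent or empty when nothing is defined in the chart values.
        resources = container.get("resources") or {}
        has_request = "memory" in (resources.get("requests") or {})
        has_limit = "memory" in (resources.get("limits") or {})
        if not (has_request and has_limit):
            missing.append(container["name"])
    return missing


# Hypothetical pod spec for illustration only; values are not from the real charts.
example_spec = {
    "containers": [
        {
            "name": "solr",
            "resources": {
                "requests": {"memory": "1Gi"},
                "limits": {"memory": "1Gi"},
            },
        },
        {"name": "zookeeper", "resources": {}},
    ]
}

print(containers_missing_memory(example_spec))  # ['zookeeper']
```

Any container reported here would fall back to whatever default limits the namespace applies, which is the situation described above for Zookeeper.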

kardamk commented 2 months ago

@chrislaick As discussed, I have changed the Zookeeper pod memory limit from 256Mi to 512Mi. The Zookeeper and Apache Solr pods will be monitored for any restarts and for their memory utilization patterns.

chrislaick commented 2 months ago

@kardamk Looking at Sysdig after the change, the Zookeeper pod has stabilized with memory usage around 400MiB. Amazing to see the restarts were happening every 20 minutes. Let's make the changes to TEST and PROD if that hasn't been done already.

image
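To put the numbers from this thread side by side, here is a minimal sketch that compares the observed usage of roughly 400MiB against the old 256Mi and new 512Mi limits. The helpers `parse_memory` and `utilization` are hypothetical names, and the parser only handles the binary suffixes `Ki`, `Mi` and `Gi` plus plain byte counts, which is a subset of the quantity formats Kubernetes accepts.

```python
# Binary suffixes only; decimal suffixes such as "M" or "G" are not handled in this sketch.
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' into a number of bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat the value as plain bytes


def utilization(usage: str, limit: str) -> float:
    """Return usage as a fraction of the limit."""
    return parse_memory(usage) / parse_memory(limit)


print(f"{utilization('400Mi', '512Mi'):.0%}")  # 78%
print(f"{utilization('400Mi', '256Mi'):.0%}")  # 156%
```

Under these assumptions, the steady-state usage would sit well above the earlier 256Mi limit, which is consistent with the repeated restarts, while the 512Mi limit leaves a little over 20% headroom.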

kardamk commented 2 months ago

@chrislaick I was monitoring the memory usage of the Zookeeper pod. Once it had stabilized, I synced the changes to the TEST and PROD environments as well.

NicoledeGreef commented 2 months ago

Looks like we can close this one? :)

kardamk commented 2 months ago

> Looks like we can close this one? :)

Yes, the issue with the crashlooping Zookeeper and Solr pods has been resolved after the change.

NicoledeGreef commented 2 months ago

Thanks very much @kardamk