Open gkc opened 2 years ago
@cconstab @cpswan Can you please grant me access to the required monitoring dashboards so that I can investigate this issue?
Two possible causes of CPU spike
Merged a PR to randomise the hive expiry check: https://github.com/atsign-foundation/at_server/pull/497
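For anyone following along, the general idea behind randomising a periodic check can be sketched as below. This is only an illustration in Python, not the code from the PR above (at_server itself is written in Dart); `jittered_delay`, `base_seconds` and `max_jitter_fraction` are hypothetical names chosen for this example. The point is that each instance waits a slightly different amount of time, so many secondaries on the same worker node do not all run their expiry check at the same moment.

```python
import random
from typing import Optional


def jittered_delay(
    base_seconds: float,
    max_jitter_fraction: float = 0.25,
    rng: Optional[random.Random] = None,
) -> float:
    """Return the delay before the next run of a periodic job.

    The result is base_seconds plus a random offset drawn uniformly from
    [0, base_seconds * max_jitter_fraction]. Both parameter names and the
    default fraction are assumptions made for this sketch.
    """
    if base_seconds <= 0:
        raise ValueError("base_seconds must be positive")
    if not 0 <= max_jitter_fraction <= 1:
        raise ValueError("max_jitter_fraction must be between 0 and 1")
    rng = rng or random.Random()
    return base_seconds + rng.uniform(0, base_seconds * max_jitter_fraction)


if __name__ == "__main__":
    # Hypothetical usage: an hourly job with up to 15 minutes of jitter.
    for _ in range(3):
        print(round(jittered_delay(3600, 0.25), 1))
```

A uniform offset is the simplest choice; a deterministic per-instance offset (for example derived from a hash of the instance name) would spread load just as well and make each instance's schedule predictable.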
Moving the task to the next sprint (PR-30) to validate the performance once the changes are deployed.
@cconstab @cpswan Is this issue still occurring in prod? Is there any work to be done in the upcoming sprint related to load spikes?
I'll take a look, or @cpswan will.
@murali-shris things are a lot better, but I'm still seeing some hourly spikes, so maybe another scheduled job elsewhere in the secondary?
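One way to narrow down which scheduled job lines up with the remaining hourly spikes is to bucket the high-load samples by minute of the hour and see where they cluster. The sketch below assumes the monitoring data can be exported as `(timestamp, load)` pairs; the function name `spikes_by_minute` and the `threshold` parameter are hypothetical, and nothing here is tied to a specific dashboard or to at_server itself.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def spikes_by_minute(
    samples: Iterable[Tuple[datetime, float]],
    threshold: float,
) -> Counter:
    """Count samples at or above `threshold`, keyed by minute of the hour.

    A strong peak at one minute value suggests a job scheduled at that
    offset within each hour. The input format is an assumption.
    """
    counts: Counter = Counter()
    for timestamp, load in samples:
        if load >= threshold:
            counts[timestamp.minute] += 1
    return counts
```

Calling `spikes_by_minute(samples, threshold).most_common(3)` would then list the minute offsets with the most spikes, which can be compared against the schedules of the jobs in the secondary.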
@murali-shris @cpswan should we move priority up to high for PR43 sprint planning?
Yes @ksanty, we can revisit whether the prod spike still exists.
Lead: @murali-shris
Describe the bug
There are periodic large load spikes that correspond with scheduled jobs (compaction, scans, ...) and are straining our worker nodes.
Expected outcome