Increased traffic targeting the starknet_call method on our k8s pod pushed CPU usage to 100%, leading to request failures and block sync issues. Subsequent restarts of the pod resulted in immediate OOM errors at startup. However, after we replaced the database with a fresh one, the pod synced properly without any OOM errors, which suggests that the DB may have been corrupted.
Possible causes:
Database corruption during restarts, combined with high CPU load.
Recent Pebble updates.
//UPDATE - 06.05.2024
The pod was unable to keep up with syncing after reaching its CPU limit, resulting in failed requests.
Actions taken: Added more pods and restarted the pod, with no improvement.
Resolution: Replacing the DB with a fresh one resolved the issue.
Next steps: Prioritize investigating and fixing the underlying cause.
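As a starting point for that investigation, here is a minimal sketch of how a burst of starknet_call requests could be generated to try to reproduce the load against a test node. It only builds the JSON-RPC payloads. The helper names (build_starknet_call, build_batch), the placeholder contract address and selector, and the batch size are all assumptions for illustration, not values taken from the incident. Sending the payloads to a node is deliberately left out.

```python
import json
from typing import Any, Dict, List


def build_starknet_call(
    contract_address: str,
    entry_point_selector: str,
    calldata: List[str],
    block_id: Any = "latest",
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build one JSON-RPC 2.0 request body for the starknet_call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "starknet_call",
        "params": {
            "request": {
                "contract_address": contract_address,
                "entry_point_selector": entry_point_selector,
                "calldata": calldata,
            },
            "block_id": block_id,
        },
    }


def build_batch(
    n: int,
    contract_address: str,
    entry_point_selector: str,
    calldata: List[str],
) -> List[Dict[str, Any]]:
    """Build n requests with ids 1..n, e.g. to replay against a test pod."""
    return [
        build_starknet_call(
            contract_address, entry_point_selector, calldata, request_id=i
        )
        for i in range(1, n + 1)
    ]


if __name__ == "__main__":
    # Placeholder values (hypothetical): substitute a real contract and selector.
    batch = build_batch(3, "0x1", "0x2", [])
    print(json.dumps(batch[0], indent=2))
```

Replaying such a batch against a pod that uses a copy of the suspect database, while watching CPU and memory, may help separate the effect of request load from the effect of the database state.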
06-05-2024-incident.pdf