Stackdriver / stackdriver-prometheus-sidecar

A sidecar for the Prometheus server that can send metrics to Stackdriver.
https://cloud.google.com/monitoring/kubernetes-engine/prometheus
Apache License 2.0

stackdriver-prometheus-sidecar crash after redeploy pod #217

Open thomasinspectorio opened 4 years ago

thomasinspectorio commented 4 years ago

Hi there, we set up this infrastructure in a Google Kubernetes cluster. When we redeploy the Prometheus pod, we first get:

level=info ts=2020-01-19T02:13:53.643Z caller=queue_manager.go:221 component=queue_manager msg="Stopping remote storage..."
level=warn ts=2020-01-19T02:13:53.823Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = Field timeSeries[0].points[0].interval.end_time had an invalid value of \"2020-01-17T05:12:28.292-08:00\": Data points cannot be written more than 25h10s in the past."
level=warn ts=2020-01-19T02:13:53.946Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = Field timeSeries[0].points[0].interval.end_time had an invalid value of \"2020-01-17T05:13:28.292-08:00\": Data points cannot be written more than 25h10s in the past."
level=info ts=2020-01-19T02:13:53.946Z caller=queue_manager.go:229 component=queue_manager msg="Remote storage stopped."
level=error ts=2020-01-19T02:13:53.946Z caller=main.go:598 err="corruption after 2210365440 bytes: unexpected non-zero byte in padded page"
level=info ts=2020-01-19T02:13:53.946Z caller=main.go:600 msg="See you next time!"
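For reference, the age of the rejected sample in the first warning can be checked against the 25h10s limit quoted in the error. This is only a small arithmetic sketch: the two timestamps are copied from the log, and the helper name `sample_age` is made up for illustration, not part of the sidecar.

```python
from datetime import datetime, timedelta

# Limit quoted in the error message: 25h10s.
MAX_AGE = timedelta(hours=25, seconds=10)


def sample_age(end_time: str, sent_at: str) -> timedelta:
    """Return how old a sample was when the write was attempted."""
    end = datetime.fromisoformat(end_time)
    # Older Python versions do not parse a trailing "Z", so spell out UTC.
    sent = datetime.fromisoformat(sent_at.replace("Z", "+00:00"))
    return sent - end


# end_time from the error, and the ts field of the same log line.
age = sample_age("2020-01-17T05:12:28.292-08:00", "2020-01-19T02:13:53.823Z")
print(age, age > MAX_AGE)
```

The sample is roughly 37 hours old at the time of the write, which is past the limit, so the rejection itself is consistent with the timestamps in the log.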

Then, after the container was automatically restarted, it reports:

level=info ts=2020-01-19T02:18:06.189Z caller=queue_manager.go:229 component=queue_manager msg="Remote storage stopped."
level=error ts=2020-01-19T02:18:06.189Z caller=main.go:598 err="corruption after 798916608 bytes: read first header byte: open next segment: next segment 3254 too high, expected 3258"
level=info ts=2020-01-19T02:18:06.189Z caller=main.go:600 msg="See you next time!"
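Since the second error mentions segment numbers, one thing that may help when inspecting the volume is checking whether the WAL directory has gaps in its segment sequence. The sketch below is a diagnostic only, not a recovery procedure. It assumes segment files are named as zero-padded decimal numbers (as in a Prometheus `wal` directory); the function name `missing_segments` is hypothetical, and the demo runs on a throwaway directory rather than real data.

```python
import os
import tempfile
from typing import List


def missing_segments(wal_dir: str) -> List[int]:
    """Return segment numbers absent between the lowest and highest segment found."""
    # Only purely numeric names count as segments; other entries are skipped.
    found = sorted(int(name) for name in os.listdir(wal_dir) if name.isdigit())
    if not found:
        return []
    present = set(found)
    return [n for n in range(found[0], found[-1] + 1) if n not in present]


# Demo on a temporary directory with an artificial gap.
with tempfile.TemporaryDirectory() as d:
    for n in (3253, 3254, 3258):
        open(os.path.join(d, f"{n:08d}"), "w").close()
    gaps = missing_segments(d)
    print(gaps)
```

On a real deployment the argument would be the path to the WAL directory on the Prometheus data volume, which depends on how the pod is configured.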

Could you explain what is happening here and how we can recover from it manually? Thank you!