castlemilk closed this issue 4 years ago
I've just encountered the same issue:
kubectl logs -n stackdriver gke-01-prometheus-server-6f87676b88-2mkbp -c sidecar
level=info ts=2019-09-03T10:22:10.885Z caller=main.go:303 msg="Starting Stackdriver Prometheus sidecar" version="(version=, branch=, revision=)"
level=info ts=2019-09-03T10:22:10.885Z caller=main.go:304 build_context="(go=go1.12.9, user=, date=)"
level=info ts=2019-09-03T10:22:10.885Z caller=main.go:305 host_details="(Linux 4.14.127+ #1 SMP Tue Jun 18 18:32:10 PDT 2019 x86_64 gke-01-prometheus-server-6f87676b88-2mkbp (none))"
level=info ts=2019-09-03T10:22:10.885Z caller=main.go:306 fd_limits="(soft=1048576, hard=1048576)"
level=error ts=2019-09-03T10:22:10.890Z caller=main.go:394 msg="Tailing WAL failed" err="open checkpoint: list segment in dir:/data/wal/data/wal/checkpoint.000041: open /data/wal/data/wal/checkpoint.000041: no such file or directory"
Config:
- args:
  - --stackdriver.project-id=xxx-xxx-xxx
  - --prometheus.wal-directory=/data/wal
  - --include={__name__=~".+",job="kubernetes-pods"}
  - --include={__name__=~".+",job="gcp-clsi"}
  imagePullPolicy: Always
  name: sidecar
  ports:
  - containerPort: 9091
    name: sidecar
    protocol: TCP
  resources: {}
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /data
    name: storage-volume
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: gke-01-prometheus-server-token-hdt4z
    readOnly: true
Looks like it was changed in this commit: https://github.com/Stackdriver/stackdriver-prometheus-sidecar/commit/d8e80a47d54eb2fdd95e51c0ac65066cf1471350#diff-ebda32def4aa9fd929c691ab34ebcc47R62
Of the releases I have tested so far, 0.5.1 is broken and 0.4.1 works.
I believe there's an issue relating to how the path for the checkpoint file is being resolved. The culprit is the following:
https://github.com/Stackdriver/stackdriver-prometheus-sidecar/blob/3e6c59f370d26797abbcbb36d9684270c72a0e3b/tail/tail.go#L65
Why are we joining dir and cpdir? We set something like --prometheus.wal-directory=/prometheus/wal, which is used for dir, while cpdir is returned from tsdb.LastCheckpoint(dir), which returns the full path of the checkpoint found. So we end up concatenating the following: /prometheus/wal + /prometheus/wal/checkpoint.002016
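To make the concatenation described above concrete, here is a minimal Go sketch. It assumes, as stated in this thread, that the checkpoint path handed back is already a full path and that it is then joined with the WAL directory. The function names naiveJoin and checkpointPath are hypothetical and are not taken from the sidecar source. The guard in checkpointPath is only one possible way to avoid the doubled prefix, not necessarily the fix the project adopted.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// naiveJoin mirrors the pattern questioned above: the WAL directory is
// joined with a checkpoint path that is already absolute.
func naiveJoin(dir, cpdir string) string {
	return filepath.Join(dir, cpdir)
}

// checkpointPath is a hypothetical helper (an assumption for illustration).
// It joins only when cpdir is relative, so an absolute path is used unchanged.
func checkpointPath(dir, cpdir string) string {
	if filepath.IsAbs(cpdir) {
		return filepath.Clean(cpdir)
	}
	return filepath.Join(dir, cpdir)
}

func main() {
	dir := "/prometheus/wal"
	cpdir := "/prometheus/wal/checkpoint.002016"

	// The directory prefix appears twice, as in the reported error.
	fmt.Println(naiveJoin(dir, cpdir))

	// With the guard, absolute and relative inputs resolve to the same path.
	fmt.Println(checkpointPath(dir, cpdir))
	fmt.Println(checkpointPath(dir, "checkpoint.002016"))
}
```

On a Unix-like system the first line printed has the doubled prefix, the same shape as the /data/wal/data/wal/checkpoint.000041 path in the error log earlier in this thread.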
I'm getting the following errors in this scenario:
Can someone advise whether I'm simply configuring this incorrectly?
The configuration I'm using for the sidecar is as follows: