NixOS / infra

NixOS configurations for nixos.org and its servers
MIT License

Prometheus <-> Prometheus Packet service discovery flaky #329

Closed by delroth 4 months ago

delroth commented 5 months ago

Likely due to systemd hardening blocking filesystem access, since the file itself is world-readable (and in fact: world-writable?!?!), so plain file permissions shouldn't be what's denying the read.

Jan 13 22:00:07 eris prometheus[1315182]: ts=2024-01-13T22:00:07.123Z caller=file.go:343 level=error component="discovery manager scrape" discovery=file config=packet_nodes msg="Error reading file" path=/var/lib/packet-sd/packet-sd.json err="open /var/lib/packet-sd/packet-sd.json: permission denied"
Jan 13 22:00:37 eris prometheus[1315182]: ts=2024-01-13T22:00:37.123Z caller=file.go:343 level=error component="discovery manager scrape" discovery=file config=packet_nodes msg="Error reading file" path=/var/lib/packet-sd/packet-sd.json err="open /var/lib/packet-sd/packet-sd.json: permission denied"
Jan 13 22:01:37 eris prometheus[1315182]: ts=2024-01-13T22:01:37.121Z caller=file.go:343 level=error component="discovery manager scrape" discovery=file config=packet_nodes msg="Error reading file" path=/var/lib/packet-sd/packet-sd.json err="open /var/lib/packet-sd/packet-sd.json: permission denied"
Jan 13 22:04:07 eris prometheus[1315182]: ts=2024-01-13T22:04:07.124Z caller=file.go:343 level=error component="discovery manager scrape" discovery=file config=packet_nodes msg="Error reading file" path=/var/lib/packet-sd/packet-sd.json err="open /var/lib/packet-sd/packet-sd.json: permission denied"

mweinelt commented 5 months ago

I don't see any enabled hardening option that should prevent access. None of the following worked:

But I think it must be related to the runtime environment, since sudo -u prometheus cat /var/lib/packet-sd/packet-sd.json works.

Then I noticed that some processes seem to be able to read the file, and some aren't:

[pid 1455844] openat(AT_FDCWD, "/var/lib/packet-sd/packet-sd.json", O_RDONLY|O_CLOEXEC) = 238
[pid 1455845] openat(AT_FDCWD, "/var/lib/packet-sd/packet-sd.json", O_RDONLY|O_CLOEXEC) = 238
[pid 1455836] openat(AT_FDCWD, "/var/lib/packet-sd/packet-sd.json", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)

Very confusing.

delroth commented 5 months ago

Tracked it down to https://github.com/packethost/prometheus-packet-sd/issues/15

delroth commented 5 months ago

Renaming this bug to indicate this is less critical than I originally thought - this probably ends up making Prometheus miss some updates, but it's only a race condition that doesn't always get hit.
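
For illustration only, here is a minimal Python sketch of how this kind of race can arise. It assumes the writer creates a private temp file, renames it over the target, and only relaxes the permissions afterwards. That is an assumption about the general pattern, not a copy of what prometheus-packet-sd actually does (the tool is not written in Python), and `racy_write`, `mode_of` and the `observe` hook are hypothetical names.

```python
import os
import stat
import tempfile


def mode_of(path):
    """Return the permission bits of path, e.g. 0o600."""
    return stat.S_IMODE(os.stat(path).st_mode)


def racy_write(path, data, observe=None):
    """Write via temp file + rename, relaxing permissions only afterwards."""
    directory = os.path.dirname(path) or "."
    # mkstemp creates the file readable and writable by the owner only (0600).
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(data)
    # After the rename the target path points at the new, still-private file.
    os.replace(tmp, path)
    if observe is not None:
        # Stands in for a reader that happens to arrive inside the window.
        observe(path)
    # The window only closes here.
    os.chmod(path, 0o644)
```

A reader running as a different user that opens the file between the `os.replace` and the `os.chmod` would get a permission error, while one arriving a moment later would succeed, which would fit the "sometimes works, sometimes doesn't" behaviour seen above.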

mweinelt commented 4 months ago

Tried updating to the patched version, but now it chmods the file to 0600. I'm confused.

mweinelt commented 4 months ago

The chmod is applied to the outfile, not the tempfile. Ouch.
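
As a rough sketch of the ordering that avoids this (in Python, not the tool's own code, and not the content of the linked PRs): set the permissions on the temp file first, then rename it into place, so the target path never points at a file with the wrong mode. `atomic_write` and its `mode` parameter are hypothetical names used only for this example.

```python
import os
import stat
import tempfile


def atomic_write(path, data, mode=0o644):
    """Replace path with data; the new file already has `mode` when it appears."""
    directory = os.path.dirname(path) or "."
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        # chmod the temp file, not the output file.
        os.chmod(tmp, mode)
        # Readers see either the old file or the complete new one.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

With this ordering there is no point at which the output path refers to a file that the reader cannot open.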

https://github.com/packethost/prometheus-packet-sd/pull/22
https://github.com/NixOS/nixpkgs/pull/291463