Open delroth opened 5 months ago
Another one:
Feb 13 19:40:00 rhea scale[615235]: Work summary:
Feb 13 19:40:00 rhea scale[615235]: System("aarch64-linux") BigParallel = 61
Feb 13 19:40:00 rhea scale[615235]: System("aarch64-linux") Small = 34677
Feb 13 19:40:00 rhea scale[615235]: System("x86_64-linux") BigParallel = 2
Feb 13 19:40:00 rhea scale[615235]: System("x86_64-linux") Small = 1846
Feb 13 19:40:01 rhea scale[615235]: Creating: HardwarePlan {
Feb 13 19:40:01 rhea scale[615235]: bid: 2.0,
Feb 13 19:40:01 rhea scale[615235]: plan: "c3.large.arm64",
Feb 13 19:40:01 rhea scale[615235]: netboot_url: "https://netboot.nixos.org/dispatch/hydra/hydra.nixos.org/equinix-metal-builders/main/c3-large-arm--big-parallel",
Feb 13 19:40:01 rhea scale[615235]: }
Feb 13 19:40:01 rhea scale[615235]: Error: failed to parse json, here's the raw content: Object {
Feb 13 19:40:01 rhea scale[615235]: "errors": Array [
Feb 13 19:40:01 rhea scale[615235]: String("There aren't available servers at any facility"),
Feb 13 19:40:01 rhea scale[615235]: ],
Feb 13 19:40:01 rhea scale[615235]: }
Feb 13 19:40:01 rhea scale[615235]: Caused by:
Feb 13 19:40:01 rhea scale[615235]: missing field `hostname` at line 1 column 61
Feb 13 19:40:01 rhea scale[615235]: Location:
Feb 13 19:40:01 rhea scale[615235]: src/device.rs:92:10
Feb 13 19:40:01 rhea systemd[1]: hydra-scale-equinix-metal.service: Main process exited, code=exited, status=1/FAILURE
This makes the service fail with exit status 1, which can't be distinguished from a "real" error, when it is really an expected condition: the provider has no servers available, and the error response then apparently fails to parse as a device because it has no `hostname` field. The scaler should recognize this case and either exit successfully or use a dedicated exit code that our monitoring can filter out as a non-critical failure.
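A rough sketch of what that could look like, not taken from the scaler's actual code. The `CreateFailure` type, the `classify` and `exit_status` helpers, and the choice of 75 (`EX_TEMPFAIL` from BSD `sysexits.h`) are all assumptions for illustration. It also assumes the strings in the API's `errors` array have already been extracted, and it matches on the message text seen in the log above, which may not be stable.

```rust
/// Why a device-creation request failed.
/// Hypothetical type for illustration; not from the scaler's source.
#[derive(Debug, PartialEq, Eq)]
pub enum CreateFailure {
    /// The provider has no servers available right now: expected and transient.
    NoCapacity,
    /// Anything else is treated as a real error.
    Other,
}

/// Exit status for transient failures. 75 is EX_TEMPFAIL in BSD sysexits.h;
/// picking it here is an assumption, the issue does not name a code.
pub const EXIT_TEMPFAIL: u8 = 75;

/// Classify the strings from the API's `errors` array.
/// Matching on message text is fragile; it is used only because the log
/// above shows no structured error code.
pub fn classify(errors: &[&str]) -> CreateFailure {
    let no_capacity = errors
        .iter()
        .any(|e| e.contains("There aren't available servers"));
    if no_capacity {
        CreateFailure::NoCapacity
    } else {
        CreateFailure::Other
    }
}

/// Map a failure to the process exit status the service should report.
pub fn exit_status(failure: &CreateFailure) -> u8 {
    match failure {
        CreateFailure::NoCapacity => EXIT_TEMPFAIL,
        CreateFailure::Other => 1,
    }
}
```

The resulting `u8` could be returned from `main` via `std::process::ExitCode::from`. On the monitoring side, one option would be listing that status in `SuccessExitStatus=` for the unit, so systemd does not mark the service as failed when capacity is simply unavailable.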