chokosabe opened 1 month ago
Forgot to add that on the Frontend, I get this error:
"failed to tear down existing cluster"
This happens after the check, which passes fine. The error is generated on preview and/or when running the pipeline.
This got resolved by clearing out the artifacts and checkpoints on AWS and also by deleting all the ReplicaSets for the existing workers; I don't know which of these fixed things. It'd be great if the error message pointed out exactly what the app was trying to do when it errored, i.e. which pods or ReplicaSets it was trying to delete when it hit the error. A sketch of the manual cleanup is below.
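For reference, here's a minimal sketch of the ReplicaSet cleanup I did by hand, written with the kube crate. The namespace and label selector are assumptions (check yours with `kubectl get rs --show-labels`), and this doesn't cover clearing the artifacts/checkpoints on AWS:

```rust
// Assumed deps: kube, k8s-openapi (with a Kubernetes version feature), tokio, anyhow.
use k8s_openapi::api::apps::v1::ReplicaSet;
use kube::{
    api::{Api, DeleteParams, ListParams},
    Client, ResourceExt,
};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    // Assumed namespace; replace with wherever your Arroyo workers run.
    let replicasets: Api<ReplicaSet> = Api::namespaced(client, "arroyo");
    // Assumed label selector for the worker ReplicaSets.
    let params = ListParams::default().labels("app=arroyo-worker");
    for rs in replicasets.list(&params).await? {
        println!("deleting leftover ReplicaSet {}", rs.name_any());
        replicasets
            .delete(&rs.name_any(), &DeleteParams::default())
            .await?;
    }
    Ok(())
}
```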
Have an instance running on Kubernetes for ~10 days. Suddenly started getting errors.
panicked at /app/crates/arroyo-worker/src/lib.rs:297:14: called Result::unwrap() on an Err value: Status { code: FailedPrecondition, message: "Cannot handle message for job_vR6Gen2XNs: State machine is inactive", metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Sat, 01 Jun 2024 19:10:43 GMT", "content-length": "0"} }, source: None } panic.file="/app/crates/arroyo-worker/src/lib.rs" panic.line=297 panic.column=14
This seems to point to this unwrap:
https://github.com/ArroyoSystems/arroyo/blob/faa29a546bdb1bbc300ac2f9731c0dcc02b77bbe/crates/arroyo-worker/src/lib.rs#L297
The issue was still there after an update deploy, so it could well be the environment that's the issue, or something retained in the namespace.
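Looking at the code, the panic comes from an `unwrap()` on the gRPC response. Here's a minimal sketch of what I'd expect instead, assuming the error comes back as a `tonic::Status`; the function name and shutdown behavior are illustrative, not Arroyo's actual code:

```rust
use tonic::{Code, Status};

// Illustrative handler: react to the controller's response instead of unwrapping.
fn handle_controller_response(result: Result<(), Status>) {
    match result {
        Ok(()) => {}
        // "State machine is inactive" arrives as Code::FailedPrecondition;
        // the worker could shut down cleanly instead of panicking.
        Err(status) if status.code() == Code::FailedPrecondition => {
            eprintln!(
                "controller rejected message ({}); shutting down worker",
                status.message()
            );
            std::process::exit(0);
        }
        Err(status) => panic!("unexpected controller error: {status}"),
    }
}
```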