While the agent runs through the test phases, it keeps the original job spec as a json file, along with log files for each phase in the root of the agent execution directly. Then at the end of the job, it reads in the pieces it needs and sends the final results to the server before marking the job complete. A recent job we saw caused a problem with this, by doing this at the end of their job to collect all of their own logs into artifacts:
mkdir artifacts
mv * artifacts
Predictably, this crashed the agent because it could no longer find it's own files that it expected to be there:
[24-10-07 09:08:53] ERROR: (agent.py:296)| [Errno 2] No such file or directory: '.../run/d21e3feb-2465-484c-b69e-cb503ef94ce2/testflinger-outcome.json'
Traceback (most recent call last):
File "/.../agent.py", line 270, in process_jobs
exit_code, exit_event, exit_reason = job.run_test_phase(
File "/.../job.py", line 128, in run_test_phase
self._update_phase_results(
File "/.../job.py", line 151, in _update_phase_results
with open(results_file, "r+") as results:
FileNotFoundError: [Errno 2] No such file or directory: '.../run/d21e3feb-2465-484c-b69e-cb503ef94ce2/testflinger-outcome.json'
We could set cwd to a different path, but would make sure to pull artifacts from the right location if we do that. We might also want to consider whether it makes more sense to switch to transmitting results as it runs. That's something we'd also like to look into, but I suspect it might have a few more corner cases lurking that we need to consider for cases where the transmission fails or is interrupted.
While the agent runs through the test phases, it keeps the original job spec as a json file, along with log files for each phase in the root of the agent execution directly. Then at the end of the job, it reads in the pieces it needs and sends the final results to the server before marking the job complete. A recent job we saw caused a problem with this, by doing this at the end of their job to collect all of their own logs into artifacts:
Predictably, this crashed the agent because it could no longer find it's own files that it expected to be there:
We could set cwd to a different path, but would make sure to pull artifacts from the right location if we do that. We might also want to consider whether it makes more sense to switch to transmitting results as it runs. That's something we'd also like to look into, but I suspect it might have a few more corner cases lurking that we need to consider for cases where the transmission fails or is interrupted.