canonical / testflinger

https://testflinger.readthedocs.io/en/latest/
GNU General Public License v3.0

Log the error and fail gracefully for artifact download issues #321

Closed plars closed 3 months ago

plars commented 3 months ago

Description

If the disk fills up during a job, it's not surprising that things fail quite badly. To finish off the job we then try to save the artifacts, which will probably fail as well. Let's log as much detail about the failure as we can, salvage whatever we can to push to the server as a result, and clean up afterwards so that we fail more gracefully when this happens.
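As a rough illustration of the approach described above (not the actual change in this PR), the sketch below wraps artifact archiving so that a full disk is logged with detail, the partial archive is removed, and the caller gets a status back instead of an exception. The function name `save_artifacts`, the `artifacts` directory layout, and the archive path are assumptions made for this example only.

```python
import contextlib
import logging
import os
import tarfile

logger = logging.getLogger(__name__)


def save_artifacts(rundir, archive_path):
    """Hypothetical helper: archive <rundir>/artifacts into archive_path.

    Returns True on success and False on failure, so the caller can still
    report whatever results it has instead of crashing.
    """
    artifacts_dir = os.path.join(rundir, "artifacts")
    if not os.path.isdir(artifacts_dir):
        logger.error("No artifacts directory found in %s", rundir)
        return False
    try:
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(artifacts_dir, arcname="artifacts")
        return True
    except OSError as exc:
        # A full disk shows up here as OSError with errno.ENOSPC.
        logger.error(
            "Unable to save artifacts from %s to %s: %s (errno=%s)",
            artifacts_dir,
            archive_path,
            exc,
            exc.errno,
        )
        # Clean up a partially written archive so it does not take up
        # space or get uploaded by mistake.
        with contextlib.suppress(OSError):
            os.unlink(archive_path)
        return False
```

With a helper shaped like this, the caller can check the return value and still send the job outcome to the server when the archive could not be written.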

Resolved issues

We've seen this recently with oemscript-provisioned devices when hundreds of them are running at once, all trying to download a 4 GB image on an agent host that has about 300 GB free; at that size, roughly 75 simultaneous downloads are enough to fill the disk. We should certainly add space and spread these jobs out a bit better too, but in the meantime this change should help us handle the failure as well as we can.

Documentation

N/A

Web service API changes

N/A

Tests

Tested in staging to ensure it still works on the normal path (not with a full disk). Also added unit tests that simulate what happens when saving the artifacts fails because we're out of space.
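For readers unfamiliar with the technique, a unit test can simulate a full disk without filling one by making a mock raise `OSError` with `errno.ENOSPC`. The sketch below is only an illustration of that idea: the `save_artifacts` stand-in, the `agent` logger name, and the test class name are hypothetical and are not taken from the tests added in this PR.

```python
import errno
import logging
import unittest
from unittest import mock

logger = logging.getLogger("agent")


def save_artifacts(write_archive):
    """Hypothetical stand-in for the code under test.

    Calls write_archive() and returns False instead of raising on OSError.
    """
    try:
        write_archive()
        return True
    except OSError as exc:
        logger.error("Unable to save artifacts: %s", exc)
        return False


class SaveArtifactsOutOfSpaceTest(unittest.TestCase):
    def test_disk_full_is_logged_and_not_raised(self):
        # The mock raises the same error the OS reports for a full disk.
        write_archive = mock.Mock(
            side_effect=OSError(errno.ENOSPC, "No space left on device")
        )
        with self.assertLogs("agent", level="ERROR") as captured:
            result = save_artifacts(write_archive)
        self.assertFalse(result)
        self.assertIn("No space left on device", captured.output[0])
```

The test checks both halves of the behaviour described in this PR: the failure is logged, and it is reported as a return value rather than an unhandled exception.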