canonical / testflinger

https://testflinger.readthedocs.io/en/latest/
GNU General Public License v3.0
12 stars 20 forks source link

[CERTTF-412][CERTTF-413] refactor: treat attachment unpacking as a separate phase #365

Closed boukeas closed 1 month ago

boukeas commented 1 month ago

Description

Retrieving and unpacking the attachments of a Testflinger job is currently handled within the agent code, before the agent starts going through the phases of that job. This imposes certain limitations with regards to how attachment-related failures are handled:

One way to lift all these restrictions simultaneously is to treat the retrieval and unpacking of the job attachments as a separate phase.

Changelog

This PR:

Some points to note while reviewing:

Resolved issues

Resolves CERTTF-412 and CERTTF-413.

Documentation

N/A

Web service API changes

N/A

Tests

boukeas commented 1 month ago

@plars @val500 I have noticed that <phase>_start and <phase>_success events are emitted (and recorded) even for phases that are subsequently skipped. Is this behaviour intentional, i.e. are there any advantages to doing this? I would argue that these events should only be emitted if the corresponding phase is actually executed.

boukeas commented 1 month ago

@plars The attachment error you reproduced is solved with this commit (part of this PR) and mentioned in the description:

Fixes a minor issue in how tarfile is patched for Python 3.8 that allows directories to be included as attachments.

So attaching directories is supported and the issue is fixed but what this PR aims to do is also provide support for surfacing any other attachment errors, in a manner consistent with how this is handled in other phases. More in a separate comment.

boukeas commented 1 month ago

@plars When you tried to reproduce the error, you had no indication as a Testflinger user that there was indeed an attachment error. If you polled the Testflinger output, there was nothing there. If you requested the job results, there was no <phase>_status exit code or any other field to reflect that something failed. And if you were monitoring the events emitted, again there would be no relevant events. These are all mechanisms that a Testflinger user can rely on in order to determine the outcome of a job and its phases but they don't apply to attachment unpacking: you had to go check the agent's log in order to see the error message, which is something that a user cannot do. This is all outlined in the PR description as well.

Of course we could handle attachment unpacking as a special case and do all that (add a result field, emit events and create a runner to generate error output) specifically for attachments but why would we when this is all done for each phase anyway? And, as a bonus, you get what I believe is a very sensible refactoring of the existing phases as well.

boukeas commented 1 month ago

Closing this PR as it attempts to resolve multiple issues at once: