boukeas closed this 1 month ago.
@plars @val500 I have noticed that `<phase>_start` and `<phase>_success` events are emitted (and recorded) even for phases that are subsequently skipped. Is this behaviour intentional, i.e. are there any advantages to doing this? I would argue that these events should only be emitted if the corresponding phase is actually executed.
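To make the proposal concrete, here is a minimal Python sketch of a phase runner that emits `<phase>_start` and `<phase>_success` only when the phase actually executes. The function `run_phase`, the `<phase>_fail` event name and the rule that a phase is skipped when the job has no `<phase>_data` entry are assumptions made for illustration, not the agent's actual code.

```python
from typing import Callable, Dict, List, Optional


def run_phase(
    phase: str,
    job_data: Dict[str, object],
    execute: Callable[[], int],
    events: List[str],
) -> Optional[int]:
    """Run one phase, emitting events only if it actually executes.

    Assumption for this sketch: a phase is skipped when the job
    definition has no "<phase>_data" entry.
    """
    if f"{phase}_data" not in job_data:
        return None  # skipped: neither <phase>_start nor <phase>_success is emitted
    events.append(f"{phase}_start")
    exit_code = execute()
    # "<phase>_fail" is a hypothetical event name used for non-zero exit codes
    events.append(f"{phase}_success" if exit_code == 0 else f"{phase}_fail")
    return exit_code
```

With a job that only defines `provision_data`, running both the provision and the test phase through this function leaves only `provision_start` and `provision_success` in the event list.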
@plars The attachment error you reproduced is solved with this commit (part of this PR) and mentioned in the description:
Fixes a minor issue in how tarfile is patched for Python 3.8 that allows directories to be included as attachments.
So attaching directories is supported and that issue is fixed. What this PR additionally aims to do is surface any other attachment errors, consistently with how errors are handled in other phases. More on this in a separate comment.
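For reference, a minimal sketch of filtering tar members so that directories and regular files are extracted while absolute paths and `../` traversal are rejected. `safe_members`, `unpack` and `build_demo_archive` are hypothetical helpers, not the agent's `secure_filter`; the stdlib `filter="data"` argument is only passed on Python versions whose `tarfile` module provides `data_filter`.

```python
import io
import os
import tarfile
from typing import Iterator, List


def safe_members(tar: tarfile.TarFile, dest: str) -> Iterator[tarfile.TarInfo]:
    """Yield members that are regular files or directories and that would be
    extracted inside dest (hypothetical helper, not the agent's secure_filter)."""
    dest_root = os.path.realpath(dest)
    for member in tar.getmembers():
        if not (member.isfile() or member.isdir()):
            continue  # skip symlinks, hard links, devices and FIFOs
        target = os.path.realpath(os.path.join(dest_root, member.name))
        if os.path.commonpath([dest_root, target]) != dest_root:
            continue  # skip absolute paths and "../" traversal
        yield member


def unpack(archive: bytes, dest: str) -> List[str]:
    """Extract the safe members of an in-memory tar archive into dest."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        members = list(safe_members(tar, dest))
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: also apply the stdlib "data" filter
            tar.extractall(dest, members=members, filter="data")
        else:
            tar.extractall(dest, members=members)
    return sorted(member.name for member in members)


def build_demo_archive() -> bytes:
    """Build a tar archive with a directory, a file and a traversal attempt."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        directory = tarfile.TarInfo("attachments")
        directory.type = tarfile.DIRTYPE
        directory.mode = 0o755
        tar.addfile(directory)
        for name, data in [
            ("attachments/config.yaml", b"key: value\n"),
            ("../escape.txt", b"should not be extracted\n"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

Filtering on member type and resolved path, rather than rejecting directories outright, is what lets a directory be attached while still keeping every extracted file inside the run directory.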
@plars When you tried to reproduce the error, you had no indication as a Testflinger user that there was indeed an attachment error. If you polled the Testflinger output, there was nothing there. If you requested the job results, there was no `<phase>_status` exit code or any other field to reflect that something had failed. And if you were monitoring the emitted events, again there would be no relevant events. These are all mechanisms that a Testflinger user can rely on to determine the outcome of a job and its phases, but none of them apply to attachment unpacking: you had to check the agent's log to see the error message, which is something a user cannot do. This is all outlined in the PR description as well.
Of course, we could handle attachment unpacking as a special case and do all that (add a result field, emit events and create a runner to generate error output) specifically for attachments, but why would we, when this is already done for each phase? And, as a bonus, you get what I believe is a very sensible refactoring of the existing phases as well.
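To illustrate why a result field matters: a client can only detect a failed phase if the job results carry a status for it. The sketch below assumes the results are a flat dictionary with integer `<phase>_status` exit codes; `failed_phases` is a hypothetical client-side helper, and `unpack_status` is a hypothetical name for the field this PR would add.

```python
from typing import Dict, List


def failed_phases(results: Dict[str, object]) -> List[str]:
    """Return the phases whose "<phase>_status" exit code is non-zero."""
    failed = []
    for key, value in results.items():
        # Only integer exit codes count; other fields (e.g. output) are ignored
        if key.endswith("_status") and isinstance(value, int) and value != 0:
            failed.append(key[: -len("_status")])
    return sorted(failed)
```

With results such as `{"setup_status": 0, "provision_status": 0}`, this returns an empty list even if unpacking failed; the failure only becomes visible once the results include something like `"unpack_status": 1`.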
Closing this PR as it attempts to resolve multiple issues at once:
Description
Retrieving and unpacking the attachments of a Testflinger job is currently handled within the agent code, before the agent starts going through the phases of that job. This imposes certain limitations with regards to how attachment-related failures are handled:
- … `job_start` event is emitted. If an error occurs during attachment unpacking, there is currently no way to convey that through events.

One way to lift all these restrictions simultaneously is to treat the retrieval and unpacking of the job attachments as a separate phase.
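A minimal sketch of such a phase abstraction, using simplified stand-ins for the classes named in the changelog. `JobPhase`, `JobParameters` and `Report` are hypothetical names, and `go`/`run_core` are assumed method names; the real `TestflingerJobPhase` interface may differ. The point it illustrates is that once unpacking is a phase, its failures are surfaced through the same events, status fields and output as any other phase.

```python
import abc
import os
import subprocess
from typing import Dict, List, NamedTuple


class JobParameters(NamedTuple):
    """Hypothetical stand-in for TestflingerJobParameters: one object per job,
    referenced by the job and by every phase (no circular references)."""

    job_data: Dict[str, object]
    config: Dict[str, object]
    rundir: str


class Report:
    """Collects what a Testflinger user can observe: events, results, output."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.results: Dict[str, int] = {}
        self.output: List[str] = []


class JobPhase(abc.ABC):
    """Hypothetical stand-in for TestflingerJobPhase."""

    name = ""

    def __init__(self, params: JobParameters, report: Report) -> None:
        self.params = params
        self.report = report

    @abc.abstractmethod
    def go(self) -> bool:
        """Return True if this phase should run for this job."""

    @abc.abstractmethod
    def run_core(self) -> int:
        """Do the phase's work and return an exit code."""

    def run(self) -> None:
        """Template method: every phase surfaces its outcome the same way."""
        if not self.go():
            return  # skipped: no events, no status field
        self.report.events.append(f"{self.name}_start")
        try:
            exit_code = self.run_core()
        except Exception as error:  # report the failure instead of only logging it
            self.report.output.append(f"{self.name} failed: {error}")
            exit_code = 1
        self.report.results[f"{self.name}_status"] = exit_code
        outcome = "success" if exit_code == 0 else "fail"
        self.report.events.append(f"{self.name}_{outcome}")


class ExternalCommandPhase(JobPhase):
    """Base for phases that run a pre-configured external command."""

    def go(self) -> bool:
        return f"{self.name}_command" in self.params.config

    def run_core(self) -> int:
        command = str(self.params.config[f"{self.name}_command"])
        return subprocess.run(command, shell=True, cwd=self.params.rundir).returncode


class SetupPhase(ExternalCommandPhase):
    name = "setup"


class ProvisionPhase(ExternalCommandPhase):
    name = "provision"


class UnpackPhase(JobPhase):
    """Retrieves and unpacks attachments; only runs if the job has any."""

    name = "unpack"

    def go(self) -> bool:
        return bool(self.params.job_data.get("attachments"))

    def run_core(self) -> int:
        # Placeholder: a real implementation would retrieve the archive and
        # extract it into self.params.rundir. Here we only reject unsafe paths.
        for path in self.params.job_data["attachments"]:
            if os.path.isabs(path) or ".." in path.split("/"):
                raise ValueError(f"unsafe attachment path: {path}")
        return 0
```

Because `run` is a template method on the base class, the unpack phase gets events, a status field and error output without any special-casing, which is the argument made in the discussion above.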
Changelog
This PR:
- introduces a `TestflingerJobPhase` abstract base class, with an interface that captures how all phases are supposed to work procedurally.
- introduces an `ExternalCommandPhase` abstract base class, derived from `TestflingerJobPhase`, that captures the workings of phases that run a pre-configured external command. All previously existing phases fall into this category.
- refactors `TestflingerJob.run_test_phase` into separate classes derived from `ExternalCommandPhase`, each corresponding to a different phase.
- adds an `UnpackPhase` class, derived from `TestflingerJobPhase`.
- fixes a minor issue in how `tarfile` is patched for Python 3.8 that allows directories to be included as attachments.

Some points to note while reviewing:
- Regarding `agent/testflinger_agent/job.py`: the refactoring is considerable, so it is best to view the file in its entirety.
- Attachment handling is no longer part of `TestflingerAgent`. It has all been moved into the newly introduced `UnpackPhase`. Functions/methods like `unpack_attachments` and `secure_filter` are now implemented as methods of `UnpackPhase`, since this is the only phase they are relevant to.
- The same applies to the `wait_for_completion` method and the `post_core` implementation for the allocate phase: these used to be methods of `TestflingerJob`, whereas now they are methods of `AllocatePhase`, which is the only phase they are relevant to.
- The information shared by a job and its phases is bundled together, with a new `TestflingerJobParameters` named tuple to hold this bundle. For each job, both the job and its phases store a reference to a single object of this class. This avoids duplicating this information across jobs and phases and, most importantly, allows jobs and phases to be loosely coupled, instead of one circularly referencing the other.
- `rundir` is no longer provided as an argument to `TestflingerJob.run_test_phase`. This directory is always the same for each job and can be determined when instantiating the job.

Resolved issues
Resolves CERTTF-412 and CERTTF-413.
Documentation
N/A
Web service API changes
N/A
Tests