Closed: ghost closed this issue 3 years ago
In GitLab by @infinitewarp on Aug 10, 2020, 10:29
changed the description
In GitLab by @infinitewarp on Aug 10, 2020, 10:30
changed the description
In GitLab by @infinitewarp on Aug 12, 2020, 13:52
changed the description
In GitLab by @infinitewarp on Aug 12, 2020, 13:55
changed the description
In GitLab by @infinitewarp on Aug 12, 2020, 13:59
changed the description
In GitLab by @infinitewarp on Aug 13, 2020, 13:30
assigned to @katherine-black
In GitLab by @pakamble on Aug 24, 2020, 10:18
assigned to @pakamble
In GitLab by @katherine-black on Sep 1, 2020, 15:40
mentioned in merge request !86
In GitLab by @pakamble on Sep 9, 2020, 04:34
Multiple iterations of the image inspection test case have been run to validate this issue. None of the inspection processes failed.
In GitLab by @infinitewarp on Aug 10, 2020, 10:23
Summary
Sometimes houndigrade raises FileNotFoundError unexpectedly because the mount command appears to have gone missing. It is unclear how this is possible since mount should always be part of the base image.
Steps to Reproduce
Expected Result
Actual Result
Additional context
QE people will at a minimum run regression tests around the inspection process to verify that we did not regress any previously-working functionality.
If we find reliable steps to reproduce the problem, devs will discuss with QEs during development to determine if it's possible and reasonable for QE to build a new test for that problem.
Is there a way to put the inspection startup in a loop and capture the logs to see if it's possible to recreate and observe an actual error run like this?
What should we do to better handle this if we can't find the cause?
Should we catch this exception, report something interesting to stdout so it's logged, and report the images back to SQS with "error" state so houndigrade can cleanly scale down?
Should we have a new/different message format for the inspection results queue to indicate that houndigrade failed to run but the images are neither error nor inspected?
Should we simply check that mount exists and is executable before we call it?
Should we think of some new way to back off and retry running the houndigrade task?
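As a starting point for the last two questions, here is a minimal sketch of checking for mount before calling it, with a short backoff in case the command reappears. The names find_executable and mount_read_only are hypothetical and are not taken from houndigrade's code, and the mount options shown are only an assumption about how the volume might be mounted.

```python
import shutil
import subprocess
import time


def find_executable(name, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Return the full path to `name`, retrying with exponential backoff.

    Returns None if the command is still missing after all attempts.
    (Hypothetical helper, not part of houndigrade.)
    """
    for attempt in range(attempts):
        # shutil.which searches PATH and requires the file to be executable.
        path = shutil.which(name)
        if path is not None:
            return path
        if attempt < attempts - 1:
            # Wait 0.5s, 1.0s, 2.0s, ... between attempts.
            sleep(base_delay * 2 ** attempt)
    return None


def mount_read_only(device, mount_point):
    """Hypothetical wrapper: verify that `mount` exists before calling it.

    Returns True if the mount succeeded, False if `mount` was not found.
    """
    mount_path = find_executable("mount")
    if mount_path is None:
        # Write to stdout so the failure shows up in the task logs.
        print(f"mount command not found; cannot mount {device}", flush=True)
        return False
    # Assumed invocation: a plain read-only mount of the device.
    subprocess.run([mount_path, "-o", "ro", device, mount_point], check=True)
    return True
```

If mount_read_only returns False, the caller could then mark the affected images with whatever failure state we settle on, instead of letting FileNotFoundError escape.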
CloudWatch logs in our production AWS account indicate a houndigrade run at 2020-08-07T18:49:54.269Z for image ami-0ff6d47107892dd85 failed with this error, but the next run at 2020-08-07T19:49:25.510Z for images ami-00ff566775b0b66e1 and ami-0ff6d47107892dd85 did not fail.
Note that the second run included the same image as the first run.
Log of first run that raised this exception:
Log of subsequent run that did not raise exception: