Closed ricardobranco777 closed 5 months ago
LGTM
Maybe we are trying to fix the problem at the wrong level? This is getting pretty complicated, which will likely lead to more bugs in the test. If you want this to run once in your test runs, you can pass `--retry-count 1`. This avoids the problem at a lower level, preventing the need for all of this complexity.
The problem with `--retry-count 1` is that transient issues then cause false positives, making tests fail in openQA, which has its own retry mechanism, thus burning more money.
I can keep the PR simple. It's not that complicated.
Can this specific test be run on its own and aggregated into the full test suite?
It would add real complexity on our side.
This change has value on its own for your workloads, as it reduces the time the whole test run takes when oscap fails, saving money for SUSE and reducing our carbon footprint.
So the main problem we are trying to solve is to not run this test more than once? Given that the test relies on a log file to determine whether to run it, it seems we can assume that the instance under test is ephemeral?
Currently the biggest problem with this test is that it will always fail. After the first run, no matter what the result was, if the log file exists the test gets marked `pytest.fail`. Instead we want to carry over the result of the first test run. Thus it seems like we need to store a state variable such as "SAP_RESULT" (in the system under test), which could denote pass, fail, or skip. That way, instead of checking whether the report exists, we can check whether there's a previous result stored, then duplicate that result while skipping the test, i.e. `pytest.{skip/pass/fail}`.
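A minimal sketch of that state-variable idea, assuming a hypothetical marker file on the SUT (the path, helper names, and file format here are all illustrative, not from the PR):

```python
import os

# Hypothetical path where the first run's outcome would be persisted on the SUT.
SAP_RESULT_FILE = "/var/tmp/sap_result"


def read_previous_result(path=SAP_RESULT_FILE):
    """Return the outcome ('pass', 'fail' or 'skip') recorded by a previous
    run, or None if this is the first run."""
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


def record_result(result, path=SAP_RESULT_FILE):
    """Persist the outcome so a retried run can replicate it."""
    with open(path, "w") as f:
        f.write(result)


def action_for(previous):
    """Map a stored outcome to what the test would do on a retry:
    'pass' -> return normally, 'fail' -> pytest.fail(),
    'skip' -> pytest.skip(), None -> actually run the oscap scan."""
    return {
        "pass": "return",
        "fail": "pytest.fail",
        "skip": "pytest.skip",
        None: "run oscap",
    }[previous]
```

On a retry the test would first call `read_previous_result()` and short-circuit via `action_for()` instead of re-running the expensive scan.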
I removed all references to the log file, checking instead for the report. Yes, we assume the SUT is ephemeral.
> Currently the biggest problem with this test is that it will always fail.
Not true after the latest changes. I assume pytest doesn't restart passed tests.
pytest.fail is called explicitly if the report exists. https://github.com/SUSE-Enceladus/img-proof/pull/380/files#diff-ee3bc32d8d6d9cbcedc8b867e57b8f3f87dc764a02bc42f2132fe0d11c2f6eefR50
If the report exists we can safely assume it's a retry from a failed test.
What if the first run passes? Then there's no report and it gets re-run?
If the first run passes, the report is generated, provided SCAP_REPORT is set to a valid path. Neither this tool nor pytest is supposed to retry a test that passed.
Do you mean manually running it a second time? In that case the report should be removed before running img-proof.
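For that manual second run, the cleanup step could look like this (a sketch; the SCAP_REPORT environment variable is the only name taken from this thread, the helper itself is hypothetical):

```python
import os


def clear_previous_report(report_path=None):
    """Remove a leftover SCAP report before manually re-running img-proof,
    so the existence check doesn't fail the test.
    Returns True if a report was removed, False otherwise."""
    # Fall back to the SCAP_REPORT environment variable mentioned in the thread.
    report_path = report_path or os.environ.get("SCAP_REPORT")
    if report_path and os.path.exists(report_path):
        os.remove(report_path)
        return True
    return False
```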
Okay, sorry, I was confusing the scenarios. If it passes, there's a report but no second run. So long as there's no need to run the test suite on the same instance, which could be an issue for us. Thus I think this is okay, but I may submit a PR to remove this test from the default test suite if it causes us trouble. That would just mean it would have to be included separately, like `img-proof test ... test_sles test_sles_hardened`.
I don't see how it could be an issue for your team if you don't need the report... It fails only if the report already exists. AFAICT, you don't consume the report at all.
Yes, not likely a problem.
I nonetheless recommend generating the report on ephemeral instances, to avoid the test running longer when oscap fails.
This PR:
Rationale: We're using expensive VMs to run this particular test due to the workaround for OOM, and these retries are burning money.
Related ticket: https://progress.opensuse.org/issues/158613
Verification run: https://openqa.suse.de/tests/13965566
The verification run took 22m12s instead of the original job's 38m42s, a 42.6% reduction.