Closed bsach64 closed 3 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 70.57%. Comparing base (
00d7cdc
) to head (6298e80
).:exclamation: Current head 6298e80 differs from pull request most recent head 4bfaf6e. Consider uploading reports for the commit 4bfaf6e to get more accurate results
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
To simulate the crash, I added a call to abort() before line 2104 criu/cr-dump.c
It looks like we need to add a new fault injection there. See f1714ccce714093170a2616474cfc7b33298c75e as an example. This way we would be able to add a test for it to test/jenkins/criu-fault.sh with --fault <id>
option.
To simulate the crash, I added a call to abort() before line 2104 criu/cr-dump.c
It looks like we need to add a new fault injection there. See f1714cc as an example. This way we would be able to add a test for it to test/jenkins/criu-fault.sh with
--fault <id>
option.
Hello @snorch,
I added a new fault injection for CRIU crashing on dump. Please let me know if this was done correctly.
On adding a test(s) to test/jenkins/criu-fault.sh
, any test that specifies the crfail flag can (should?) be added to the criu-fault.sh
.
There are ~15 tests that specify the crfail
flag should all these tests be added to criu-fault.sh? Is there a way that we can run all tests that specify a specific flag?
To simulate the crash, I added a call to abort() before line 2104 criu/cr-dump.c
It looks like we need to add a new fault injection there. See f1714cc as an example. This way we would be able to add a test for it to test/jenkins/criu-fault.sh with
--fault <id>
option.Hello @Snorch, I added a new fault injection for CRIU crashing on dump. Please let me know if this was done correctly.
Looks correct. But from zdtm.py point of view it may be not so simple, see how __criu_act retries action without fault, we likely don't want our fault to be retry-able by zdtm, so maybe just make it's id >=128 ?
On adding a test(s) to
test/jenkins/criu-fault.sh
, any test that specifies the crfail flag can (should?) be added to thecriu-fault.sh
. There are ~15 tests that specify thecrfail
flag should all these tests be added to criu-fault.sh? Is there a way that we can run all tests that specify a specific flag?
Yeh, it's a good question. Let's just use one random test with crfail
(e.g.: vfork00). I don't see easy alternative to this approach, maybe someone else does...
To simulate the crash, I added a call to abort() before line 2104 criu/cr-dump.c
It looks like we need to add a new fault injection there. See f1714cc as an example. This way we would be able to add a test for it to test/jenkins/criu-fault.sh with
--fault <id>
option.Hello @Snorch, I added a new fault injection for CRIU crashing on dump. Please let me know if this was done correctly.
Looks correct. But from zdtm.py point of view it may be not so simple, see how __criu_act retries action without fault, we likely don't want our fault to be retry-able by zdtm, so maybe just make it's id >=128 ?
I don't think giving the fault injection an id >= 128 is what we want.
If I understand correctly, ids >= 128 refer to non-fatal fault injections (criu/include/fault-injection.h
). Giving this fatal fault injection an id >= 128 causes tests to fail. I think currently zdtm retries all fatal fault injections. I will try to figure out a different way :)
Can you elaborate on why we don't want our fault to be retry-able?
On adding a test(s) to
test/jenkins/criu-fault.sh
, any test that specifies the crfail flag can (should?) be added to thecriu-fault.sh
. There are ~15 tests that specify thecrfail
flag should all these tests be added to criu-fault.sh? Is there a way that we can run all tests that specify a specific flag?Yeh, it's a good question. Let's just use one random test with
crfail
(e.g.: vfork00). I don't see easy alternative to this approach, maybe someone else does...
I have added vfork00
to criu-fault.sh
.
To simulate the crash, I added a call to abort() before line 2104 criu/cr-dump.c
It looks like we need to add a new fault injection there. See f1714cc as an example. This way we would be able to add a test for it to test/jenkins/criu-fault.sh with
--fault <id>
option.Hello @Snorch, I added a new fault injection for CRIU crashing on dump. Please let me know if this was done correctly.
Looks correct. But from zdtm.py point of view it may be not so simple, see how __criu_act retries action without fault, we likely don't want our fault to be retry-able by zdtm, so maybe just make it's id >=128 ?
I don't think giving the fault injection an id >= 128 is what we want. If I understand correctly, ids >= 128 refer to non-fatal fault injections (
criu/include/fault-injection.h
). Giving this fatal fault injection an id >= 128 causes tests to fail.
I may misunderstand something, correct me if I'm wrong, but isn't a test fail exactly the behavior we expect in case of criu crash? In other words: if criu fails to dump/restore crfail
test zdtm reports PASS, if criu crashes zdtm should report FAIL, isn't it? (And in criu-fault.sh we should expect FAIL from zdtm.)
I think currently zdtm retries all fatal fault injections. I will try to figure out a different way :) Can you elaborate on why we don't want our fault to be retry-able?
On adding a test(s) to
test/jenkins/criu-fault.sh
, any test that specifies the crfail flag can (should?) be added to thecriu-fault.sh
. There are ~15 tests that specify thecrfail
flag should all these tests be added to criu-fault.sh? Is there a way that we can run all tests that specify a specific flag?Yeh, it's a good question. Let's just use one random test with
crfail
(e.g.: vfork00). I don't see easy alternative to this approach, maybe someone else does...I have added
vfork00
tocriu-fault.sh
.
To simulate the crash, I added a call to abort() before line 2104 criu/cr-dump.c
It looks like we need to add a new fault injection there. See f1714cc as an example. This way we would be able to add a test for it to test/jenkins/criu-fault.sh with
--fault <id>
option.Hello @Snorch, I added a new fault injection for CRIU crashing on dump. Please let me know if this was done correctly.
Looks correct. But from zdtm.py point of view it may be not so simple, see how __criu_act retries action without fault, we likely don't want our fault to be retry-able by zdtm, so maybe just make it's id >=128 ?
I don't think giving the fault injection an id >= 128 is what we want. If I understand correctly, ids >= 128 refer to non-fatal fault injections (
criu/include/fault-injection.h
). Giving this fatal fault injection an id >= 128 causes tests to fail.I may misunderstand something, correct me if I'm wrong, but isn't a test fail exactly the behavior we expect in case of criu crash? In other words: if criu fails to dump/restore
crfail
test zdtm reports PASS, if criu crashes zdtm should report FAIL, isn't it? (And in criu-fault.sh we should expect FAIL from zdtm.)I think currently zdtm retries all fatal fault injections. I will try to figure out a different way :) Can you elaborate on why we don't want our fault to be retry-able?
On adding a test(s) to
test/jenkins/criu-fault.sh
, any test that specifies the crfail flag can (should?) be added to thecriu-fault.sh
. There are ~15 tests that specify thecrfail
flag should all these tests be added to criu-fault.sh? Is there a way that we can run all tests that specify a specific flag?Yeh, it's a good question. Let's just use one random test with
crfail
(e.g.: vfork00). I don't see easy alternative to this approach, maybe someone else does...I have added
vfork00
tocriu-fault.sh
.
Sorry, the misunderstanding was on my part. You are correct; the tests previously passed as they were re-run without the fault injection. Thank you for all your help @Snorch :)
Hm, test still does not pass with && fail
as though we don't call fail function we still return non-zero exit code and with set -e
it fails the script.
This seem to do the right trick:
if ./test/zdtm.py run -t zdtm/static/vfork00 --fault 136 --report report -f h ; then
fail
fi
Also maybe someone else has better ideas?
A friendly reminder that this PR had no activity for 30 days.
A friendly reminder that this PR had no activity for 30 days.
Can we remove the "stale-pr" label?
A friendly reminder that this PR had no activity for 30 days.
Can we remove the "stale-pr" label?
Removed.
This PR distinguishes
criu dump
crash fromcriu dump
fail in thezdtm.py
script. Now,zdtm
reportscriu dump
crash as a test failure.Very similiar to the work done in this PR (#2140)
But, to detect a rpc crash we see that if the response from the RPC is invalid, then we assume the RPC crashed.
For detecting the crash for the CLI we can check the
Popen.returncode
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncodeTo simulate the crash, I added a call to
abort()
beforeline 2104 criu/cr-dump.c
Fixes: #350