chipsalliance / riscv-dv

Random instruction generator for RISC-V processor verification
Apache License 2.0

Continuing on error in run.py #774

Open GregAC opened 3 years ago

GregAC commented 3 years ago

Currently, when run.py runs a step, any error return code from a command it runs via run_cmd causes it to exit immediately with sys.exit.

Whilst running a full DV regression on Ibex, I'm occasionally seeing spike terminate with an error, which kills the whole run. Whatever causes spike to terminate with an error needs attention, but it is still useful to let the other tests run to completion.

Would there be any interest in a continue-on-error option in run.py to allow this? It would set exit_on_error to 0 when using run_cmd. Some extra error handling and reporting may be required too; I need to investigate in more detail. I'd be happy to put a PR together.
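
To make the proposal concrete, here is a minimal sketch of the idea, not the actual riscv-dv code. `run_cmd_sketch`, `run_all` and the `continue_on_error` parameter are hypothetical names invented for illustration; only `run_cmd`, `exit_on_error` and `sys.exit` come from the description above, and the real `run_cmd` may behave differently in detail.

```python
import subprocess
import sys


def run_cmd_sketch(cmd, exit_on_error=True):
    """Run cmd (a list of arguments); return (return_code, combined_output).

    Hypothetical stand-in for run_cmd, not the riscv-dv implementation.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    if proc.returncode != 0 and exit_on_error:
        # Behaviour described in this issue: one failure ends the whole run.
        sys.exit(proc.returncode)
    return proc.returncode, proc.stdout


def run_all(cmds, continue_on_error=False):
    """Run every command; return the list of (cmd, return_code) failures."""
    failures = []
    for cmd in cmds:
        # With continue_on_error set, failures are recorded instead of fatal.
        rc, _ = run_cmd_sketch(cmd, exit_on_error=not continue_on_error)
        if rc != 0:
            failures.append((cmd, rc))
    return failures
```

With something like this, the caller could report the collected failures at the end of the regression and still return a non-zero exit code, so a failing test is not silently lost.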

udinator commented 3 years ago

That's an interesting issue - does this happen every time you run a regression? Generally what I see (at least from my end) is that even if a Spike sim errors out, the rest of the regression will continue running without erroring out.

I'd have no objections if you wanted to put together a PR though!

GregAC commented 3 years ago

Looks like I'm seeing a spike assertion error:

spike: ./fesvr/device.cc:44: void device_t::handle_identify(command_t): Assertion `addr % IDENTITY_SIZE == 0' failed.

It comes from one of the riscv_pmp_full_random_test iterations.

(I'm not worrying about it for now, but it's one of a few flaky tests to fix up.)

Most of the time, spike-related errors show up as a spike timeout, which doesn't kill the run. The problem arises when spike itself terminates with an error.
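
The two cases can be told apart roughly as follows. This is only a sketch using the standard `subprocess` module; `classify_run` is a hypothetical helper, not something in run.py, and how riscv-dv actually detects spike timeouts is not shown here.

```python
import subprocess


def classify_run(cmd, timeout_s):
    """Return "ok", "timeout" or "error" for one command (a list of arguments)."""
    try:
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child ran past the limit and was killed by subprocess.run.
        return "timeout"
    # Any non-zero return code counts as an error. On POSIX a negative
    # value means the process was killed by a signal, such as the SIGABRT
    # raised by a failed assert().
    return "ok" if proc.returncode == 0 else "error"
```

A run that wants to keep going would treat both "timeout" and "error" as a failed test and move on to the next one.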

udinator commented 3 years ago

Huh... that's interesting. You're right, we currently don't gracefully handle the cases where Spike actually terminates early due to an error like the one you showed. I think a solution along the lines of what you proposed earlier would be a good idea.