Open guberti opened 1 year ago
Hi @guberti,
We do some of this for HardFaults in our NPU tests by including: https://github.com/apache/tvm/blob/a4840e7de38c5a2000917f2101f3ec4a374bcd39/tests/python/contrib/test_ethosu/reference_system/hard_fault.h
And then referencing it as an include: https://github.com/apache/tvm/blob/a4840e7de38c5a2000917f2101f3ec4a374bcd39/tests/python/contrib/test_ethosu/infra.py#L161
Potentially worth making a more generic version of this for the different handlers and including it by default for the reference system?
Also just to clarify, the simulated core is an M55 not an M7 (see: https://developer.arm.com/Processors/Corstone-300)
Sorry, don't know how I wrote M7 up there - I've corrected it :). That hard_fault.h file looks like just the kind of thing we'd need to fix this.
Currently, we use the Corstone300 simulator to test our microTVM schedules. The simulator is configured to model the Cortex-M55. However, we do not do any fault handling, so when an exception (e.g. `UsageFault`) is thrown, the simulator (and test) hang forever. To prevent this, it would be nice to configure the simulator to report an exception and terminate when a fault is thrown. Note that this issue is not with the Corstone300 simulator (a real Cortex-M55 would also hang if configured the same way), but rather with our usage of it.
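As a rough illustration of what "report an exception" could look like, here is a minimal sketch of the decoding half of a fault handler. It assumes the handler has already read the Configurable Fault Status Register (CFSR, at address 0xE000ED28 on Armv7-M/Armv8-M Mainline cores) and only turns that raw value into a readable message. The names `usage_fault_cause` and `format_fault_report` are hypothetical and do not come from TVM's `hard_fault.h`; how the handler then terminates the simulator is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* UsageFault status bits as they appear in the upper half of CFSR. */
#define CFSR_UNDEFINSTR (1u << 16) /* undefined instruction */
#define CFSR_INVSTATE   (1u << 17) /* invalid execution state */
#define CFSR_INVPC      (1u << 18) /* invalid exception return */
#define CFSR_NOCP       (1u << 19) /* coprocessor not available */
#define CFSR_UNALIGNED  (1u << 24) /* unaligned memory access */
#define CFSR_DIVBYZERO  (1u << 25) /* integer divide by zero */

/* Hypothetical helper: map a raw CFSR value to a short cause name.
 * If several bits are set, only the first match is reported. */
const char* usage_fault_cause(uint32_t cfsr) {
  if (cfsr & CFSR_UNALIGNED) return "UNALIGNED";
  if (cfsr & CFSR_DIVBYZERO) return "DIVBYZERO";
  if (cfsr & CFSR_UNDEFINSTR) return "UNDEFINSTR";
  if (cfsr & CFSR_INVSTATE) return "INVSTATE";
  if (cfsr & CFSR_INVPC) return "INVPC";
  if (cfsr & CFSR_NOCP) return "NOCP";
  return "NONE";
}

/* Hypothetical helper: write a one-line report into buf.
 * Returns the value of snprintf (characters that would be written). */
int format_fault_report(char* buf, size_t len, uint32_t cfsr) {
  return snprintf(buf, len, "UsageFault: %s (CFSR=0x%08lx)",
                  usage_fault_cause(cfsr), (unsigned long)cfsr);
}
```

A real handler would print this line over whatever output channel the test harness captures and then stop the simulation, so the test fails with a message instead of hanging.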
Long explanation
I recently spent three days tracking down a strange bug I encountered while writing microTVM schedules for Arm Cortex-M. When I called the `tensorize` function with the following function, the code ran to completion as expected using the Corstone300 simulator:

However, tensorizing with the following code and running it in the Corstone300 simulator made it hang forever without throwing an error:
After some debugging, it turns out that when these chunks of code are compiled with Arm GCC, different memory loading instructions are used (specifically, the load doubleword instruction is used to load kernel memory in the failing case, while two single-word instructions are used in the working case).
As described in section 7.4 of the Cortex-M55 reference manual, this is intended behavior - some, but not all memory instructions throw `UsageFault` when an unaligned access is performed.

It would have saved me a ton of debugging time if the Corstone300 simulator (and our tests) reported when an exception was thrown and failed the test (instead of looping forever). Would love to see this change added!
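For anyone hitting the same alignment problem, here is a minimal sketch of two defensive patterns in plain C: checking a pointer's alignment before a wide load, and loading two words through `memcpy` so the compiler cannot assume the source is aligned. The names `is_aligned` and `load_two_words` are hypothetical and are not part of TVM or the original schedules.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if ptr is a multiple of `alignment` bytes, else 0.
 * `alignment` must be a power of two. */
int is_aligned(const void* ptr, uintptr_t alignment) {
  return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Copies two 32-bit words from a possibly unaligned address into out.
 * memcpy makes no alignment assumption about src, so the compiler has
 * to pick load instructions that are valid for an unaligned address. */
void load_two_words(const void* src, int32_t out[2]) {
  memcpy(out, src, 2 * sizeof(int32_t));
}
```

A check like `is_aligned(kernel_ptr, 4)` in a debug build would flag the bad pointer directly, instead of leaving it to surface as a fault inside the kernel.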
cc @Mousius @areusch @driazati @gigiblender