apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[CI] Corstone300 test runner hangs on UsageFault exception #13193

Open guberti opened 1 year ago

guberti commented 1 year ago

Currently, we use the Corstone300 simulator to test our microTVM schedules. The simulator is configured to model the Cortex-M55. However, we do not do any fault handling, so when an exception (e.g. UsageFault) is thrown, the simulator (and test) hang forever.

To prevent this, it would be nice to configure the simulator to report an exception and terminate when a fault is thrown. Note that this issue is not with the Corstone300 simulator (a real Cortex-M55 would also hang if configured the same way), but rather with our usage of it.


Long explanation

I recently spent three days tracking down a strange bug I encountered while writing microTVM schedules for Arm Cortex-M. When I tensorized with the following function, the code ran to completion as expected on the Corstone300 simulator:

int func(int *output, int *tensor, int *kernel) {
  int sum_0 = 0;  /* initialize the accumulator before it is passed to smlad */

  int a = tensor[0];
  int b = tensor[1];
  int c = kernel[0];

  sum_0 = __builtin_arm_smlad(a, c, sum_0);
  sum_0 = b + kernel[1] + sum_0;

  output[0] = sum_0;
  return 0;
}

However, tensorizing with the following code and running it in the Corstone300 simulator made it hang forever without throwing an error:

int func(int *output, int *tensor, int *kernel) {
  int sum_0 = 0;  /* initialize the accumulator before it is passed to smlad */

  int a = tensor[0];
  int b = tensor[1];
  int c = kernel[0];
  int d = kernel[1];

  sum_0 = __builtin_arm_smlad(a, c, sum_0);
  sum_0 = b + d + sum_0;

  output[0] = sum_0;
  return 0;
}

After some debugging, it turned out that Arm GCC compiles these two chunks of code to different memory load instructions. In the failing case, a single load doubleword instruction is used to load kernel memory, while in the working case two single word load instructions are used.

As described in section 7.4 of the Cortex-M55 reference manual, this is intended behavior: some, but not all, memory instructions throw a UsageFault when they perform an unaligned access.
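To make the alignment condition concrete, here is a minimal sketch of a host-side check. The helper names `is_aligned` and `is_word_aligned` are hypothetical and not part of TVM. It assumes the failing pointer was not 4-byte aligned, which is the alignment the load doubleword instruction requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of TVM): returns true if `p` is aligned to
 * `alignment` bytes. `alignment` must be a nonzero power of two. */
static bool is_aligned(const void *p, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Word (4-byte) alignment. A load doubleword (LDRD) from an address that
 * fails this check faults, while a plain single word load (LDR) from the
 * same address can succeed if unaligned access trapping is disabled. */
static bool is_word_aligned(const void *p) {
  return is_aligned(p, 4);
}
```

A check like `is_word_aligned(kernel)` before casting a byte buffer to `int *` would flag the kind of pointer that only fails once the compiler picks a doubleword load.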

It would have saved me a ton of debugging time if the Corstone300 simulator (and our tests) reported when an exception was thrown and failed the test (instead of looping forever). Would love to see this change added!

cc @Mousius @areusch @driazati @gigiblender

Mousius commented 1 year ago

Hi @guberti,

We do some of this for HardFaults in our NPU tests by including: https://github.com/apache/tvm/blob/a4840e7de38c5a2000917f2101f3ec4a374bcd39/tests/python/contrib/test_ethosu/reference_system/hard_fault.h

And then referencing it as an include: https://github.com/apache/tvm/blob/a4840e7de38c5a2000917f2101f3ec4a374bcd39/tests/python/contrib/test_ethosu/infra.py#L161

Potentially worth making a more generic version of this for the different handlers and including it by default for the reference system?
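As a rough illustration of what a more generic handler could report, the sketch below decodes the UsageFault bits of the Configurable Fault Status Register (CFSR) into a name. This is not the contents of hard_fault.h. The function name `usage_fault_reason` and the handler shown in the trailing comment are assumptions made for this example, and the handler part is not compiled here because it only makes sense on the target.

```c
#include <stdint.h>

/* UsageFault Status Register (UFSR) flags. The UFSR occupies bits [31:16]
 * of the CFSR, which lives at address 0xE000ED28 on Cortex-M. */
#define UFSR_UNDEFINSTR (1u << 0) /* undefined instruction */
#define UFSR_INVSTATE   (1u << 1) /* invalid execution state */
#define UFSR_INVPC      (1u << 2) /* invalid PC load on exception return */
#define UFSR_NOCP       (1u << 3) /* coprocessor unavailable */
#define UFSR_STKOF      (1u << 4) /* stack overflow (Armv8-M Mainline) */
#define UFSR_UNALIGNED  (1u << 8) /* unaligned access */
#define UFSR_DIVBYZERO  (1u << 9) /* divide by zero */

/* Hypothetical helper: name the first UsageFault cause set in `cfsr`,
 * or "none" if no UsageFault flag is set. */
static const char *usage_fault_reason(uint32_t cfsr) {
  uint32_t ufsr = cfsr >> 16;
  if (ufsr & UFSR_UNALIGNED)  return "UNALIGNED";
  if (ufsr & UFSR_DIVBYZERO)  return "DIVBYZERO";
  if (ufsr & UFSR_UNDEFINSTR) return "UNDEFINSTR";
  if (ufsr & UFSR_INVSTATE)   return "INVSTATE";
  if (ufsr & UFSR_INVPC)      return "INVPC";
  if (ufsr & UFSR_NOCP)       return "NOCP";
  if (ufsr & UFSR_STKOF)      return "STKOF";
  return "none";
}

/* On the target, a handler might use it roughly like this (assumed names,
 * shown as a comment only):
 *
 *   void UsageFault_Handler(void) {
 *     uint32_t cfsr = *(volatile uint32_t *)0xE000ED28u;
 *     printf("UsageFault: %s\n", usage_fault_reason(cfsr));
 *     exit(1);  // assumption: exiting is how the test run gets terminated
 *   }
 */
```

Note that a dedicated UsageFault handler only runs if UsageFault is enabled in the System Handler Control and State Register. Otherwise the fault escalates to HardFault, so the same decoding could be called from the HardFault handler instead.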

Also just to clarify, the simulated core is an M55 not an M7 (see: https://developer.arm.com/Processors/Corstone-300)

guberti commented 1 year ago

Sorry, don't know how I wrote M7 up there - I've corrected it :). That hard_fault.h file looks like just the kind of thing we'd need to fix this.