ARMmbed / mbed-os

Arm Mbed OS is a platform operating system designed for the internet of things
https://mbed.com
Other
4.68k stars 2.99k forks source link

ARMC6 regression with V8M targets for QSPI test ? #14459

Closed jeromecoutant closed 2 years ago

jeromecoutant commented 3 years ago

Description of defect

Hi

I got an issue with 1 test suite: hal-tests-tests-mbed_hal-qspi This test is OK for:

After more tests, test is failing with DISCO_L562QE/ARMC6 for all mbed-os version ☹ When I look back in my history, I think issue comes in fact with the ARMC6 version: Before mbed-os-6.7.0, I have updated ARM complier from 6.13.1 version to 6.15

So my conclusion is there is a regression for V8M only between v6.13.1 and v6.15 ….

Target(s) affected by this defect ?

DISCO_L562QE

Toolchain(s) (name and version) displaying this defect ?

ARMC6 v6.15

What version of Mbed-os are you using (tag or sha) ?

mbed-os-6.9.0

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

NA

How is this defect reproduced ?

mbed test -m DISCO_L562QE -t ARM -n hal-tests-tests-mbed_hal-qspi

ciarmcom commented 3 years ago

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers. Internal Jira reference: https://jira.arm.com/browse/IOTOSM-3706

0xc0170 commented 3 years ago

So my conclusion is there is a regression for V8M only between v6.13.1 and v6.15 ….

would you be able to test newer compiler with older Mbed OS to confirm?

jeromecoutant commented 3 years ago

would you be able to test newer compiler with older Mbed OS to confirm?

That is what I have done! Now that I have ARMC6 v6.15, test is failing whatever mbed-os version... (and OK with GCC)

jeromecoutant commented 3 years ago

More tests done:

mbed-os ARMCLANG result
master v6.13 OK
master v6.14 OK
master v6.14.1 OK
master v6.15 FAILED
master v6.16 FAILED
MarceloSalazar commented 3 years ago

@ARMmbed/mbed-os-core @ARMmbed/mbed-os-maintainers please review AC 6.15 (official in Mbed OS 6.10) doesn't seem to work with this specific QSPI tests

kjbracey commented 3 years ago

To recheck - this is specific to the L5 for you? So the assumption is ARMv8-M is the trigger?

jeromecoutant commented 3 years ago

I thought about ARMv8 as same test is working for all other STM32, and difference with this L5 one, is this is the only one with M33

0xc0170 commented 3 years ago

@jeromecoutant What tests are failing actually? It would be great to have specific test case that is problematic to investigate. We can pass the information then to the compiler team if this is a valid bug.

jeromecoutant commented 3 years ago

see description...

kjbracey commented 3 years ago

Okay, just checking. It's possible that it's to do with v8-M, but not conclusive. This is also the only one running the L5 STM HAL. Could be something there.

I cast a bit of an eye over the STM qspi code looking for anything vaguely undefined (as I find compilers don't tend to fail with valid code), but couldn't see anything.

kjbracey commented 3 years ago

But do you have more specific info on the failure? Which specific test? Got an assert line number? You've only specified the "qspi test suite" as far as I can see.

0xc0170 commented 3 years ago

But do you have more specific info on the failure? Which specific test? Got an assert line number? You've only specified the "qspi test suite" as far as I can see.

see description...

I am looking at qspi hal test. It defines lot of test cases, which one is failing, all , some ? @jeromecoutant I dont see in the report above the failures (logs). It would help to minimize the code to be able to focus on what is going on (one minimal test case that triggers the failures for the recent compiler versions).

jeromecoutant commented 3 years ago
| DISCO_L562QE-ARMC6 | DISCO_L562QE  | hal-tests-tests-mbed_hal-ospi | OK     | 18.26              | default     |
[1620291985.76][CONN][RXD] >>> Running case #2: 'qspi init/free test'...
[1620291985.80][CONN][INF] found KV pair in stream: {{__testcase_start;qspi init/free test}}, queued...
mbedgt: :135::FAIL: Expected 0 Was -1
[1620291990.77][CONN][RXD] :135::FAIL: Expected 0 Was -1
[1620291990.80][CONN][INF] found KV pair in stream: {{__testcase_finish;qspi init/free test;0;1}}, queued...

| DISCO_L562QE-ARMC6 | DISCO_L562QE  | hal-tests-tests-mbed_hal-qspi | FAIL   | 17.36              | default     |
LDong-Arm commented 3 years ago

[1620291990.77][CONN][RXD] :135::FAIL: Expected 0 Was -1

This suggests the failing lines are https://github.com/ARMmbed/mbed-os/blob/376fda5bf59c37bfcc0615e6a902c9ac74a96e88/hal/tests/TESTS/mbed_hal/ospi/main.cpp#L134-L135

where erase() is defined by the test itself: https://github.com/ARMmbed/mbed-os/blob/376fda5bf59c37bfcc0615e6a902c9ac74a96e88/hal/tests/TESTS/mbed_hal/ospi/ospi_test_utils.cpp#L243-L247

Could you try to find out the reason of the erase failure?

jeromecoutant commented 3 years ago

As said in previous comment: More tests done:

mbed-os ARMCLANG result
master v6.14.1 OK
master v6.15 FAILED
LDong-Arm commented 3 years ago

Could you add some debug info to narrow down which line/function in the OSPI driver is failing? From my comment above, the failure should be somewhere in ospi_command_transfer: https://github.com/ARMmbed/mbed-os/blob/376fda5bf59c37bfcc0615e6a902c9ac74a96e88/targets/TARGET_STM/ospi_api.c#L533

We may not have the board to test it ourselves. If you could let us know which line in the driver/this function returns an error, then we may have a better idea of what to focus on. Thanks in advance.

jeromecoutant commented 3 years ago

@LDong-Arm OSPI test is OK, QSPI test is failing at this line:

https://github.com/ARMmbed/mbed-os/blob/376fda5bf59c37bfcc0615e6a902c9ac74a96e88/hal/tests/TESTS/mbed_hal/qspi/qspi_test_utils.cpp#L135

LDong-Arm commented 3 years ago

@jeromecoutant Ah sorry I looked at OSPI previously. Now looking at QSPI.

So the call that failed the assertion is the line above, i.e. https://github.com/ARMmbed/mbed-os/blob/376fda5bf59c37bfcc0615e6a902c9ac74a96e88/hal/tests/TESTS/mbed_hal/qspi/qspi_test_utils.cpp#L134

where qspi_command_transfer() for QSPI for ST is https://github.com/ARMmbed/mbed-os/blob/376fda5bf59c37bfcc0615e6a902c9ac74a96e88/targets/TARGET_STM/qspi_api.c#L869-L870 (OCTOSPI1 is defined in targets/TARGET_STM/TARGET_STM32L5/STM32Cube_FW/CMSIS/stm32l562xx.h).

Given that the last four parameters passed by the test are 0/NULL, the failing code is https://github.com/ARMmbed/mbed-os/blob/376fda5bf59c37bfcc0615e6a902c9ac74a96e88/targets/TARGET_STM/qspi_api.c#L876-L890

So either qspi_prepare_command or HAL_OSPI_Command failed, if I didn't miss something.

Would it be possible to further narrow it down, by adding some traces for example?

jeromecoutant commented 3 years ago

BUILD.zip This is BUILD directory with v6.14 (tets OK) and v6.15 (FAIL)

jeromecoutant commented 3 years ago

Log with ARMCLANG v6.14:

[1620395936.80][CONN][INF] found KV pair in stream: {{__testcase_start;qspi init/free test}}, queued...
[1620395936.82][CONN][RXD] STOS debug   : qspi_frequency hz 66000000
[1620395936.82][CONN][RXD] STOS debug   : qspi_free
[1620395936.82][CONN][RXD] STOS debug   : qspi_init mode 0
[1620395936.82][CONN][RXD] STOS debug   : qspi_frequency hz 66000000
[1620395936.82][CONN][RXD] STOS debug   : qspi_free
[1620395936.82][CONN][RXD] STOS debug   : qspi_init mode 0
[1620395936.84][CONN][RXD] STOS debug   : qspi_frequency hz 66000000
[1620395936.84][CONN][RXD] STOS debug   : qspi_command_transfer tx 0 rx 1 command 0005
[1620395936.84][CONN][RXD] HAL_OSPI_Command 820 status 0
[1620395936.84][CONN][RXD] HAL_OSPI_Command 828 status 0
[1620395936.84][CONN][RXD] STOS debug   : qspi_command_transfer tx 0 rx 0 command 0066
[1620395936.86][CONN][RXD] HAL_OSPI_Command 820 status 0
[1620395936.86][CONN][RXD] HAL_OSPI_Command 828 status 0
[1620395936.86][CONN][RXD] HAL_OSPI_Command 836 status 0
[1620395936.86][CONN][RXD] STOS debug   : qspi_command_transfer tx 0 rx 1 command 0005
[1620395936.86][CONN][RXD] HAL_OSPI_Command 820 status 0
[1620395936.86][CONN][RXD] HAL_OSPI_Command 828 status 0
[1620395936.88][CONN][RXD] STOS debug   : qspi_command_transfer tx 0 rx 0 command 0099
[1620395936.88][CONN][RXD] HAL_OSPI_Command 820 status 0
[1620395936.88][CONN][RXD] HAL_OSPI_Command 828 status 0
[1620395936.88][CONN][RXD] HAL_OSPI_Command 836 status 0

Log with ARMCLANG v6.15:

[1620395905.12][CONN][RXD] >>> Running case #2: 'qspi init/free test'...
[1620395905.16][CONN][INF] found KV pair in stream: {{__testcase_start;qspi init/free test}}, queued...
[1620395905.20][CONN][RXD] STOS debug   : qspi_init mode 0
[1620395905.24][CONN][RXD] STOS debug   : qspi_frequency hz 66000000
[1620395905.26][CONN][RXD] STOS debug   : qspi_free
[1620395905.30][CONN][RXD] STOS debug   : qspi_init mode 0
[1620395905.34][CONN][RXD] STOS debug   : qspi_frequency hz 66000000
[1620395905.38][CONN][RXD] STOS debug   : qspi_free
[1620395905.40][CONN][RXD] STOS debug   : qspi_init mode 0
[1620395905.46][CONN][RXD] STOS debug   : qspi_frequency hz 66000000
[1620395905.48][CONN][RXD] STOS debug   : qspi_free
[1620395905.52][CONN][RXD] STOS debug   : qspi_init mode 0
[1620395905.56][CONN][RXD] STOS debug   : qspi_frequency hz 66000000
[1620395905.62][CONN][RXD] STOS debug   : qspi_command_transfer tx 0 rx 1 command 0005
[1620395905.66][CONN][RXD] HAL_OSPI_Command 820 status 0
[1620395905.68][CONN][RXD] HAL_OSPI_Command 828 status 0
[1620395905.74][CONN][RXD] STOS debug   : qspi_command_transfer tx 0 rx 0 command 0066
[1620395905.78][CONN][RXD] HAL_OSPI_Command 820 status 0
[1620395905.80][CONN][RXD] HAL_OSPI_Command 828 status 0
[1620395910.70][CONN][RXD] HAL_OSPI_Command 836 status 1
[1620395910.74][CONN][RXD] STOS error   : HAL_OSPI_Command error
mbedgt: :135::FAIL: Expected 0 Was -1
[1620395910.76][CONN][RXD] :135::FAIL: Expected 0 Was -1

status variable has changed here: https://github.com/ARMmbed/mbed-os/blob/376fda5bf59c37bfcc0615e6a902c9ac74a96e88/targets/TARGET_STM/TARGET_STM32L5/STM32Cube_FW/STM32L5xx_HAL_Driver/stm32l5xx_hal_ospi.c#L836

LDong-Arm commented 3 years ago

status variable has changed here: https://github.com/ARMmbed/mbed-os/blob/376fda5bf59c37bfcc0615e6a902c9ac74a96e88/targets/TARGET_STM/TARGET_STM32L5/STM32Cube_FW/STM32L5xx_HAL_Driver/stm32l5xx_hal_ospi.c#L836

It seems possible that the operation timed out? If this is the case, either the operation actually took longer than before (performance issue), or the ticker wasn't functioning properly I guess.

jeromecoutant commented 3 years ago

Test is 100% OK with ARMCLANGv6.14, and 100% FAILED with ARMCLANGv6.15

kjbracey commented 3 years ago

I've stared at the generated and source code for OSPI_WaitFlagStateUntilTimeout and OSPI_ConfigCmd a while. Given that you're adding trace to HAL_OSPI_Command without it changing anything, it seemed likely it was one of those subroutines, rather than HAL_OSPI_Command itself.

Can't immediately see anything. The compiler is generating somewhat different code for OSPI_ConfigCmd, but I can't see a gross difference, just some changes in register allocation and block sharing. If it's going wrong, it's doing it subtly.

Loads of HAL code relies on the ticker, so seems unlikely that's at fault here, and OSPI_WaitFlagStateUntilTimeout generated code looks identical.

Any chance a register write delay is being violated? Doing two peripheral writes too close together? Are there any specified setup delays?