ARMmbed / mbed-os

Arm Mbed OS is a platform operating system designed for the internet of things
https://mbed.com

STM32 - TIMEOUT issue since CMSIS5 merge #4459

jeromecoutant closed this issue 7 years ago

jeromecoutant commented 7 years ago

Description

Since #4294 was merged, we have seen many issues with STM32 targets.

Tests are still ongoing, but it seems that:

Toolchain version: All

mbed-cli version: 1.1.1

jeromecoutant commented 7 years ago

Example:

    mbedgt: greentea test automation tool ver. 1.2.5
    mbedgt: test specification file 'C:\github\mbed\BUILD\tests\NUCLEO_F103RB\GCC_ARM\test_spec.json' (specified with --test-spec option)
    mbedgt: using 'C:\github\mbed\BUILD\tests\NUCLEO_F103RB\GCC_ARM\test_spec.json' from current directory!
    mbedgt: detecting connected mbed-enabled devices...
    mbedgt: detected 1 device
    +---------------+----------------------+-------------+-------------+--------------------------+
    | platform_name | platform_name_unique | serial_port | mount_point | target_id                |
    +---------------+----------------------+-------------+-------------+--------------------------+
    | NUCLEO_F103RB | NUCLEO_F103RB[0]     | COM26       | D:          | 07000221D12A3E0FEE6EA832 |
    +---------------+----------------------+-------------+-------------+--------------------------+
    mbedgt: processing target 'NUCLEO_F103RB' toolchain 'GCC_ARM' compatible platforms... (note: switch set to --parallel 1)
    +---------------+----------------------+-------------+-------------+--------------------------+
    | platform_name | platform_name_unique | serial_port | mount_point | target_id                |
    +---------------+----------------------+-------------+-------------+--------------------------+
    | NUCLEO_F103RB | NUCLEO_F103RB[0]     | COM26:9600  | D:          | 07000221D12A3E0FEE6EA832 |
    +---------------+----------------------+-------------+-------------+--------------------------+
    mbedgt: test case filter (specified with -n option)
        test filtered in 'tests-mbed_drivers-echo'
    mbedgt: running 1 test for platform 'NUCLEO_F103RB' and toolchain 'GCC_ARM'
        use 1 instance of execution threads for testing
    mbedgt: checking for 'host_tests' directory above image directory structure
        found 'host_tests' directory in: 'TESTS\host_tests'
    mbedgt: selecting test case observer...
        calling mbedhtrun: mbedhtrun -m NUCLEO_F103RB -p COM26:9600 -f "BUILD/tests/NUCLEO_F103RB/GCC_ARM/TESTS/mbed_drivers/echo/echo.bin" -e "TESTS\host_tests" -d D: -C 4 -c shell -t 07000221D12A3E0FEE6EA832
    mbedgt: mbed-host-test-runner: started
    [1496743800.38][HTST][INF] host test executor ver. 1.1.8
    [1496743800.38][HTST][INF] copy image onto target...
    [1496743800.38][COPY][INF] Waiting up to 60 sec for '07000221D12A3E0FEE6EA832' mount point (current is 'D:')...
            1 file(s) copied.
    [1496743808.29][HTST][INF] starting host test process...
    [1496743809.16][CONN][INF] starting connection process...
    [1496743809.16][CONN][INF] notify event queue about extra 60 sec timeout for serial port pooling
    [1496743809.16][CONN][INF] initializing serial port listener...
    [1496743809.16][PLGN][INF] Waiting up to 60 sec for '07000221D12A3E0FEE6EA832' serial port (current is 'COM26')...
    [1496743809.19][HTST][INF] setting timeout to: 60 sec
    [1496743809.73][SERI][INF] serial(port=COM26, baudrate=9600, read_timeout=0.01, write_timeout=5)
    [1496743809.76][SERI][INF] reset device using 'default' plugin...
    [1496743810.01][SERI][INF] waiting 1.00 sec after reset
    [1496743811.01][SERI][INF] wait for it...
    [1496743811.05][SERI][TXD] mbedmbedmbedmbedmbedmbedmbedmbedmbedmbed
    [1496743811.05][CONN][INF] sending up to 2 sync packets (specified with --sync=2)
    [1496743811.05][CONN][INF] sending preamble '29c6f0df-fe5a-43e9-a149-534722c7569f'
    [1496743811.09][SERI][TXD] {{sync;29c6f0df-fe5a-43e9-a149-534722c7569f}}
    [1496743812.10][CONN][INF] resending new preamble '7f49e61d-8986-4d7a-b0c8-48fe59ba9511' after 1.00 sec
    [1496743812.14][SERI][TXD] {{sync;7f49e61d-8986-4d7a-b0c8-48fe59ba9511}}
    [1496743870.02][HTST][INF] test suite run finished after 60.83 sec...
    [1496743870.03][CONN][INF] received special even 'host_test_finished' value='True', finishing
    [1496743870.04][HTST][INF] CONN exited with code: 0
    [1496743870.04][HTST][INF] No events in queue
    [1496743870.04][HTST][INF] stopped consuming events
    [1496743870.04][HTST][INF] host test result(): None
    [1496743870.04][HTST][WRN] missing exit event from DUT
    [1496743870.05][HTST][WRN] missing exit_event_queue event from host test
    [1496743870.05][HTST][ERR] missing __exit_event_queue event from host test and no result from host test, timeout...
    [1496743870.05][HTST][INF] calling blocking teardown()
    [1496743870.05][HTST][INF] teardown() finished
    [1496743870.05][HTST][INF] {{result;timeout}}
    mbedgt: checking for GCOV data...
    mbedgt: mbed-host-test-runner: stopped and returned 'TIMEOUT'
    mbedgt: test case summary event not found
        no test case report present, assuming test suite to be a single test case!
        test suite: tests-mbed_drivers-echo
        test case: tests-mbed_drivers-echo
    mbedgt: test on hardware with target id: 07000221D12A3E0FEE6EA832
    mbedgt: test suite 'tests-mbed_drivers-echo' ......................................................... TIMEOUT in 70.80 sec
        test case: 'tests-mbed_drivers-echo' ......................................................... ERROR in 70.80 sec
    mbedgt: test case summary: 0 passes, 1 failure
    mbedgt: all tests finished!
    mbedgt: shuffle seed: 0.5906039481
    mbedgt: test suite report:
    +-----------------------+---------------+-------------------------+---------+--------------------+-------------+
    | target                | platform_name | test suite              | result  | elapsed_time (sec) | copy_method |
    +-----------------------+---------------+-------------------------+---------+--------------------+-------------+
    | NUCLEO_F103RB-GCC_ARM | NUCLEO_F103RB | tests-mbed_drivers-echo | TIMEOUT | 70.8               | shell       |
    +-----------------------+---------------+-------------------------+---------+--------------------+-------------+

jeromecoutant commented 7 years ago

@0xc0170 @bcostm @LMESTM @bulislaw

bcostm commented 7 years ago

OS2 tests also time out on all L4 devices.

0xc0170 commented 7 years ago

Thanks for the report, I'll test one of the devices. Today I ran the mbed OS 5 tests, not mbed 2; I will check. What is interesting is that it affects all toolchains but only some devices.

Are all tests affected (RTOS, or even bare-metal tests for mbed 2)?

LMESTM commented 7 years ago

Yes, bare-metal mbed 2 tests also show the problem. It looks like we end up in Default_Handler. Have there been changes in CMSIS concerning interrupts or the vector table?

LMESTM commented 7 years ago

@bulislaw @0xc0170 One update: since we end up in Default_Handler, I suspected an issue with the interrupt vector table.

#4294 includes a change that moves all platforms from their target-specific implementation to the CMSIS implementation: https://github.com/ARMmbed/mbed-os/pull/4294/commits/b97ffe8fdce9473138b0b05984658fe5dc7c7713

But those implementations are not the same. I moved back to the previous implementation for my test target, and my OS2 basic tests start again (tested on Cortex-M4, NUCLEO_L476RG).

Also, not all targets had the same implementation in their target-specific file. Can you explain how the CMSIS update is supposed to cope with these differences between target-specific implementations and align every target to the reference CMSIS one?

The target-specific one seems to be in charge of "// Copy and switch to dynamic vectors if the first time called"; I'm not sure where this is supposed to be done now...
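For readers unfamiliar with that pattern, here is a minimal host-side sketch of a "copy and switch on first call" vector setter. It is not the code of any STM32 target: the arrays stand in for the flash table, the RAM copy and SCB->VTOR, and the names sketch_nvic_set_vector and NUM_VECTORS are hypothetical. Real CMSIS code also offsets the IRQ number by 16 to skip the system exceptions; a flat slot index is used here.

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Host-side model only: the arrays stand in for the flash vector table, a RAM
     * copy and SCB->VTOR. NUM_VECTORS is a hypothetical table size. */
    #define NUM_VECTORS 16u

    static uint32_t flash_vectors[NUM_VECTORS]; /* models the table linked into flash */
    static uint32_t ram_vectors[NUM_VECTORS];   /* models the RAM vector area */
    static uint32_t *vtor = flash_vectors;      /* models SCB->VTOR after reset */

    /* Copy-and-switch on first use: the first call copies the whole table to RAM
     * and repoints the (modelled) VTOR; every call then writes the RAM copy.
     * 'slot' is a flat table index (real CMSIS code adds 16 to the IRQ number). */
    void sketch_nvic_set_vector(unsigned slot, uint32_t handler)
    {
        assert(slot < NUM_VECTORS);
        if (vtor != ram_vectors) {
            memcpy(ram_vectors, vtor, sizeof ram_vectors);
            vtor = ram_vectors;
        }
        vtor[slot] = handler;
    }

With this pattern the relocation happens as a side effect of the first vector write, so no separate startup step is needed; that is exactly the implicit behaviour lost when the setter is replaced by a plain CMSIS one.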

... to be continued. Feedback welcome.

LMESTM commented 7 years ago

@bulislaw @0xc0170 I need your feedback/help. I found out that the copy is now supposed to be done in mbed_copy_nvic in the mbed_boot.c file. But I'm not sure how and when it will actually be called for MBED2 tests, also because it is only called conditionally, if:

    #if !(defined(FEATURE_UVISOR) && defined(TARGET_UVISOR_SUPPORTED))

So what if TARGET_UVISOR_SUPPORTED is not defined? Is it supposed to be defined for all targets?
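The guard's boolean logic can be checked directly. This small sketch relies only on standard C preprocessor semantics: the condition is false only when both macros are defined, so leaving TARGET_UVISOR_SUPPORTED undefined keeps the guarded code in the build. Whether the function containing the guard is actually reached in an mbed 2 build is a separate question that the guard does not answer. The names copy_branch_compiled and copy_branch_compiled_here are illustrative only.

    #include <assert.h>
    #include <stdbool.h>

    /* Mirrors the logic of
     *   #if !(defined(FEATURE_UVISOR) && defined(TARGET_UVISOR_SUPPORTED))
     * as a runtime function so all four combinations can be checked. */
    bool copy_branch_compiled(bool feature_uvisor, bool target_uvisor_supported)
    {
        return !(feature_uvisor && target_uvisor_supported);
    }

    /* The real guard, evaluated for whatever macros this translation unit sees. */
    bool copy_branch_compiled_here(void)
    {
    #if !(defined(FEATURE_UVISOR) && defined(TARGET_UVISOR_SUPPORTED))
        return true;
    #else
        return false;
    #endif
    }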

0xc0170 commented 7 years ago

@LMESTM I am looking at this. One related issue is: https://github.com/ARMmbed/mbed-os/issues/4486

The NVIC copy was moved, and I have to check whether we kept it in the mbed 2 code. It might not be the case; if so, I'll fix it, add it there as well, and test. I started looking at it yesterday but had tool issues setting up the test cases to debug the timeouts (I was able to reproduce them).

0xc0170 commented 7 years ago

@LMESTM Thanks for the description above. This seems to be the issue, and there might be more. I am currently reviewing all previous VTOR relocation implementations.

To align all of these, we should provide a default implementation for VTOR relocation (as in mbed 5), and targets that do not support it should provide their own implementation (do not define NVIC_RAM_VECTOR_ADDRESS in case it's non-Cortex-M0). This is up to the target; therefore the startup code should invoke this function.

Plus, if needed, override the default NVIC_SetVector/NVIC_GetVector functions using these two macros:
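A minimal sketch of what such a default could look like, assuming a target opts in by defining NVIC_RAM_VECTOR_ADDRESS and that startup code calls the copy once, before anything registers a vector. Host arrays model flash, RAM and SCB->VTOR; sketch_copy_nvic, sketch_set_vector and NUM_VECTORS are hypothetical names, not mbed APIs.

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    #define NUM_VECTORS 16u                    /* hypothetical table size */
    static uint32_t flash_table[NUM_VECTORS];  /* models the table in flash */
    static uint32_t ram_table[NUM_VECTORS];    /* models the reserved RAM area */

    /* Assumed target configuration: defining this opts in to the default copy.
     * In firmware it would be a RAM address, not an array name. */
    #define NVIC_RAM_VECTOR_ADDRESS ram_table

    static uint32_t *vtor = flash_table;       /* models SCB->VTOR after reset */

    /* Default relocation, meant to run from startup code before any vector is
     * registered. Compiles to a no-op when the target does not define
     * NVIC_RAM_VECTOR_ADDRESS and handles its vectors itself. */
    void sketch_copy_nvic(void)
    {
    #ifdef NVIC_RAM_VECTOR_ADDRESS
        uint32_t *dst = (uint32_t *)NVIC_RAM_VECTOR_ADDRESS;
        if (vtor != dst) {                     /* skip if already relocated */
            memcpy(dst, vtor, NUM_VECTORS * sizeof *dst);
            vtor = dst;
        }
    #endif
    }

    /* With the table already in RAM, setting a vector is a plain store. The
     * second assert models the rule that the flash table is never written. */
    void sketch_set_vector(unsigned slot, uint32_t handler)
    {
        assert(slot < NUM_VECTORS);
        assert(vtor != flash_table);
        vtor[slot] = handler;
    }

Compared with copying on the first vector write, an eager copy makes the ordering explicit: any vector write issued before the copy still targets the flash table.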

            "CMSIS_VECTAB_VIRTUAL",
            "CMSIS_VECTAB_VIRTUAL_HEADER_FILE=\"cmsis_nvic.h\""

Does this clear the air a bit?
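In the CMSIS 5 core headers, defining CMSIS_VECTAB_VIRTUAL makes the header include the file named by CMSIS_VECTAB_VIRTUAL_HEADER_FILE, which is then expected to provide NVIC_SetVector and NVIC_GetVector; without it, those names map to the built-in __NVIC_SetVector and __NVIC_GetVector. The two strings quoted above look like entries for a target's macro list. The sketch below only models that selection inside a single file; the tables, setter names and SKETCH_SET_VECTOR are hypothetical stand-ins.

    #include <assert.h>
    #include <stdint.h>

    /* As if the target's macro list supplied CMSIS_VECTAB_VIRTUAL. */
    #define CMSIS_VECTAB_VIRTUAL

    #define SKETCH_NUM_VECTORS 16u
    uint32_t default_table[SKETCH_NUM_VECTORS]; /* models the table CMSIS writes via VTOR */
    uint32_t target_table[SKETCH_NUM_VECTORS];  /* models a target-managed table */

    /* Stand-in for the CMSIS default setter. */
    void cmsis_default_set_vector(unsigned slot, uint32_t vector)
    {
        assert(slot < SKETCH_NUM_VECTORS);
        default_table[slot] = vector;
    }

    /* Stand-in for a target-provided setter, e.g. one declared in cmsis_nvic.h. */
    void target_set_vector(unsigned slot, uint32_t vector)
    {
        assert(slot < SKETCH_NUM_VECTORS);
        target_table[slot] = vector;
    }

    /* Selection modelled on the CMSIS 5 core headers: with CMSIS_VECTAB_VIRTUAL a
     * target-supplied header provides the setter; otherwise the default is used. */
    #ifdef CMSIS_VECTAB_VIRTUAL
    #define SKETCH_SET_VECTOR target_set_vector
    #else
    #define SKETCH_SET_VECTOR cmsis_default_set_vector
    #endif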

LMESTM commented 7 years ago

@0xc0170 OK, thanks. Waiting for the results of your review.

To align with all these, we should provide default implementation for vtor reallocation (as it is in mbed 5),

So you'll add this implementation and make it called for MBED2 as well? Even in MBED5, I think it is not called for now because of the uVisor-related compilation switches.

and targets that do not support it, should provide own implementation (do not define NVIC_RAM_VECTOR_ADDRESS in case its non cortex-m0). This is up to a target, therefore mbed sdk init (HAL) should invoke own vtor realloc function.

This part is not clear to me yet. Could you provide more details about the default implementation, the list of targets that do not support it, and where the hook will be?

Plus if needed, overwrite the default NVIC_Set/GetVectors functions use these 2 macros: "CMSIS_VECTAB_VIRTUAL", "CMSIS_VECTAB_VIRTUAL_HEADER_FILE=\"cmsis_nvic.h\"" Does this clear the air a bit?

I'd prefer to avoid this if possible.

Do you think all of the above points will be solved in the short term? Or do you plan to revert the CMSIS5 branch in the meantime?

0xc0170 commented 7 years ago

I have a default implementation that I'll send soon; it should fix a lot of targets. The rest needs to be investigated; I'll provide some details here so we can find a solution.

(I have been debugging failures since yesterday and just found another issue that I'll address separately.)

jeromecoutant commented 7 years ago

Thanks. You should also find out how such a big issue could pass CI without any failure...

LMESTM commented 7 years ago

Related PRs: #4511, #4506, #4503

LMESTM commented 7 years ago

@0xc0170 Today I tested on master after the related PRs listed above were merged. MBED2 tests boot OK on NUCLEO_L476RG, as I reported last week, but NUCLEO_F334R8 still fails to boot. If I roll back to before the CMSIS5 update, it works again...

0xc0170 commented 7 years ago

@0xc0170 -today I tested on master after the list of related PRs were merged. MBED2 tests boot ok on NUCLEO_L476RG as I reported last week, but I still fail to boot on NUCLEO_F334R8.

Is it the mbed 2 boot that fails for NUCLEO_F334R8?

LMESTM commented 7 years ago

mbed 2 boot for NUCLEO_F334R8 ?

Not only. I also started automated tests on my analogout branch yesterday. The CI shield tests run on NUCLEO_F303ZE on our test bench all timed out. I rebased mbed to before #4294 and the tests then passed. Edit: this was using the ARM toolchain.

0xc0170 commented 7 years ago

A quick debug session shows this call stack:

SystemInit -> HAL TickInit -> NVIC_SetVector -> HardFault

As far as I can see, VTOR points to the flash area (address 0x0800 0000), so writing to the table results in the hard fault. Is that correct?

We are looking at this and will provide a solution here. Ideally, no ISR would be set up before everything else is (SDK init, RTX init, heap/stack setup). As I recall, TickInit is called in SystemInit because of C++ constructors? Or is there another reason? I'll go through the git history to refresh my memory.
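A small sketch of the check implied here, assuming the usual STM32 layout with internal flash at 0x08000000 and SRAM in the Cortex-M SRAM region starting at 0x20000000. A vector table still located in flash cannot be updated with ordinary stores, which is consistent with the hard fault in the call stack above. vector_table_writable and the SKETCH_* constants are hypothetical, and the check deliberately ignores other RAM such as the CCM SRAM at 0x10000000 on some STM32 parts.

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed memory map; exact flash and SRAM sizes differ per part. */
    #define SKETCH_FLASH_BASE 0x08000000u
    #define SKETCH_SRAM_BASE  0x20000000u
    #define SKETCH_SRAM_END   0x40000000u   /* exclusive end of the SRAM region */

    /* True if a VTOR value points into the SRAM region, i.e. the active vector
     * table can be updated with plain stores. A table in flash cannot. */
    bool vector_table_writable(uint32_t vtor)
    {
        return vtor >= SKETCH_SRAM_BASE && vtor < SKETCH_SRAM_END;
    }

A vector setter could assert on this before storing, turning a silent HardFault into an obvious failure during bring-up.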

0xc0170 commented 7 years ago

cc @c1728p9

LMESTM commented 7 years ago

As I recall TickInit is called in SystemInit because of C++ ctor ? Or is there any other reason? I'll go into git history to refresh my memory.

Yes. HAL_TickInit needs to be called before the C++ constructors run.

c1728p9 commented 7 years ago

I did some digging and found that HAL_Init is called twice, both times before the C++ global constructors run. It is called in both SystemInit and sdk_init. The call to HAL_Init in SystemInit is what causes many devices to crash. Can this be removed?

c1728p9 commented 7 years ago

Looking at version 1.4.0 of the F1 Cube library (V4.1.0 in the source code), there is no call to HAL_Init in SystemInit. It looks like the one in mbed is F1 Cube V1.5.0 (V4.2.0 in the source code). Do you know where I can find this version? The latest I can download is 1.4.0. Was the addition of HAL_Init to SystemInit done in version 1.5.0, or is it an mbed-os specific change?
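The ordering problem can be modelled in a few lines. This sketch assumes, matching the call stack reported above, that the NVIC copy runs only after SystemInit returns and that HAL init installs the tick handler through a vector write. Instead of faulting, the model reports failure when that write would hit a table still in flash. All names are hypothetical.

    #include <assert.h>
    #include <stdbool.h>

    static bool table_in_ram = false;  /* models "VTOR now points at RAM" */
    static int hal_init_runs = 0;      /* successful HAL init calls */

    static void sketch_copy_nvic(void) { table_in_ram = true; }

    /* Models HAL init installing the tick handler via a vector write. Returns
     * false where the hardware would instead fault on a flash-resident table. */
    static bool sketch_hal_init(void)
    {
        if (!table_in_ram) {
            return false;
        }
        hal_init_runs++;
        return true;
    }

    /* Models one boot: optional HAL init inside SystemInit, then the NVIC copy,
     * then HAL init again from the SDK init step. */
    bool sketch_boot(bool hal_init_in_system_init)
    {
        table_in_ram = false;
        hal_init_runs = 0;
        if (hal_init_in_system_init && !sketch_hal_init()) {
            return false;               /* boot would have died in SystemInit */
        }
        sketch_copy_nvic();
        return sketch_hal_init();
    }

In this model, dropping the early call leaves exactly one HAL init, performed after relocation.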

c1728p9 commented 7 years ago

cc @LMESTM @jeromecoutant @bcostm @adustm

c1728p9 commented 7 years ago

Created PR #4543 for this. Let me know if you have any feedback on it

jeromecoutant commented 7 years ago

Hi

It looks like the one in mbed is F1 cube V1.5.0 (V4.2.0 in the souce code). Do you know where I can find this version? The latest I can download is 1.4.0.

Yes, you are right. V1.5.0 is official but not yet public... We introduced it into mbed in advance because we needed the Low Layer drivers, which this version introduces for the F1 family.

Was the addition of HAL_Init to SystemInit done in version 1.5.0, or are these mbed-os specific changes?

It is an mbed-specific change.

jeromecoutant commented 7 years ago

Hi, here is the latest status with the master branch:

0xc0170 commented 7 years ago

Thanks @jeromecoutant, I'll retest uARM. Any specific target you tested, so I can reproduce quickly?

What doesn't work? Does it not reach main, or what is the error?

0xc0170 commented 7 years ago

@jeromecoutant I can't reproduce it. uARM on NUCLEO_L476RG (MBED_A21) works for me on the latest master. Please provide more details; otherwise we can neither reproduce nor fix anything.

jeromecoutant commented 7 years ago

Example with the Nucleo board you have:

    | Result       | Target        | Toolchain | Test ID     | Test Description                      | Elapsed Time (sec) | Timeout (sec) | Loops |
    | OK           | NUCLEO_L476RG | uARM      | DTCT_1      | Simple detect test                    | 0.53               | 10            | 1/1   |
    | OK           | NUCLEO_L476RG | uARM      | EXAMPLE_1   | /dev/null                             | 3.45               | 20            | 1/1   |
    | OK           | NUCLEO_L476RG | uARM      | MBED_10     | Hello World                           | 0.39               | 5             | 1/1   |
    | TIMEOUT      | NUCLEO_L476RG | uARM      | MBED_11     | Ticker Int                            | 60.28              | 30            | 0/1   |
    | OK           | NUCLEO_L476RG | uARM      | MBED_12     | C++                                   | 1.41               | 10            | 1/1   |
    | FAIL         | NUCLEO_L476RG | uARM      | MBED_16     | RTC                                   | 10.4               | 20            | 0/1   |
    | OK           | NUCLEO_L476RG | uARM      | MBED_2      | stdio                                 | 0.8                | 20            | 1/1   |
    | TIMEOUT      | NUCLEO_L476RG | uARM      | MBED_23     | Ticker Int us                         | 60.27              | 30            | 0/1   |
    | TIMEOUT      | NUCLEO_L476RG | uARM      | MBED_24     | Timeout Int us                        | 60.28              | 30            | 0/1   |
    | TIMEOUT      | NUCLEO_L476RG | uARM      | MBED_25     | Time us                               | 30.36              | 15            | 0/1   |
    | OK           | NUCLEO_L476RG | uARM      | MBED_26     | Integer constant division             | 1.39               | 20            | 1/1   |
    | TIMEOUT      | NUCLEO_L476RG | uARM      | MBED_34     | Ticker Two callbacks                  | 60.28              | 30            | 0/1   |
    | IOERR_SERIAL | NUCLEO_L476RG | uARM      | MBED_37     | Serial NC RX                          | 6.43               | 20            | 0/1   |
    | OK           | NUCLEO_L476RG | uARM      | MBED_38     | Serial NC TX                          | 5.94               | 20            | 1/1   |
    | OK           | NUCLEO_L476RG | uARM      | MBED_A1     | Basic                                 | 1.37               | 20            | 1/1   |
    | OK           | NUCLEO_L476RG | uARM      | MBED_A21    | Call function before main (mbed_main) | 1.41               | 20            | 1/1   |
    | TIMEOUT      | NUCLEO_L476RG | uARM      | MBED_A28    | CAN loopback test                     | 40.37              | 20            | 0/1   |
    | OK           | NUCLEO_L476RG | uARM      | MBED_A30    | CAN API                               | 1.39               | 20            | 1/1   |
    | TIMEOUT      | NUCLEO_L476RG | uARM      | MBED_A9     | Serial Echo at 115200                 | 40.43              | 20            | 0/1   |
    | TIMEOUT      | NUCLEO_L476RG | uARM      | MBED_BUSOUT | BusOut                                | 60.32              | 30            | 0/1   |

jeromecoutant commented 7 years ago

Hi, any updates on the uARM issue? Thanks.

0xc0170 commented 7 years ago

I'll have a look soon to reproduce the above timeouts.

0xc0170 commented 7 years ago

I believe I found it, @jeromecoutant. This was discovered once before: https://github.com/ARMmbed/mbed-os/pull/2160/files (you can read there about microlib not having a post stack/heap hook), so the init was called in the retargeted open instead. This would explain what I am seeing: the NVIC is not copied, and mbed sdk init is not called either.
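One way to express the workaround described in #2160 is to route the same one-time init through whichever hook runs first: the post stack/heap hook when the C library provides it, or the first retargeted open() under microlib. This is a hypothetical sketch of the idea, not the code of #4671; the function names and the copy-then-init order are assumptions.

    #include <assert.h>
    #include <stdbool.h>

    static int copy_calls = 0;      /* how often the NVIC copy ran */
    static int sdk_init_calls = 0;  /* how often SDK init ran */

    static void sketch_copy_nvic(void) { copy_calls++; }     /* stands in for the NVIC copy */
    static void sketch_sdk_init(void)  { sdk_init_calls++; } /* stands in for mbed sdk init */

    /* Runs the one-time init exactly once, whichever caller gets here first. */
    void sketch_ensure_init(void)
    {
        static bool done = false;
        if (!done) {
            done = true;
            sketch_copy_nvic();
            sketch_sdk_init();
        }
    }

    /* Two candidate call sites: a post stack/heap hook (standard library) and the
     * first retargeted open() (microlib, which lacks that hook). */
    void sketch_post_stack_heap_hook(void) { sketch_ensure_init(); }
    void sketch_retarget_open(void)        { sketch_ensure_init(); }

The flag is not thread-safe, which is acceptable only because both call sites run during single-threaded boot.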

I'll send a patch shortly for review

0xc0170 commented 7 years ago

Done; please look at https://github.com/ARMmbed/mbed-os/pull/4671

jeromecoutant commented 7 years ago

Non-regression tests with the latest label are back to a good level. Thanks.