ARM-software / CMSIS_5

CMSIS Version 5 Development Repository
http://arm-software.github.io/CMSIS_5/index.html
Apache License 2.0
1.34k stars 1.08k forks source link

Endless Loop in SCB_DisableDCache () with iMX-RT #620

Closed Masmiseim36 closed 5 years ago

Masmiseim36 commented 5 years ago

Hello,

with commit f46ec37262bc4fd82b6c91d24198cc3b4a499fe3 some ‘register’- qualifier are removed from the code.

With this changes on NXP iMX-RT Controllers the Code execution enters an endless-loop when trying to disable the Data-Cache (--> calling the SCB_DisableDCache () function from the CMSIS Library) This problem is only valid when using memory, which is cached. When using the TCM only everything runs fine. Same with enabling the optimization – with optimization everything is fine. I crosschecked with STM32F7 and Kinetis KV58 to check if they have the same behavior. They work as intended. It looks like it is an iMX-RT exclusive problem.

The problem seems also to be valid only with GCC based Development Environment like McuXpresso, Rowley Crossworks and Segger EmbeddedStudio.

NXP has avoided the problem by using a quite old version of CMSIS in their SDK (commit 3960f196f93e695d6b8144328783e2db29d288fd).

It is curious that this problem seems to occur on iMXRT controller only, but not on others with the identical core. Could this be related to higher clock-speed – the iMXRT family has the highest M7-Clock I´m aware of – of cache-size – it is bigger than on the controllers I´ve compared with. Putting back the ‘register’- qualifier solves the problem for iMX-RT

Best Regards

Masmiseim36 commented 5 years ago

Little update: The cache size shouldn´t be the reason as the iMXRT-1015 is also ending in an endless loop. This 1015 has also only 16 Kbyte Data and Instruction Cache like the STM32F7.

Sample-Code for Rowley Crossworks: CacheTest.zip

ghost commented 5 years ago

Hello Masmiseim36, can you please provide the Linker Description file or the the memory layout for your Sample-Code (CacheTest.zip). I am trying to reproduce the problem using GCC.

Masmiseim36 commented 5 years ago

Hello GuentherMartin,

thanks for checking this issue. Crossworks has an own XML-based Linker description which they translate automatically to a GCC-Linker File during building. I used the default one, which comes with the Development environment. You can download an evaluation Version on their homepage: https://rowley.co.uk/ However, I added the generated Linker-Script to my post, which should work with the 1052-EVK-Board

In addition, I´ve added a sample project with MCUXpresso. Just Single-Step through the project to face the issue.

Thanks and best regards

evkbimxrt1050_lpuart_edma_transfer.7z.zip iMXRT1052 CacheTest.zip

ghost commented 5 years ago

Hello,

I ported example .\SDK_2.6.1_EVKB-IMXRT1050\boards\evkbimxrt1050\driver_examples\cache\armgcc to uVision and run the example without problems. It worked for GNU Tools ARM Embedded "8 2019-q3-update" and "7 2018-q2-update" Optimization Level 0 and Optimization Level 3. I used MIMXRT1052xxxxx_sdram as configuration.

Currently I do not have an explanation why SCB_DisableDCache() does not work for you. The only explanation I have is that GCC with optimization level 0 keeps the local variables in memory and not in registers and loads/stores them after every modification. This would explain why 'register' keyword solved the problem.

Did you try a higher optimization level?

Masmiseim36 commented 5 years ago

Hello Martin,

I haven’t used uVision for a long time and wasn’t aware that it also supports gcc. I failed to port my demonstration-project to uVision with gcc. With ARMCC I can get it running (without facing my initial problem). The gcc-Version seems to be a bit more difficult. Could you share your test-project, so I can take a look? Yes, the problem is only valid without any optimization level, like I wrote in my first post. Any optimization solves the Problem, but complicates debuggability. Without optimization is the default on may gcc-based-Embedded-Development-Environments. Buy the way, CLANG seems also to be OK. It looks like a pure gcc problem.

Best regards

Markus

ghost commented 5 years ago

Hello Markus,

Please find attached two uVision projects , one using AC5 and the other using GCC. The example is based on the SDK_2.6.1_EVKB-IMXRT1050 cache example from NXP. I used:

Please be aware that the cache disable functions are currently under discussion. Maybe they need to be rewritten in assembler.

Test_iMXRT1052.AC5.zip Test_iMXRT1052.GCC.zip

Masmiseim36 commented 5 years ago

Hello GuentherMartin,

after a quick look. You are running the code out of TCM, which is always fine. The Problem occurs only on memory-areas, which are cached, like the OCRAM, SDRAM or flexSPI. I try to get a closer look later.

Best Regards

Markus

Masmiseim36 commented 5 years ago

Hello GuentherMartin,

sorry for the much to long delay. I forgot it after a long stay abroad I altered your example to run from OCRAM and updated to a newer Version of gcc. I created a repository for the example: https://github.com/Masmiseim36/CMSIStest You should face the problem immediately after letting the code freee run.

Best regards

Markus

ghost commented 5 years ago

Hello Markus, I was able to reproduce the problem with your example. What I found out is that GCC stores the local variables from the Cache functions on the stack if optimization level 0 is used. This causes the problem/endless loop you see. If a higher optimization level is used than the local variables are hold in registers and the cache functions work fine. I am sorry to say but currently I do not have a solution for this with using pure 'C'.

Martin

Masmiseim36 commented 5 years ago

Hello Martin,

a solution could be to add register-type-qualifier to the definition of the variable.

This solved the problem for me.

Best Regards

Markus

ghost commented 5 years ago

Hello Markus,

I will discuss this with Jonatan.

Martin

Masmiseim36 commented 5 years ago

Hello Martin,

Just wanted to add, that the initial suggestion of issue #345 is quite good

#if (defined __cplusplus) && ((__cplusplus - 0) >= 201703L)
  #define __REGISTER
#else
  #define __REGISTER register
#endif

This would make the register-keyword usable for C and older C++ Code only.

Best regards

Markus

zejiang0jason commented 5 years ago

Hi @GuentherMartin ,

The register attribute of SCB_DisableDCache's local variable is removed previously https://github.com/ARM-software/CMSIS_5/commit/f46ec37262bc4fd82b6c91d24198cc3b4a499fe3#diff-32944228ef7ad5e349029b63f1e147c4.

The register attribute is necessary, I suggest adding __REGISTER to this local variables.

JonatanAntoni commented 5 years ago

Hi all,

thanks for putting all this different views together. I have discuss this issue with our compiler experts as well. We came to the conclusion that adding register keyword doesn't actually solve the issue but only hides it.

Nowadays compilers don't give us any guarantee on fulfilling the hard requirements of such a function. Even the register keyword doesn't force the compiler not to put these variables to the stack. Hence the only solution that would really work safely is implementing this function in assembly language.

In context of portable code we see many drawbacks of assembly language. The strategic goal in CMSIS is to move entirely to C, or even C++ one day if possible. We are convinced that this kind of issue should be addressed by modern compilers rather than maintaining clumsy workarounds in libraries.

Ultimately, we decided not to move back (to either register keyword or even assembly language) and rather consider GCC at O0 as broken for the time being.

Markus, please consider using O1 instead. I would be more than happy to receive your support in convincing compiler devs to provide us necessary features to antiquate assembly language in bare metal software.

Cheers, Jonatan

Masmiseim36 commented 5 years ago

Hello Jonathan,

I understand your argumentation. However, using O1 is not a real alternative as it reduces debugability. Og is a bit better here but still not as good as O0. My current solution is to add the register keyword locally. Nevertheless, I agree it is not a perfect solution, if I can provide anything to help arguing with the compiler devs, please don´t hesitate to ask.

Best regards

Markus

JonatanAntoni commented 5 years ago

Markus,

the thing with the compiler devs is that the more people requesting proper support for bare metal specifics the higher the chance that these get addressed. So if you are really dependent on O0 (on whatever compiler) and face issues that might force you down to assembly please raise this with the according compiler teams.

Thanks, Jonatan