Closed offirko closed 5 years ago
@jeromecoutant , @adustm : I'd appreciate your input. Thanks.
Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-990
@TuomoHautamaki - my current analysis is that after several successful program commands to the QSPI flash, a new program command fails on Write Enable. All further program/read/erase commands then fail with HAL_BUSY. We need ST and HAL support on this case.
@ARMmbed/mbed-os-maintainers - Please assign this issue to STM people
@VVESTM , @jeromecoutant , @adustm : the QSPI HAL gets stuck at a certain stage of the test at:
Eventually the 5[sec] timeout expires and the error state is set:
The QSPI HAL is then stuck, and no read/program/erase commands can be issued until the device is reset!
cc @ARMmbed/team-st-mcd
There is also something related to the toolchain. Can someone who knows IAR see what the issue might be? Could it be memory corruption? For information, the problem always occurs at the same place. If we rename a variable or change name_a to name_A, the problem moves or "disappears"... The same happens if we remove optimizations in the compiler options.
Regarding optimizations, I ran a test in the develop.json file. We do not see the problem if we remove optimizations on the C++ parts (-On option instead of -Oh):
"IAR": {
    "common": [ "--no_wrap_diagnostics", "-e", "--diag_suppress=Pa050,Pa084,Pa093,Pa082", "--enable_restrict", "-DMBED_TRAP_ERRORS_ENABLED=1"],
    "asm": [],
    "c": ["--vla", "--diag_suppress=Pe546", "-Oh"],
    "cxx": ["--guard_calls", "--no_static_destruction", "-On"],
    "ld": ["--skip_dynamic_initialization", "--threaded_lib"]
}
Does this mean that the problem could be in the C++ part?
@kjbracey-arm @pan- Could you have a look at the questions we have around C++ and IAR? Thx
One more point. On @lmestm's side, the test passes. The difference is the IAR version:
Test passed : IAR ELF Linker V8.32.2.178/W32 for ARM (EWARM-CD-8322-19423.exe)
Test failing : IAR ELF Linker V8.32.3.193/W32 for ARM (EWARM-CD-8323-20228.exe)
I've noticed there's a known issue for this device in IAR: [EWARM-5402, EW26024] Missing FIFO definition for register SPI1->CR2 in the SVD file for ST STM32F746
http://supp.iar.com/FilesPublic/UPDINFO/013240/arm/doc/infocenter/ewarm.ENU.html
@VVESTM - please note that the problem is reproduced in my environment using: IAR ELF Linker V8.32.1.169/W32 for ARM. I also used the "no optimization" cxx setup: "cxx": ["--guard_calls", "--no_static_destruction", "-On"],
With a bit of code variation, I reproduced the problem again, this time when trying to set "name_b".
CC: @screamerbg
@ARMmbed/mbed-os-test @ARMmbed/mbed-os-core @ARMmbed/mbed-os-maintainers
Fyi: https://github.com/ARMmbed/mbed-os/issues/10049#issuecomment-475669701
@VVESTM - Disabling the data cache with a call to SCB_DisableDCache() at the beginning of the test case resolves the problem. (The rest of the setup is default.)
(Could it after all be related to: https://github.com/ARMmbed/mbed-os/issues/9934#issuecomment-472454548 ?)
Although the STM32F7 is vulnerable to cache issues that other boards don't see, I don't believe there's any direct reason for this interface to be vulnerable. It's not being used as a bus-mastering interface like Ethernet, it just has a FIFO you access as programmed memory/mapped I/O, right? Should be no more problematic than the UART. (On the other hand #9934 is quite likely a cache issue).
So the optimisation and cache effects smell to me like a timing issue - maybe you're just slowing it down.
Alternatively, it could be that the cache change is a red herring, and that it's just the act of inserting the call that moves code around again. :/
It's possible there's a compiler bug, or some code triggering undefined behaviour only in this compiler, but we'd need to pin down a bit closer what's actually going wrong.
There must be one initial transfer that times out - for that transfer we'd want to see how the peripheral had been programmed. Did we program incorrect values? If so, where did those incorrect values come from? Is the hardware signalling something that we're missing? We're waiting for the TC flag - is it signalling TE?
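To make the TC/TE question concrete, here is a minimal sketch of a wait loop that watches for a transfer error while polling for transfer complete, and hands back the last status word so it can be logged for the failing transfer. It is not the HAL's code: `wait_for_tc`, `read_sr_fn` and the `fake_sr_t` helper are hypothetical names, the register is read through a callback so the sketch runs off-target, and the bit positions are QUADSPI_SR as I recall them from the STM32F7 reference manual, so please check them against the manual.

```c
#include <stdint.h>

/* QUADSPI_SR bit positions as recalled from the STM32F7 reference manual
 * (assumption: verify against the manual before relying on them). */
#define SR_TEF  (1u << 0)   /* transfer error flag    */
#define SR_TCF  (1u << 1)   /* transfer complete flag */
#define SR_BUSY (1u << 5)   /* peripheral busy        */

typedef enum { WAIT_OK, WAIT_TRANSFER_ERROR, WAIT_TIMEOUT } wait_result_t;

/* Hypothetical accessor: returns the current status register value. */
typedef uint32_t (*read_sr_fn)(void *ctx);

/* Poll for TC, but stop early if TE is raised. The last value read is
 * stored in *last_sr so the caller can log what the hardware signalled. */
wait_result_t wait_for_tc(read_sr_fn read_sr, void *ctx,
                          uint32_t max_polls, uint32_t *last_sr)
{
    uint32_t sr = 0;
    for (uint32_t i = 0; i < max_polls; i++) {
        sr = read_sr(ctx);
        if (sr & SR_TEF) {
            *last_sr = sr;
            return WAIT_TRANSFER_ERROR;
        }
        if (sr & SR_TCF) {
            *last_sr = sr;
            return WAIT_OK;
        }
    }
    *last_sr = sr;
    return WAIT_TIMEOUT;
}

/* Off-target stand-in for the register: replays a scripted sequence and
 * then keeps returning the final value. */
typedef struct {
    const uint32_t *values;
    uint32_t count;
    uint32_t index;
} fake_sr_t;

uint32_t fake_read_sr(void *ctx)
{
    fake_sr_t *f = (fake_sr_t *)ctx;
    uint32_t v = f->values[f->index];
    if (f->index + 1 < f->count) {
        f->index++;
    }
    return v;
}
```

On target, the callback would read the real status register; distinguishing WAIT_TRANSFER_ERROR from WAIT_TIMEOUT would tell us whether the hardware reported an error or simply never completed.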
If there ever is a timeout, as was pointed out above, the state gets locked into "error", so it never works again. Is that reasonable? Is this supposed to be a reliable interface?
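The lock-up described above can be illustrated with a small stand-alone model. This is only a sketch of the behaviour reported in this thread (a timeout sets an error state, and every later command is rejected as busy); all `model_*` names are hypothetical and none of this is the actual HAL implementation. `model_reinit` stands in for whatever recovery step (re-initialisation or a device reset) would be needed.

```c
#include <stdbool.h>

typedef enum { MODEL_READY, MODEL_BUSY, MODEL_ERROR } model_state_t;
typedef enum { MODEL_OK, MODEL_ERR_BUSY, MODEL_ERR_TIMEOUT } model_status_t;

typedef struct {
    model_state_t state;
} model_qspi_t;

/* Issue one command. 'transfer_completes' simulates whether the hardware
 * signals completion before the timeout expires. */
model_status_t model_command(model_qspi_t *q, bool transfer_completes)
{
    if (q->state != MODEL_READY) {
        /* Any state other than READY rejects the command as busy. */
        return MODEL_ERR_BUSY;
    }
    q->state = MODEL_BUSY;
    if (!transfer_completes) {
        /* The timeout path leaves the driver in the error state. */
        q->state = MODEL_ERROR;
        return MODEL_ERR_TIMEOUT;
    }
    q->state = MODEL_READY;
    return MODEL_OK;
}

/* Hypothetical recovery step: put the model back into READY. */
void model_reinit(model_qspi_t *q)
{
    q->state = MODEL_READY;
}
```

In this model a single timed-out command makes every following `model_command` call return MODEL_ERR_BUSY until `model_reinit` is called, which matches the symptom that nothing works again until reset.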
@VVESTM We see that this issue is reproducible but also fragile, meaning that small changes to the test, like adding prints or playing with the cache, will "fix" the problem. We need your help investigating the root cause of why the QSPI gets stuck.
@dannybenor, I am working on this issue. I will come back when I have news.
ST_INTERNAL_REF 64387
Description
Following the https://jira.arm.com/browse/IOTSTOR-798 ticket
When running storage tests on DISCO_F746NG with IAR8, the test features-storage-tests-kvstore-static_tests fails.
Same board and test pass ok on IAR7 , as well as on GCC_ARM and ARM.
The test fails in this line : https://github.com/ARMmbed/mbed-os/blob/master/features/storage/TESTS/kvstore/static_tests/main.cpp#L296
Drilling down, the failure occurs when sending write_enable to the QSPI flash, which eventually fails on timeout: https://github.com/ARMmbed/mbed-os/blob/84e4decad045397b7b28e9ba228df64ff3ffbaec/targets/TARGET_STM/qspi_api.c#L301
Data can not be written afterward to the device… until reset.
The test uses the kvstore file system to add key/value pairs which hold the values: “name_a”, “name_b”, “name_c”,…,“name_z”
For some strange reason, the combination of “name_o” followed by “name_p” causes the bug. Even if we skip all the previous entries and only set “name_o” followed by “name_p” it fails.
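For narrowing this down, the sequence can be driven from a small helper that sets a range of names and reports the first one that fails. This is only a sketch: `first_failing_name` and `set_name_fn` are hypothetical, and the store operation is passed in as a callback (on target it would wrap the real kvstore set call) so that the helper itself has no board dependency.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: stores one entry, returns 0 on success. */
typedef int (*set_name_fn)(const char *name, void *ctx);

/* Set "name_<first>" .. "name_<last>" in order. Returns the letter of the
 * first entry that fails, or '\0' if every set succeeds. */
char first_failing_name(char first, char last, set_name_fn set, void *ctx)
{
    for (char c = first; c <= last; c++) {
        char name[8];
        snprintf(name, sizeof name, "name_%c", c);
        if (set(name, ctx) != 0) {
            return c;
        }
    }
    return '\0';
}

/* Off-target stand-in: fails only for the name passed in ctx. */
int fake_set(const char *name, void *ctx)
{
    return strcmp(name, (const char *)ctx) == 0 ? -1 : 0;
}
```

Calling it with the range 'o' to 'p' corresponds to the reduced case above, and moving the start letter around shows whether the failure follows the data or the code layout.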
Issue request type