Open hugeone opened 6 years ago
Hi @hugeone,
Thanks for pointing this out. It will take us some time to review your proposals due to vacation period.
My first guess is that adding volatile
is better portable than mixing in inline assembly. The loop is only for having a certain delay hence efficient code should not be a major concern here. Any further thoughts?
Cheers, Jonatan
Hi @JonatanAntoni , volatile will add some store instructions as well so the delay will have to be recalculated as well
@hugeone, sure. The correct solution depends on the actual accuracy needed. We need to analyze that in detail.
Hi, @hugeone, Your solution solved my AC6[-O0] problem in the CMSIS-DAP_for_STLINK-V3MINI. (Tested the "volatile" only). @JonatanAntoni , hope your improvement asap. Thanks.
Hi @RadioOperator,
unfortunately we don't have time to address this on the short run. Please stick to the local fix that looks most appropriate for you for the time being.
Cheers, Jonatan
Unfortunately adding volatile
introduces overhead instructions that will effect the maximum pin toggling frequency.
Therefore the delay loop should be implemented in assembly.
The updated PIN_DELAY_MACRO should now not be optimized out anymore (tested with ARMClang and GCC).
@RobertRostohar Hi, I tested your new code under MDK+AC6 [-Ofast / -Os / -Oz]. The PIN delay timing is OK. But, the Delayms(uint32_t delay) function in DAP.c: Line-148, not stable. When I call Delayms(100), I got 66ms in [-Os, SW mode], and 33ms in all other cases. I do not know if this needs to improve, or just leave it as-is. Thanks.
Are you sure? I have tried with AC6 and I get the same code generated for the assembly loop regardless of optimization. I get something like this (CPU_CLOCK=180M):
148: void Delayms(uint32_t delay) {
0x14000410 F64E2160 MOVW r1,#0xEA60
149: delay *= ((CPU_CLOCK/1000U) + (DELAY_SLOW_CYCLES-1U)) / DELAY_SLOW_CYCLES;
150: PIN_DELAY_SLOW(delay);
0x14000414 4348 MULS r0,r1,r0
289: __ASM volatile (
0x14000416 1E40 SUBS r0,r0,#1
0x14000418 D1FD BNE 0x14000416
151: }
0x1400041A 4770 BX lr
@RobertRostohar I got almost same as yours. In the 1st line: MOVW r1,#0xEA60, 0xEA60 = 6000 = 180M/1000/3 Does this make my Delayms(100) delays 33ms, not 100ms?
------------------my CPU_CLOCK is 216M----------:
148: void Delayms(uint32_t delay) {
0x08001DAA F6411140 MOVW r1,#0x1940 //0x00011940 = 72000 = 216M/1000/3
0x08001DAE F2C00101 MOVT r1,#0x01
149: delay *= ((CPU_CLOCK/1000U) + (DELAY_SLOW_CYCLES-1U)) / DELAY_SLOW_CYCLES;
150: PIN_DELAY_SLOW(delay);
0x08001DB2 4348 MULS r0,r1,r0
289: __ASM volatile (
0x08001DB4 1E40 SUBS r0,r0,#1
0x08001DB6 D1FD BNE 0x08001DB4
151: }
0x08001DB8 4770 BX lr
Your code looks as expected. The inner loop will take 216k cycles (72000*3) for each millisecond specified with delay parameter. Delayms(100)
should take around 21.6M cycles which should be 100ms when CPU runs at 216MHz.
@RobertRostohar Hi, but my problem is : When I call Delayms(100), I get 33ms delay time actually, not 100ms. I don't know why.
Try to debug it and check also how much time is a single cycle on your hardware. You might have additional latency for opcode fetch ...
@RobertRostohar I have done many test. The Delayms(x) accuracy, related AC6 [-opt] and PIN_DELAY_SLOW base. Delayms(100), I got two data (66ms and 33ms) only.
For PIN_DELAY_SLOW, I got: When Keil set SW clock = 10MHz, PIN_DELAY_SLOW ~ 50ns, SWCLK clock ~8MHz. When Keil set SW clock = 1MHz, PIN_DELAY_SLOW ~ 200ns. SWCLK clock ~2.3MHz.
Could you tell me if this is in specification, or not.
Thanks.
I'm not sure why you are mixing the Delayms
function with pin toggling frequency of SWCLK.
Delayms
function is intended as a helper for the main application when there is no RTOS used (for example controlling the LED blink sequence at startup). It does not effect the frequency of SW/JTAG pin toggling. That is controlled with the PIN_DELAY_SLOW/FAST
macros.
The configured SW clock from the debugger is the maximum frequency allowed. The actual frequency can be lower. Typically the actual frequency is lower due to rounding to nearest clock divider and overhead of pin toggling (affected at higher frequencies).
When Keil set SW clock = 10MHz, PIN_DELAY_SLOW ~ 50ns, SWCLK clock ~8MHz.
Looks OK.
When Keil set SW clock = 1MHz, PIN_DELAY_SLOW ~ 200ns. SWCLK clock ~2.3MHz.
Doesn't look right. It should be: PIN_DELAY_SLOW >= 500ns, SWCLK <= 1MHz
.
You should debug the function DAP_SWJ_Clock
when debugger on the PC sets the SW clock to 1MHz and examine the delay calculated and stored in DAP_Data.clock_delay
. This value should be 36 when using a 216MHz CPU clock.
@RobertRostohar I'm reporting the behaviors of the current full package code, especially included your new improvement on PIN_DELAY_SLOW. My test code is using the original, no any modification on the delay timing parts. I'll list out the testing results for your reference later. Thanks.
@RobertRostohar Here it is: (your new commit for PIN_DELAY_SLOW added in the test code)
The data is measured by an oscilloscope, so just for reference.
Two problem:
I’m not sure how you measure Delayms(100) with an oscilloscope. As already mentioned the Delayms function is not used in the SW/JTAG pin toggling.
Also you should not test optimization -O0 since there will be too much overhead in the pin toggling. Normally you would use a high optimization.
Anyway it seems that you have strange SW/JTAG clocks. Actual clock should always be lower than the specified clock from the PC side.
Either you have an issue in the DAP firmware port or the PC debugger sends the wrong commands.
Are you using MDK and CMSIS-DAP Debugger on the PC side? Are you using the original DAP Firmware project from this GitHub repo? Did you port the DAP Firmware correctly? Did you update the DAP.h to use the updated macros?
@RobertRostohar Hi, My code uses the Delayms(100) to trigger nRESET pin, so I noticed it is not 100ms, anyway, ignore this for no one using it.
SW/JTAG clocks do not make me CMSIS-DAP functions failure, even it's fast more actually in lower speed cases, the DAP user can not know it. If this is a "system" problem, you decide to be improved or not.
The test environment: I'm using Win64, PC, MDK, CMSIS-DAP debugger, Target board is STM32F103 / STM32F407. CMSIS-DAP hardware + Firmware, please refer my repo: CMSIS-DAP_for_STLINK-V3MINI.
My DAP porting, using original files from MDK installation, replaced your new DAP.h file for testing. (I have not update it into my repo). The firmware structures is almost same as LPC-Link-II example.
Thanks.
I have ported your project on a slightly different board which has the same processor and measured similar clock frequencies as you have reported.
It tuns out that you are using a Cortex-M7 processor which has an in-order super-scalar pipeline that means many instructions can be dual-issued.
This effects also the PIN_DELAY_SLOW macro, especially how many cycles are required for one iteration. By default 3 cycles are expected but in case of Cortex-M7 this value should be 1.
You should define DELAY_SLOW_CYCLES as 1 in your project which should result in clocks being lower than the max specified frequency and resolve the issue that you have reported.
You should not use Arm/Keil registered USB Vendor ID (0xC251) in you project! You need to obtain your own.
@RobertRostohar Hi, thanks, I'll test your suggestion. Regarding the USB vendor ID, I do not know how to do. Should I use an auto-generated and pre-defined USB-ID from ST software evaluation kit? If I use any other ID, maybe cause a driver problem to Host program. Btw, my "project" is a non-profit home DIY software only. It's a CMSIS-DAP implement sample as LPC-Link-II.
USB Vendor IDs are unique and registered to a company. They can be obtained from USB.org: https://www.usb.org/getting-vendor-id
@RadioOperator Debuggers detect CMSIS-DAP devices by matching against the string "CMSIS-DAP" in the device's product name or interface name string, not VID/PID combinations.
Some silicon vendors have a program where customers can allocate a product ID and then use the vendor's USB vendor ID. I do not know if ST provides such a program.
@flit Thanks, I can change the VID/PID to ST software kit default value later.
@RobertRostohar Hi, I'm still testing it. Based your code:
__STATIC_FORCEINLINE void PIN_DELAY_SLOW (uint32_t delay) {
__ASM volatile (
".syntax unified\n"
"0:\n\t"
"subs %0,%0,#1\n\t"
"bne 0b\n"
: "+l" (delay) : : "cc"
);
}
After AC6[-Ofast] passed, the generated ASM instructions is unstable, like:
.Ltmp14:
subs r7, r7, #1
bne .Ltmp14
here, r7, not fixed "r7" in different "inline" parts in sw_dp/jtag_dp files, could be r0,r1,r2...... I need it using "r0" only, because R0 is more faster than R1 (a half of the mach-cycles). I'm not a good asm-lang coder, please help me or give me some tips. Thanks.
the generated ASM instructions is unstable
No. This is expected. The compiler decides which register to use when the function is inlined (based on other code and optimizations).
because R0 is more faster than R1
No. Same subs instruction which only uses a different register (R0 or R1 or R7) will take the same time.
I would not recommend going on ASM level and rather leave this to the compiler.
You can explore this on your own but this is out of scope for the CMSIS-DAP project.
@RobertRostohar Thanks and understood. I'm struggling on the maximum link-speed, still not put my 216MHz MCU into the limits.
@RobertRostohar , Hi, finally I got the max SWD clock 18MHz, using configs: DELAY_SLOW_CYCLES = 1; DAP_Data.clock_delay = 1 (minimum, SWD mode) added 2 NOP in PIN_DELAY_SLOW(...):
__ASM volatile (
".syntax unified\n"
"0:\n\t"
"subs %0,%0,#1\n\t"
"bne 0b\n\t"
"nop\n\t"
"nop\n"
: "+l" (delay) : : "cc"
);
Thanks.
In fuction PIN_DELAY_SLOW:
the loop is getting optimized out with -O3 option.
Proposed solution: Make the
count
volatile
or force the compiler to load the parameter into the register (which is my preferred option):which generates very efficient code: