ataradov / free-dap

Free and open implementation of the CMSIS-DAP debugger firmware
BSD 3-Clause "New" or "Revised" License

Expected flashing speed #7

Closed ooxi closed 6 years ago

ooxi commented 6 years ago

I'm using free-dap on a SAMD11 in order to flash microcontrollers and it works remarkably well except for the speed. The process of programming and verifying a 128K ROM takes about 20 seconds.

I'm wondering what's the limit here:

Before investing in any solution I wanted to make sure I'm not missing something obvious here. Is 20s a reasonable speed for 128K program + verify or should it be much faster anyhow?

ataradov commented 6 years ago

A 128K device contains 512 rows and 2048 pages. Programming time for device variant A is 512 * 6 ms + 2048 * 2.5 ms = 8192 ms = 8.2 seconds. Variants B and C are a bit faster at erase, and their programming time is 512 * 1.2 ms + 2048 * 2.5 ms = 5.7 seconds.
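The arithmetic above can be reproduced with a short sketch. It only restates the figures quoted in this comment (512 rows, 2048 pages, per-row erase and per-page write times); the function and variable names are made up for illustration.

```python
# Rough flash programming time: one erase per row plus one write per page.
# All timing figures are the ones quoted in the comment above.

def nvm_program_time_ms(rows, pages, row_erase_ms, page_write_ms):
    """Return the total erase + write time in milliseconds."""
    return rows * row_erase_ms + pages * page_write_ms

ROWS = 512    # rows in a 128K device
PAGES = 2048  # pages in a 128K device

variant_a_ms = nvm_program_time_ms(ROWS, PAGES, row_erase_ms=6.0, page_write_ms=2.5)
variant_bc_ms = nvm_program_time_ms(ROWS, PAGES, row_erase_ms=1.2, page_write_ms=2.5)

print(f"variant A:   {variant_a_ms / 1000:.1f} s")
print(f"variant B/C: {variant_bc_ms / 1000:.1f} s")
```

This covers only the time the target spends erasing and writing flash, not the time needed to move data over USB.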

The rest is mostly determined by USB HID. On USB FS, HID can only do 64 KBytes per second of sustained bandwidth, and in this case we send a lot of small frames (polling for flags). HID can only send one 64-byte frame per 1 ms interval, which is a significant source of the delay.
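As a rough sanity check on the bandwidth figure, the ceiling follows directly from one 64-byte frame per 1 ms interval. The sketch below assumes every frame carries a full 64 bytes of useful payload with no protocol header, which is optimistic; the names are hypothetical.

```python
# Best-case USB FS HID throughput: one 64-byte frame per 1 ms interval.
# Assumption: each frame is all payload (no command header, no polling).

FRAME_BYTES = 64
FRAMES_PER_SECOND = 1000  # one frame per 1 ms interval

peak_bytes_per_second = FRAME_BYTES * FRAMES_PER_SECOND

IMAGE_BYTES = 128 * 1024  # 128K image
min_transfer_s = IMAGE_BYTES / peak_bytes_per_second

print(f"peak throughput: {peak_bytes_per_second} bytes/s")
print(f"best-case time for 128K: {min_transfer_s:.3f} s")
```

Real transfers will be slower than this bound, since small polling frames occupy a full 1 ms slot each.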

Interface bit-banging is pretty fast. D11 already runs at 48 MHz and bit-banging happens as fast as possible without going into asm and rewriting the whole routines. But that won't make things much faster anyway. You may shave 1-2 seconds, no more.

ataradov commented 6 years ago

Estimated number of USB HID frames exchanged (command + response) per 256-byte sector:
Write address = 2
Write unlock command = 2
Read INTFLAG = 2
Write erase row command = 2
Read INTFLAG = 2
Write block = 10

Total = 20

Each frame takes 1 ms, so this comes to 20 ms per row, or 20 * 512 = 10240 ms = 10.2 seconds. This is just to transfer the data and commands.
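The frame budget above can be tallied in a few lines. This is only a sketch of the estimate in this comment, using the per-step frame counts and the 1 ms per frame figure as given; the dictionary keys and variable names are made up for illustration.

```python
# USB HID frame budget per 256-byte row, as estimated above.
# Each step counts command + response frames.
FRAMES_PER_ROW = {
    "write address": 2,
    "write unlock command": 2,
    "read INTFLAG (first poll)": 2,
    "write erase row command": 2,
    "read INTFLAG (second poll)": 2,
    "write block": 10,
}

FRAME_TIME_MS = 1  # one frame per 1 ms interval
ROWS = 512         # rows in a 128K device

frames_per_row = sum(FRAMES_PER_ROW.values())
row_time_ms = frames_per_row * FRAME_TIME_MS
usb_time_ms = row_time_ms * ROWS

print(f"{frames_per_row} frames per row, {row_time_ms} ms per row")
print(f"USB transfer time: {usb_time_ms} ms = {usb_time_ms / 1000:.1f} s")
```

Half of the frames in each row are spent on setup and flag polling rather than on the data block itself, which is why the per-frame latency dominates.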

ataradov commented 6 years ago

USB HS can be much faster, and that's why the recent $15 Microchip programmer is based on the SAME70, even though there is absolutely no need for such a powerful device in a programmer. USB is also available only in the 100-pin variant, so you end up paying quite a bit.

ooxi commented 6 years ago

Thanks for the extensive feedback! Could you elaborate on the speed differences between device variant A and B/C? Are you talking about chip revisions of the target to be programmed?

If transferring the data to be programmed via USB HID takes about 10 s, and it has to be transferred twice (once for programming, once for verification), this more or less explains the time I'm seeing. USB mass storage looks like a promising alternative; however, it needs enough ROM/RAM on the chip to be useful.

Could you point me to the $15 Microchip programmer you are talking about? Such a low-cost device would be quite handy, since the alternatives are multi-hundred-dollar Segger devices.

ataradov commented 6 years ago

Revisions of the target chip. The numbers are taken from the SAM D21 datasheet. See tables 37-43 and 37-44.

MSD is a horrible alternative. It is easier to get a $20 LPC-Link 2 and set it to CMSIS-DAP mode. This will maintain full compatibility with your tools, and you will have the best CMSIS-DAP programmer on the market.

The tool I'm talking about is MPLAB Snap - http://www.microchip.com/developmenttools/ProductDetails/PartNO/PG164100

But it uses a proprietary protocol on the USB side, so it won't work with anything but MPLAB.

ooxi commented 6 years ago

Thank you for your feedback!

ooxi commented 6 years ago

I have ordered an LPC-Link 2 for evaluation, thanks for the tip! One more question: since ⅔ of the time for programming and verification is spent on USB HID overhead and not on actual programming, would using SPI instead of USB HID result in a significant performance improvement?

I know that this would mean I can only use edbg and free-dap and would need a host like a Raspberry Pi, but that's my setup anyhow. However I'm not familiar with the SPI latency in comparison to the USB HID latency, so maybe you could give me a clue :-)

ataradov commented 6 years ago

If you have a master CPU capable of running any code and a bit of bit-banging, then this is the better solution - https://github.com/ataradov/embedded-swd . No need to have the intermediate layers at all.