dirtyjtag / DirtyJTAG

JTAG probe firmware
MIT License

move away from interrupts to speed up the code #76

Closed phdussud closed 3 years ago

phdussud commented 3 years ago

Hello! I believe that moving away from interrupts and instead polling for the timer overflow allows for faster clock rates. I measured the actual clock rate with the current firmware: it does not go above 250 kHz. The reason is that we need two interrupts per clock cycle, and the overhead of taking an interrupt (pushing registers on the stack, restoring them on exit) takes time away from the useful code. Since the code polls a variable anyway, polling the counter status register instead is not an architectural change. The transfer loop also becomes simpler because it is no longer split across two calls, and less state needs to be maintained. The resulting code can go up to around 1300 kHz. I measured an actual FPGA programming run with openFPGALoader to be close to 4x faster.
I tried to minimize the code changes so the structure stays the same. Please let me know what you think. Happy new year, Patrick
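
For readers following along, here is a rough host-side sketch of the polling idea described above. It is not the PR's code: `timer_overflowed`, `set_tck`, `set_tdi`, `get_tdo` and `xfer_polled` are hypothetical names, the "timer" is a simulated counter, TDO is looped back to TDI, and the MSB-first bit order is an assumption made only for the illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the hardware so the sketch runs on a host. */
static uint32_t sim_polls;   /* counts polls of the simulated status flag */
static bool tck_level;       /* simulated TCK pin */
static bool tdi_level;       /* simulated TDI pin */
static unsigned tck_edges;   /* number of TCK transitions seen so far */

/* On real hardware this would test (and clear) the timer's overflow bit in
 * its status register. Here the flag simply reads true on every 4th poll. */
static bool timer_overflowed(void) { return (++sim_polls & 3u) == 0; }

/* Busy-wait for the next overflow: no interrupt entry/exit cost. */
static void wait_half_period(void) { while (!timer_overflowed()) { } }

static void set_tck(bool level)
{
    if (level != tck_level) tck_edges++;
    tck_level = level;
}
static void set_tdi(bool level) { tdi_level = level; }
static bool get_tdo(void) { return tdi_level; }  /* loopback: TDO tied to TDI */

/* One loop iteration per bit: both half-periods are handled inline instead
 * of being split across two interrupt handler invocations. */
void xfer_polled(const uint8_t *out, uint8_t *in, unsigned nbits)
{
    for (unsigned i = 0; i < nbits; i++) {
        unsigned byte = i / 8;
        uint8_t mask = (uint8_t)(0x80u >> (i % 8));  /* assumed MSB-first */

        set_tdi((out[byte] & mask) != 0);
        wait_half_period();
        set_tck(true);                               /* rising edge */

        if (get_tdo()) in[byte] |= mask;
        else           in[byte] &= (uint8_t)~mask;

        wait_half_period();
        set_tck(false);                              /* falling edge */
    }
}
```

Calling `xfer_polled(out, in, 16)` on two bytes toggles the simulated TCK 32 times and, because of the loopback, copies `out` into `in`. The only state carried between bits is the loop index.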

jeanthom commented 3 years ago

Hello Patrick, thank you very much for your contribution!

That's a stunning PR you've got here! For a while I thought I wouldn't be able to pass the 250 kHz mark without using assembly, and this PR proves I was wrong.

Overall the code looks good to me. I'd like to do a little bit of testing to ensure that there is no regression with UrJTAG before merging.

Thanks again!

phdussud commented 3 years ago

I forgot to mention that I moved the TDO sensing and collection to after the positive edge of TCK. This is for two reasons. First, the spec says TDO needs to be sampled right before the falling edge of TCK to get the largest timing window. Second, it helps even out the duty cycle of TCK. The other non-regular part of the code is setting the period of the counter to 1 for maximum speed. This ensures that neither of the two polling loops will ever wait. The downside is that the duty cycle of TCK could be worse, but I figured that if there is a problem at that speed (1500 kHz), people can dial back a little and return to a normal value of the counter period. One last thing: I believe the biggest improvement that can be made after this is increasing the packet length from 32 to 64. This requires changing the cmd structure (2 bytes for the length), so clients would have to change as well. This is the reason I didn't do it. Thanks, Patrick
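
To illustrate the sampling point discussed above, here is a small simulation, offered as a sketch rather than as the firmware's actual logic. It assumes a hypothetical target modelled as an 8-bit shift register that captures TDI on the rising edge of TCK and updates TDO on the falling edge; `target_t`, `shift_byte` and the `sample_point_t` values are names invented for this example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical target: an 8-bit data register being shifted out. */
typedef struct {
    uint8_t sr;   /* shift register contents */
    bool tdo;     /* level currently driven on TDO */
} target_t;

static void target_init(target_t *t, uint8_t captured)
{
    t->sr = captured;
    t->tdo = (captured & 1u) != 0;   /* first bit is already on TDO */
}

/* Rising edge of TCK: the target shifts and captures TDI. */
static void target_rising(target_t *t, bool tdi)
{
    t->sr = (uint8_t)((t->sr >> 1) | (tdi ? 0x80u : 0u));
}

/* Falling edge of TCK: the target updates TDO with the next bit. */
static void target_falling(target_t *t)
{
    t->tdo = (t->sr & 1u) != 0;
}

typedef enum {
    SAMPLE_BEFORE_RISING,
    SAMPLE_AFTER_RISING,    /* between the rising and the falling edge */
    SAMPLE_AFTER_FALLING    /* too late: TDO has already moved on */
} sample_point_t;

/* Clock 8 bits through the target and return what the probe read on TDO. */
uint8_t shift_byte(target_t *t, uint8_t tdi_bits, sample_point_t when)
{
    uint8_t result = 0;
    for (unsigned i = 0; i < 8; i++) {
        bool tdi = ((tdi_bits >> i) & 1u) != 0;
        bool bit = false;

        if (when == SAMPLE_BEFORE_RISING) bit = t->tdo;
        target_rising(t, tdi);
        if (when == SAMPLE_AFTER_RISING) bit = t->tdo;
        target_falling(t);
        if (when == SAMPLE_AFTER_FALLING) bit = t->tdo;

        if (bit) result |= (uint8_t)(1u << i);
    }
    return result;
}
```

In this model, sampling just before or just after the rising edge returns the same byte, because TDO only changes on the falling edge. Sampling after the falling edge returns a value shifted by one bit. Moving the read to after the rising edge therefore keeps the data correct while giving TDO the longest time to settle.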


phdussud commented 3 years ago

If you want to take a look at this: https://github.com/phdussud/DirtyJTAG/tree/spi It uses the SPI1 unit to manage TCK, TDI and TDO during transfers at the highest speed. The clock rate is 4.5 MHz and there are no wasted clock cycles. It is naturally compatible with the stlink-v2-white and would require a change in the pinout of the bluepill. The other boards are not compatible because the SPI cannot use the pins defined for TCK, TDI and TDO. I added a fast loop for these boards, omitting the timer synchronization entirely. The clock rate is around 2 MHz...

phdussud commented 3 years ago

Another simple but substantial improvement: we can introduce a transfer that does not send back a USB packet. Most of the programming involves only write operations to the device. I implemented it and my timing improves quite a bit. With the best clock rate (SPI) it takes 253 µs for each 30-byte transfer. Only 53 µs is actually spent on the JTAG bus; the rest is USB overhead. If I implement the transfer with no read, the time goes down to 172 µs. In my example, the fastest wall-clock time for end-to-end programming is 2.9 s with normal transfers and 2 s with no-read transfers. This is with openFPGALoader, which already knows when it does not need to read back. Unfortunately, we would have to rev the interface to implement this, and clients would need to add support for the new interface to take advantage of it. Thoughts? Thanks, Patrick
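
As a quick sanity check on the figures quoted above, the arithmetic can be written out as follows. This is only a sketch built from the numbers in the comment (30 bytes per transfer, 253 us with read-back, 53 us on the bus, 172 us without read-back); the function names are made up for the example.

```c
/* Figures quoted in the comment above (times in microseconds). */
#define PAYLOAD_BYTES   30.0
#define T_WITH_READ_US 253.0
#define T_ON_BUS_US     53.0
#define T_NO_READ_US   172.0

/* Time per transfer that is not spent clocking bits on the JTAG bus. */
double usb_overhead_us(void)
{
    return T_WITH_READ_US - T_ON_BUS_US;          /* 200 us */
}

/* Effective payload throughput for a given per-transfer time, in kbit/s. */
double throughput_kbit_s(double t_us)
{
    return PAYLOAD_BYTES * 8.0 / t_us * 1000.0;
}

/* Per-transfer speedup from dropping the read-back packet. */
double no_read_speedup(void)
{
    return T_WITH_READ_US / T_NO_READ_US;         /* about 1.47 */
}

/* TCK rate implied by shifting 240 bits in 53 us, in MHz. */
double implied_tck_mhz(void)
{
    return PAYLOAD_BYTES * 8.0 / T_ON_BUS_US;     /* about 4.5 */
}
```

The implied TCK rate of about 4.5 MHz matches the SPI clock mentioned earlier in the thread, and the per-transfer speedup of about 1.47x is close to the reported wall-clock improvement from 2.9 s to 2 s (1.45x). Roughly 200 of the 253 us per transfer are overhead, which is why removing the return packet helps more than raising the clock further.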

jeanthom commented 3 years ago

Tested with an xc9536xl on UrJTAG with a white ST-Link programmer. No issue detected. Thanks again for this contribution!