JChristensen / tinySPI

Arduino hardware SPI library for ATtiny44/84, 45/85, 461/861, 2313/4313.
GNU General Public License v3.0
154 stars 25 forks source link

performance of USI not as fast as it could be. #9

Closed jamesdanielv closed 5 years ago

jamesdanielv commented 5 years ago

hello, first of all thank you for all your hard work. i have done a separate branch with optimized transfers of the write/receive of the data. https://github.com/jamesdanielv/tinySPI/blob/master/src/tinySPI.cpp

i can only test sending of data currently so did not want to create a pull request on this branch.

the while loop does have performance penalty because of compare, and the jmp command has a cycle of 3 or 4 cycles for while loops. it just seemed more efficient to make the data transfer happen in an unrolled loop, since it does not increase size that much but reduces the clock edge transfers to 1 clock cycle, except for loading and unloading of byte data to send and receive.

there are some other areas that can improve performance still such as a buffer if more than 1 byte to send (just saves 1-2 cycles per byte transfer) or if data does not change, just resending same byte instead of loading it again, and the possibility of using a timer for sck. (and freeing up time between to do other things), although i have not heard that it works.

it is possibly i might not understand something, and performed a magic trick unexpectedly. feel free to verify the changes to my branch.

data sheet for the avr atTiny processors are here: http://ww1.microchip.com/downloads/en/devicedoc/atmel-2586-avr-8-bit-microcontroller-attiny25-attiny45-attiny85_datasheet.pdf

commands and cycle times for them are here: http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-0856-AVR-Instruction-Set-Manual.pdf USICR |= _BV(USITC); breaks down to a 'exclusive OR' 1 cycle operation. while is similar to BREQ 1-2cycles, and the bitwise instruction is not referenced dire the Arduino code does not simplify small loops to inline like some other IDE's.

JChristensen commented 5 years ago

Hello James,

Thanks for the input. IIRC there is an unrolled loop example in the datasheet. Bottom line, it's a design decision, whether to optimize the code for speed or size. I chose the latter figuring that it might be preferable given the modest resources of the ATtiny MCUs, and since even this achieves a significant speed advantage over software approaches, e.g. shiftOut().

I'm a bit surprised at this library's popularity; it has garnered a fair amount of stars. I'm not sure what people use it for. tinySPI is not high on my priority list right now and also so far I'm not aware of any performance issues that could benefit from an increased SPI clock rate. But I'll keep this suggestion in mind if any crop up.

Thanks again ... Jack

jamesdanielv commented 5 years ago

as for uses, SPI displays, and sensors, some of which need to send and receive a lot of data can benefit from the speedup, or some purposes on battery could run at a lot lower clock speed because of efficient transfers.

i would imagine that the portability and low power of atTiny are the reasons for its popularity.

as for the 24 bytes more of flash this change uses: you could also provide the user a choice of speed or size such as #define speedOverSize

if speedOverSize == true

USICR |= _BV(USITC);USICR |= _BV(USITC);USICR |= _BV(USITC); ;USICR |= _BV(USITC);
USICR |= _BV(USITC);USICR |= _BV(USITC);USICR |= _BV(USITC); ;USICR |= _BV(USITC);
USICR |= _BV(USITC);USICR |= _BV(USITC);USICR |= _BV(USITC); ;USICR |= _BV(USITC);
USICR |= _BV(USITC);USICR |= _BV(USITC);USICR |= _BV(USITC); ;USICR |= _BV(USITC);

else

while ( !(USISR & _BV(USIOIF)) ) USICR |= _BV(USITC);

endif

thanks again for creating this code more easily interpretive than the code included in the data sheets from atmel.

i've forked a version off of yours that will give the users a choice from your feedback of smaller size, and even save 10 more bytes if interrupts are not used that effect the USI registers.

JChristensen commented 5 years ago

I like the idea of using a #define to provide a choice like that. Just out of curiosity, did a need for additional speed prompt you to make these modifications?

jamesdanielv commented 5 years ago

yes, i'm testing out code one part of which is to run on atTiny for running an SPI st7735R graphics display with 128x128 resolution. I have already done the changes. the atomic code is not really needed as the clock and data are sync to the USI state. since i'm not 100%, my fork has 2 additional defines:

define speedOverSize true //if true faster performance however it uses 24 more bytes

define IdoNotUseInterrupts true //if you do no use interrupts specifically USI, set this to true to save 10 bytes!

fork is here: https://github.com/jamesdanielv/tinySPI/blob/master/src/tinySPI.cpp