andrewpmiller / u8glib

Automatically exported from code.google.com/p/u8glib
Other
0 stars 0 forks source link

Software SPI bug with high PORTs (where SBI/CBI is not usable) #175

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Your current implementation of software SPI (e.g. in ST7920) is having a 
problem:

In case the port used is not in SBI/CBI range (<0x5C, e.g. Port H, L, K on 
mega2560) the compiler can not generate sbi/cbi single instructions for 1 bit 
manipulations
instead it will create LD(port), set/mask with variable, ST(port)
in case there is an interrupt during this instructions which manipulates 
(other) pins of same port the final ST will "undo" this manipulations and 
output the old value ==> corrupt the port output

==> a cli(); is needed to disable interrupts during this operation

Here an example of a "safe" bit write pulled from SDFAT library (DigitalPin.h)

static inline __attribute__((always_inline))
void fastBitWriteSafe(volatile uint8_t* address, uint8_t bit, bool level) {
  uint8_t oldSREG;
  if (address > (uint8_t*)0X5F) {
    oldSREG = SREG;
    cli();
  }
  if (level) {
    *address |= 1 << bit;
  } else {
    *address &= ~(1 << bit);
  }
  if (address > (uint8_t*)0X5F) {
    SREG = oldSREG;
  }
}

==> In order not to block interrupts for big amount of times you should use the 
cli(); for one bit or max 8 bits only.

Original issue reported on code.google.com by m...@stohn.de on 30 May 2013 at 9:37

GoogleCodeExporter commented 9 years ago
After looking into it a bit more it looks like you always have to use the 
cli(); since you use a variable to set/clear bits. The compiler can not 
optimize this to single sbi/cbi instructions even for lower registers in 
sbi/cbi range. 

Original comment by m...@stohn.de on 30 May 2013 at 10:36

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Thanks for reporting this.
I assume, that i need to update u8g_com_io.c:

Current implementation:
void u8g_SetPinLevel(uint8_t internal_pin_number, uint8_t level)
{
  volatile uint8_t * tmp = u8g_get_avr_io_ptr(u8g_avr_port_P, internal_pin_number>>3);

  if ( level == 0 )
    *tmp &= ~_BV(internal_pin_number&7);
  else
    *tmp |= _BV(internal_pin_number&7);
}

This should be:
void u8g_SetPinLevel(uint8_t internal_pin_number, uint8_t level)
{
  volatile uint8_t * tmp = u8g_get_avr_io_ptr(u8g_avr_port_P, internal_pin_number>>3);
  uint8_t oldSREG;
  cli();  
  if ( level == 0 )
    *tmp &= ~_BV(internal_pin_number&7);
  else
    *tmp |= _BV(internal_pin_number&7);
  SREG = oldSREG;
}

correct?

Original comment by olikr...@gmail.com on 31 May 2013 at 5:19

GoogleCodeExporter commented 9 years ago
Oh oh...

this seems to get a bigger problem than I thought.

I was getting the issues inside of "u8g_com_arduino_st7920_spi.c" :
...
static void u8g_com_arduino_do_shift_out_msb_first(uint8_t val)
{
...
  do
  {
    if ( val & 128 )
      *outData |= bitData;
    else
      *outData &= bitNotData;
...

But the location you mention is also affected (e.g. when setting the chip 
select line).

The problem itself is not only related to AVR. 
Whenever you set/reset a pin with an operation like:
PORTx |= b; or PORTx &= m;
the compile might expand this to several instructions like:
reg = PORTx;
reg |= b;
PORTx = reg;

If this is the case then an interrupt after the reg= operation might change 
port bits and the write back later with PORTx = reg; corrupts the output.
On most platforms a good compiler like gcc will optimize this to a single bit 
set / bit clear operation. Unfortunately AVR megas are having more IO than the 
instruction set was designed for so it can not use the bit set/clear 
instructions on several (high) PORTs. Also the instruction set does not support 
setting/clearing multiple bits at the same time or variable bit location. So 
the compiler must know the bit position at compile time (which is not possible 
with your variable implementation).

==> your proposed SetPinLevel for AVR is a good.
 (small mistake in your code): uint8_t oldSREG;   should read: uint8_t oldSREG = SREG;

Unfortunately this will add a lot of latency / interrupt blocking to the 
output. My first idea would be to add a #define to support "interrupt save" 
operation. All the people using your great library would suffer slower I/O by 
the changes regardless if they use interrupts or not.

Also a good hardware design can prevent the need for using the cli(); stuff. 
One could use a dedicated port for all display lines and rest of port for some 
inputs only.

---
My main goal was to speed up the software SPI in order to use a 7920 display 
(HW-SPI is not available on the hardware I want to use it).
But now I see a slow down is required (cli() stuff) before a speedup can be 
thought of.

I will experiment a bit with the SW-SPI implementation to find a good trade-off 
for speedup / interrupt blocking.
Just a question at this time: Why do you set the shift out function 
"U8G_NOINLINE". In my eyes this would be a perfect candidate for inlining.
Speed is much more important than saving code size for this function.

Original comment by m...@stohn.de on 31 May 2013 at 7:49

GoogleCodeExporter commented 9 years ago
ah, correct:
void u8g_SetPinLevel(uint8_t internal_pin_number, uint8_t level)
{
  volatile uint8_t * tmp = u8g_get_avr_io_ptr(u8g_avr_port_P, internal_pin_number>>3);
  uint8_t oldSREG = SREG;
  cli();  
  if ( level == 0 )
    *tmp &= ~_BV(internal_pin_number&7);
  else
    *tmp |= _BV(internal_pin_number&7);
  SREG = oldSREG;
}

regarding this:
=== Quote Start ===
I was getting the issues inside of "u8g_com_arduino_st7920_spi.c" :
...
static void u8g_com_arduino_do_shift_out_msb_first(uint8_t val)
{
...
  do
  {
    if ( val & 128 )
      *outData |= bitData;
    else
      *outData &= bitNotData;
...
=== Quote End ===

I agree that there are more locations to prevent interrupts for some time.
My suggestion would be to introduce two macros
U8G_SAVE_IRQ_STATE_AND_DISABLE_IRQ()
U8G_RESTORE_IRQ_STATE()

These macros can be empty if interrupt safe code is not required.
Otherwise, these macros can be used whereever code requires such protection.
I agree that this is a larger issue. Funny, that i never realized this problem.

Original comment by olikr...@gmail.com on 31 May 2013 at 8:43

GoogleCodeExporter commented 9 years ago
Note: ATOMIC BLOCKS are probably not suitable, mainly because of not beeing 
portable enough (AVR specific):
http://www.nongnu.org/avr-libc/user-manual/group__util__atomic.html

Original comment by olikr...@gmail.com on 31 May 2013 at 8:49

GoogleCodeExporter commented 9 years ago
Note: stdatomic.h is not yet available on Arduino.
_ATOMIC_MODIFY_(__a, __o, __m, __x) can not be used.
From that point of view the following macros might be more portable for the 
future:
U8G_ATOMIC_OR(ptr,val) 
and
U8G_ATOMIC_AND(ptr,val) 

U8G_ATOMIC_OR(ptr,val) 
could be as simple as
U8G_ATOMIC_OR(ptr,val)  (*(ptr) |= (val))
but should be
U8G_ATOMIC_OR(ptr,val)  (global_SREG = SREG, cli(), (*(ptr) |= (val)), SREG = 
global_SREG)

Drawback: access time to global_SREG, not sure if cli() can be called within 
the comma expression

U8G_ATOMIC_OR(ptr,val)  do { uint8_t tmpSREG = SREG; cli(); (*(ptr) |= (val)); 
SREG = tmpSREG } while(0)

this need to be checked for optimization...

Original comment by olikr...@gmail.com on 31 May 2013 at 9:04

GoogleCodeExporter commented 9 years ago
current assembler sequence:
     94a:   90 81           ld  r25, Z

  if ( level == 0 )
    *tmp &= ~_BV(internal_pin_number&7);
     94c:   21 e0           ldi r18, 0x01   ; 1
     94e:   30 e0           ldi r19, 0x00   ; 0
     950:   0c 2e           mov r0, r28
     952:   01 c0           rjmp    .+2         ; 0x956 <u8g_SetPinLevel+0x26>
     954:   22 0f           add r18, r18
     956:   0a 94           dec r0
     958:   ea f7           brpl    .-6         ; 0x954 <u8g_SetPinLevel+0x24>
     95a:   d1 11           cpse    r29, r1

void u8g_SetPinLevel(uint8_t internal_pin_number, uint8_t level)
{
  volatile uint8_t * tmp = u8g_get_avr_io_ptr(u8g_avr_port_P, internal_pin_number>>3);

  if ( level == 0 )
     95c:   03 c0           rjmp    .+6         ; 0x964 <u8g_SetPinLevel+0x34>
     95e:   20 95           com r18
    *tmp &= ~_BV(internal_pin_number&7);
     960:   92 23           and r25, r18
     962:   01 c0           rjmp    .+2         ; 0x966 <u8g_SetPinLevel+0x36>
     964:   92 2b           or  r25, r18
  else
    *tmp |= _BV(internal_pin_number&7);
     966:   90 83           st  Z, r25

Original comment by olikr...@gmail.com on 31 May 2013 at 9:34

GoogleCodeExporter commented 9 years ago
sequence with
#define U8G_ATOMIC_OR(ptr, val) do { uint8_t tmpSREG = SREG; cli(); (*(ptr) |= 
(val)); SREG = tmpSREG; } while(0)
#define U8G_ATOMIC_AND(ptr, val) do { uint8_t tmpSREG = SREG; cli(); (*(ptr) &= 
(val)); SREG = tmpSREG; } while(0)

only minimal overhead is added and a register is used or tmpSREG:

     94a:   8f b7           in  r24, 0x3f   ; 63

  if ( level == 0 )
  {
    U8G_ATOMIC_AND(tmp, ~_BV(internal_pin_number&7));
     94c:   f8 94           cli
     94e:   90 81           ld  r25, Z
     950:   21 e0           ldi r18, 0x01   ; 1
     952:   30 e0           ldi r19, 0x00   ; 0
     954:   0c 2e           mov r0, r28
     956:   01 c0           rjmp    .+2         ; 0x95a <u8g_SetPinLevel+0x2a>
     958:   22 0f           add r18, r18
     95a:   0a 94           dec r0
     95c:   ea f7           brpl    .-6         ; 0x958 <u8g_SetPinLevel+0x28>
     95e:   d1 11           cpse    r29, r1

void u8g_SetPinLevel(uint8_t internal_pin_number, uint8_t level)
{
  volatile uint8_t * tmp = u8g_get_avr_io_ptr(u8g_avr_port_P, internal_pin_number>>3);

  if ( level == 0 )
     960:   03 c0           rjmp    .+6         ; 0x968 <u8g_SetPinLevel+0x38>
     962:   20 95           com r18
  {
    U8G_ATOMIC_AND(tmp, ~_BV(internal_pin_number&7));
     964:   92 23           and r25, r18
     966:   01 c0           rjmp    .+2         ; 0x96a <u8g_SetPinLevel+0x3a>
     968:   92 2b           or  r25, r18
   // *tmp &= ~_BV(internal_pin_number&7);
  }
  else
  {
    U8G_ATOMIC_OR(tmp, _BV(internal_pin_number&7));
     96a:   90 83           st  Z, r25
     96c:   8f bf           out 0x3f, r24   ; 63

Original comment by olikr...@gmail.com on 31 May 2013 at 9:39

GoogleCodeExporter commented 9 years ago
added the following macros to u8g.h
#    define U8G_ATOMIC_START()      do { global_SREG_backup = SREG; cli(); } 
while(0)
#    define U8G_ATOMIC_END()            SREG = global_SREG_backup
#    define U8G_ATOMIC_OR(ptr, val)     do { uint8_t tmpSREG = SREG; cli(); 
(*(ptr) |= (val)); SREG = tmpSREG; } while(0)
#    define U8G_ATOMIC_AND(ptr, val)    do { uint8_t tmpSREG = SREG; cli(); 
(*(ptr) &= (val)); SREG = tmpSREG; } while(0)

U8G_ATOMIC_START / END --> global memory access, bigger code
U8G_ATOMIC_OR / END --> register variable, smaller code

u8g_com_io.c has been updated
other locations need to be identified and updated

Original comment by olikr...@gmail.com on 31 May 2013 at 9:56

GoogleCodeExporter commented 9 years ago
In order to come up with a good optimization for SW-SPI it would be nice to 
have something like the proposed

U8G_SAVE_IRQ_STATE_AND_DISABLE_IRQ()
U8G_RESTORE_IRQ_STATE()

since reading/writing the SREG for every single bit looks like overkill. 

I think operation like: block IRQ, do a burst of 8 bit, re-enable IRQ could be 
the way to go.

Here is my first idea which should optimize to very good on AVR (reading of 
PORT is minimized, output values will be inside of registers, loop can be 
unrolled, 

IRQ_OFF();
if( PORTdat == PORTclk )
{
  tmp = PORTdat; //same as PORTclk;
  tmp = tmp & CLK_MASK & DAT_MASK; //clear CLK and DAT in tmp

  for( i=0; i<7; i++ )
  {
    tmpout = tmp | (databyte & 0x80) ? DAT_BIT : 0;
    PORTdat = tmpout; //set dat and clear clk at same time (this optimization works only in case the value is taken on rising edge like 7920)
    databyte <<= 1;
    PORTclk = tmpout | CLK_BIT; //set clk
  }
}
else
{
  tmpdaton = PORTdat | DAT_BIT ;
  tmpdatoff = PORTdat & DAT_MASK ;
  tmpclkon = PORTclk | CLK_BIT;
  tmpclkoff = PORTclk & CLK_MASK;

  for( i=0; i<7; i++ )
  {
    PORTdat = (databyte & 0x80) ? tmpdaton : tmpdatoff; 
    PORTclk = tmpclkoff; //clear clk
    databyte <<= 1;
    PORTclk = tmpclkon; //set clk
  }
}
IRQ_ON();

What do you think about this ?

Original comment by m...@stohn.de on 31 May 2013 at 9:57

GoogleCodeExporter commented 9 years ago
> What do you think about this ?
I guess, my update on this issue was done in parallel
Yes, indeed, this is why i think u8glib requires two pairs of macros.
U8G_ATOMIC_START / END --> for output of a complete byte, etc..
U8G_ATOMIC_OR / AND --> single, isolated bit change

Original comment by olikr...@gmail.com on 31 May 2013 at 10:01

GoogleCodeExporter commented 9 years ago
>I guess, my update on this issue was done in parallel

Yes... you are just to fast ;-)

Original comment by m...@stohn.de on 31 May 2013 at 10:10

GoogleCodeExporter commented 9 years ago
all files have been updated so far...
    U8G_ATOMIC_OR(ptr, val)
    U8G_ATOMIC_AND(ptr, val)
    U8G_ATOMIC_START();
    U8G_ATOMIC_END();

Original comment by olikr...@gmail.com on 1 Jun 2013 at 1:16

GoogleCodeExporter commented 9 years ago
Thanks for this fast update. I think issue #19 was solved by this update and 
can be closed now.

Also this issue looks solved now.

Thanks.

Original comment by m...@stohn.de on 2 Jun 2013 at 11:28

GoogleCodeExporter commented 9 years ago
tests so far are ok

Original comment by olikr...@gmail.com on 4 Jun 2013 at 7:58

GoogleCodeExporter commented 9 years ago

Original comment by olikr...@gmail.com on 4 Jun 2013 at 7:59