MCUdude / MicroCore

A light-weight Arduino hardware package for ATtiny13
551 stars 88 forks source link

Convert pulseIn function to (inline) assembly #30

Open MCUdude opened 7 years ago

MCUdude commented 7 years ago

So I've been working with optimizing the functions used in MicroCore. One of the goals was to redo the pulseIn function, as it relies on micros to work. I think I've created a good replacement for the pulseIn function, but LTO has to be enabled in order for the timing to be right. LTO also reduces the size of the function dramatically.

This core has the option to disable LTO (which is enabled by default). I'd like to take the output from the compiler (with LTO enabled) and make an inline assembly function out of it, so that the timing can be guarateed no matter what compiler flag is used. We're also working multiple clock frequencies, so this also have to be taken into account.

I have very little (no) experience with assembly, so any help would be appreciated. @nerdralph is it possible to use the output below to re-create the pulseIn function in assembly that isn't hard coded to a particular F_CPU?

int main(void)
{
  pulseInNew(1, HIGH, UINT32_MAX);
}

uint32_t pulseInNew(uint8_t pin, uint8_t state, uint32_t timeout) 
{  
  uint32_t width = 0; // Keep initialization out of time critical area

  // Convert the timeout from microseconds to a number of times through
  // the initial loop; it takes 16 clock cycles per iteration.
  uint32_t numloops = 0;
  uint32_t maxloops = microsecondsToClockCycles(timeout) >> 4; // Divide by 16

  // Wait for any previous pulse to end
  while (!!(PINB & _BV(pin)) == state)
  {
    if(numloops++ == maxloops) {return 0;}
    asm("nop \n");  
    asm("nop \n");
  }
  // Wait for the pulse to start
  while (!!(PINB & _BV(pin)) != state)
  {
    if(numloops++ == maxloops) {return 0;}
    asm("nop \n");  
    asm("nop \n");
    asm("nop \n");
  }
  // Wait for the pulse to stop This loop is 16 instructions long
  while (!!(PINB & _BV(pin)) == state)
  {
    if(numloops++ == maxloops) {return 0;}           
    width++;
    asm("nop \n");  
    asm("nop \n");  
    asm("nop \n"); 
  }

  // Convert the reading to microseconds. 
  return clockCyclesToMicroseconds(width << 4); // Same as multiplying by 16
}
$ avr-objdump -S pulseIn_t13_test.ino.elf

pulseIn_t13_test.ino.elf:     file format elf32-avr

Disassembly of section .text:

00000000 <__vectors>:
   0:   09 c0           rjmp    .+18        ; 0x14 <__ctors_end>
   2:   0e c0           rjmp    .+28        ; 0x20 <__bad_interrupt>
   4:   0d c0           rjmp    .+26        ; 0x20 <__bad_interrupt>
   6:   0c c0           rjmp    .+24        ; 0x20 <__bad_interrupt>
   8:   0b c0           rjmp    .+22        ; 0x20 <__bad_interrupt>
   a:   0a c0           rjmp    .+20        ; 0x20 <__bad_interrupt>
   c:   09 c0           rjmp    .+18        ; 0x20 <__bad_interrupt>
   e:   08 c0           rjmp    .+16        ; 0x20 <__bad_interrupt>
  10:   07 c0           rjmp    .+14        ; 0x20 <__bad_interrupt>
  12:   06 c0           rjmp    .+12        ; 0x20 <__bad_interrupt>

00000014 <__ctors_end>:
  14:   11 24           eor r1, r1
  16:   1f be           out 0x3f, r1    ; 63
  18:   cf e9           ldi r28, 0x9F   ; 159
  1a:   cd bf           out 0x3d, r28   ; 61
  1c:   02 d0           rcall   .+4         ; 0x22 <main>
  1e:   35 c0           rjmp    .+106       ; 0x8a <_exit>

00000020 <__bad_interrupt>:
  20:   ef cf           rjmp    .-34        ; 0x0 <__vectors>

00000022 <main>:
  22:   80 e0           ldi r24, 0x00   ; 0
  24:   90 e0           ldi r25, 0x00   ; 0
  26:   dc 01           movw    r26, r24
  28:   b1 9b           sbis    0x16, 1 ; 22
  2a:   1a c0           rjmp    .+52        ; 0x60 <__SREG__+0x21>
  2c:   01 96           adiw    r24, 0x01   ; 1
  2e:   a1 1d           adc r26, r1
  30:   b1 1d           adc r27, r1
  32:   83 39           cpi r24, 0x93   ; 147
  34:   28 e1           ldi r18, 0x18   ; 24
  36:   92 07           cpc r25, r18
  38:   24 e0           ldi r18, 0x04   ; 4
  3a:   a2 07           cpc r26, r18
  3c:   b1 05           cpc r27, r1
  3e:   11 f1           breq    .+68        ; 0x84 <__SREG__+0x45>
  40:   00 00           nop
  42:   00 00           nop
  44:   f1 cf           rjmp    .-30        ; 0x28 <main+0x6>
  46:   01 96           adiw    r24, 0x01   ; 1
  48:   a1 1d           adc r26, r1
  4a:   b1 1d           adc r27, r1
  4c:   83 39           cpi r24, 0x93   ; 147
  4e:   28 e1           ldi r18, 0x18   ; 24
  50:   92 07           cpc r25, r18
  52:   24 e0           ldi r18, 0x04   ; 4
  54:   a2 07           cpc r26, r18
  56:   b1 05           cpc r27, r1
  58:   a9 f0           breq    .+42        ; 0x84 <__SREG__+0x45>
  5a:   00 00           nop
  5c:   00 00           nop
  5e:   00 00           nop
  60:   b1 9b           sbis    0x16, 1 ; 22
  62:   f1 cf           rjmp    .-30        ; 0x46 <__SREG__+0x7>
  64:   0d c0           rjmp    .+26        ; 0x80 <__SREG__+0x41>
  66:   01 96           adiw    r24, 0x01   ; 1
  68:   a1 1d           adc r26, r1
  6a:   b1 1d           adc r27, r1
  6c:   83 39           cpi r24, 0x93   ; 147
  6e:   28 e1           ldi r18, 0x18   ; 24
  70:   92 07           cpc r25, r18
  72:   24 e0           ldi r18, 0x04   ; 4
  74:   a2 07           cpc r26, r18
  76:   b1 05           cpc r27, r1
  78:   29 f0           breq    .+10        ; 0x84 <__SREG__+0x45>
  7a:   00 00           nop
  7c:   00 00           nop
  7e:   00 00           nop
  80:   b1 99           sbic    0x16, 1 ; 22
  82:   f1 cf           rjmp    .-30        ; 0x66 <__SREG__+0x27>
  84:   80 e0           ldi r24, 0x00   ; 0
  86:   90 e0           ldi r25, 0x00   ; 0
  88:   08 95           ret

0000008a <_exit>:
  8a:   f8 94           cli

0000008c <__stop_program>:
  8c:   ff cf           rjmp    .-2         ; 0x8c <__stop_program>
nerdralph commented 6 years ago

I never used pulseIn() before, so when you first posted this and tagged me I didn't pay any attention to it. Now that I've taken a closer look, I see it is simply a software pin-change timer. Here's a back-of-the-napkin implementation in 7 asm instructions:

  clr count_lo
  clr count_hi 
wait:
  sbis PORTB, pin
  brne wait
pulseIn: ; 5 cycles/loop
  adiw count_lo, 1
  sbic PORTB, pin
  brne pulseIn

It counts multiples of 5 cycles, and waits for up to 65536 * 5 = 327680 cycles.

MCUdude commented 6 years ago

I'd be interested in a function that acts just like the original one. It should start by waiting for the previous event to stop, then wait for the next even to start, and then count the number of loops/cycles. The timeout should also be "long enough", at least a couple of minutes. Is this doable you think?

nerdralph commented 6 years ago

I can see the benefit of having the function return uS to maintain API compatibility with the Arduino core. However what is the necessity of supporting minute-long pulses? From a quick search I did, it seems the function is commonly used to measure echo times from ultrasonic distance sensors, which have pulse times of ms, not minutes.

MCUdude commented 6 years ago

However what is the necessity of supporting minute-long pulses?

I want to keep MicroCore as close as possible to the original Arduino functions as possible. I've seen usecases where an Arduino has to wait minutes for a short pulse to arrive. I'd prefer a pulseIn function that was able to measure minute long pulses and timeouts.

MCUdude commented 4 years ago

Bringing up a "dead" thread...

Issue #78 is caused by the pulseIn() function being very inaccurate. I tried to tune it, but I can't get something that's fairly accurate for all clocks. I used a signal generator connected to PB3 to "simulate" the internal oscillator. You can easily "trick" the IDE to use a 9.6 MHz external clock by burning bootloader for 16 MHz external clock, but compiling and uploading for the 9.6 MHz internal.

As for @nerdralph's comment, a new implementation doesn't have to be 100% compatible with the "official" one. What's important is that the maximum timeout is fairly long (a few seconds or more). It's also important that it returns microseconds (if possible).

It's basically this code that should be rewritten:

  // Wait for the pulse to stop
  while (!!(PINB & _BV(pin)) == state)
  {
    if(numloops++ == maxloops) {return 0;}           
    width++;
    asm("nop \n");  
    asm("nop \n");  
    asm("nop \n"); 
  }

  // Convert the reading to microseconds. 
  return clockCyclesToMicroseconds(width << 4); // Same as multiplying by 16