dbuezas / lgt8fx

Board Package for Logic Green LGT8F328P, LGT8F328D and LGT8F88D

delayMicroseconds() #18

Closed: LaZsolt closed this issue 3 years ago

LaZsolt commented 4 years ago

In wiring.c, the delayMicroseconds() function uses the SBIW and BRNE instructions for timing. On AVR MCUs one SBIW/BRNE iteration usually takes 4 clock cycles, while on LGT MCUs it usually takes 3 clock cycles. As a result, this function delays for less time than expected. I found a better version on a Chinese site, but its author corrected only the 16 and 32 MHz cases. https://www.geek-workshop.com/thread-38486-1-1.html

LaZsolt commented 4 years ago

I think the best solution for LGT MCUs is to put a NOP between SBIW and BRNE, so that each iteration takes 4 cycles again, as on AVR.
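Counting cycles per iteration shows both the problem and the NOP fix. The host-side C sketch below is not AVR code. It uses the costs described above: SBIW takes 2 cycles on AVR and 1 on LGT, BRNE takes 2 cycles when taken and 1 on the final fall-through, and NOP takes 1 cycle. It ignores the call overhead that the real wiring.c code compensates for. The function names are hypothetical.

```c
#include <stdint.h>

/* Host-side cycle model (not AVR code) of the countdown loop
 *   1: sbiw r,1 ; [padding] ; brne 1b
 * Assumed costs: SBIW 2 cycles on classic AVR, 1 on LGT8Fx;
 * BRNE 2 cycles when taken, 1 when it falls through; NOP 1 cycle.
 * n is the iteration count and must be >= 1. */
static uint32_t loop_cycles(uint32_t n, uint32_t sbiw_cycles, uint32_t pad_cycles) {
    uint32_t per_iteration = sbiw_cycles + pad_cycles + 2u; /* BRNE taken */
    return n * per_iteration - 1u;                          /* last BRNE falls through */
}

/* Convert a cycle count to microseconds at clock f_cpu (Hz). */
static double cycles_to_us(uint32_t cycles, uint32_t f_cpu) {
    return cycles / (f_cpu / 1e6);
}
```

For example, at 16 MHz a 100 µs request corresponds to n = 400 iterations of a 4-cycle loop. That gives 1599 cycles on AVR, 1199 cycles (about 75 µs) on the LGT, and 1599 cycles again once the NOP is added.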

LaZsolt commented 4 years ago

mybetter_delayMicroseconds.c.txt

LaZsolt commented 4 years ago

Edit: I found this code imprecise because the compiler uses only the integer part of F_CPU/3000000L. The corrected code is in a later comment.

If you never need a zero delay and all delay values are constants (not variables), this is the most compact and precise LGT8Fx-specific code for you. (It must be in Arduino.h.) Link-time optimization (LTO) does not matter.

#define delayMicroseconds(us)                                         \
(__extension__({                                                      \
  uint16_t _dly_cnt = (uint16_t) ((F_CPU/3000000L) * (uint16_t)(us)); \
  __asm__ __volatile__ (                                              \
    "usL_%=:" "sbiw %0,1" "\n\t"                                      \
              "brne usL_%="                                           \
              : "+w" (_dly_cnt)  /* the loop modifies the counter */  \
  );                                                                  \
}))
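The "no zero delay" condition comes from the loop structure. SBIW decrements before BRNE tests, so a count of 0 wraps to 0xFFFF and the loop runs 65536 times. At 16 MHz with 3-cycle iterations, that is about 12.3 ms. Below is a host-side C simulation of the 16-bit counter, as a sketch; the function name is hypothetical.

```c
#include <stdint.h>

/* Simulate "1: sbiw r,1 / brne 1b" on a 16-bit register pair and
 * return how many times the loop body executes. */
static uint32_t countdown_iterations(uint16_t start) {
    uint16_t reg = start;
    uint32_t iterations = 0;
    do {
        reg = (uint16_t)(reg - 1);   /* SBIW: 0 wraps to 0xFFFF */
        iterations++;
    } while (reg != 0);              /* BRNE: repeat while the result is non-zero */
    return iterations;
}
```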
seisfeld commented 4 years ago

Would this work for any selected speed? I'm asking because of the F_CPU/3000000L. And what do you mean by the timings being constants (not variables)? Sorry if these are stupid questions.

LaZsolt commented 4 years ago

Yes, I think so. I compiled it with several selected speeds and disassembled the code. The timing values seem correct, but I can't measure them. Tested at 16 MHz with a DS18B20 temperature sensor.

LaZsolt commented 4 years ago

By "constant timings" I mean that the compiler calculates (F_CPU/3000000L * microseconds) at compile time, as in this example at 16 MHz:

    delayMicroseconds(3);

Compiled to:
 2b8:   0f e0           ldi     r24, 0x0F
 2ba:   10 e0           ldi     r25, 0x00
000002bc <usL_217>:
 2bc:   01 97           sbiw    r24, 0x01
 2be:   f1 f7           brne    .-4        ; 0x2bc <usL_217>
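This listing can be timed by hand. Using the LGT8Fx costs discussed above (LDI 1 cycle, SBIW 1, BRNE 2 when taken and 1 when it falls through), 0x0F = 15 iterations cost 2 + 15 × 3 − 1 = 46 cycles. That is 2.875 µs at 16 MHz, instead of the 48 cycles a full 3 µs needs. A minimal host-side check, with a hypothetical helper name:

```c
#include <stdint.h>

/* Cycles for: ldi r24,lo / ldi r25,hi / 1: sbiw r24,1 / brne 1b
 * with a loop count n >= 1, using assumed LGT8Fx timings:
 * 2 x LDI (1 cycle each) + n x (SBIW 1 + BRNE 2) - 1 (final BRNE not taken). */
static uint32_t listing_cycles(uint32_t n) {
    return 2u + 3u * n - 1u;
}

/* The same result in microseconds at f_cpu (Hz). */
static double listing_us(uint32_t n, uint32_t f_cpu) {
    return listing_cycles(n) / (f_cpu / 1e6);
}
```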

If you call the delayMicroseconds() macro with variables, the MCU has to do the calculations at run time. That takes longer than loading two registers, so the delay will not be precise. Don't use my macro like this:

  uint16_t nnn=12;
  for( int i = 0; i < 118; i++) {
    delayMicroseconds(nnn);        // nnn is a variable
    nnn = 3.14*nnn+1;
  }  
seisfeld commented 4 years ago

Alright, thank you for the explanation! Really appreciate it. :)

LaZsolt commented 4 years ago

Now this code is really precise. If you never need a zero delay and all delay values are constants (not variables), this is the most compact and precise LGT8Fx-specific code for you. (It must be in Arduino.h.) Link-time optimization (LTO) does not matter. (3000000L is the LGT8Fx-specific value; the Atmel-specific value is 4000000L.)

#define delayMicroseconds(us)                                            \
(__extension__({                                                         \
  uint16_t _dly_cnt = (uint16_t) ((F_CPU * (uint32_t)(us)) / 3000000L);  \
  __asm__ __volatile__ (                                                 \
    "usL_%=:" "sbiw %0,1" "\n\t"                                         \
              "brne usL_%="                                              \
              : "+w" (_dly_cnt)  /* the loop modifies the counter */     \
  );                                                                     \
}))
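The difference between the two macros is only the order of operations in integer arithmetic. The first one truncates F_CPU/3000000L before multiplying, so 16 MHz gives 5 instead of 5.33. The second one multiplies first. Below is a host-side comparison as a sketch. The helper names are hypothetical, and uint32_t mirrors the 32-bit long used by avr-gcc.

```c
#include <stdint.h>

/* First macro: divide first, so the fraction of F_CPU/3000000 is lost. */
static uint32_t count_divide_first(uint32_t f_cpu, uint32_t us) {
    return (f_cpu / 3000000u) * us;
}

/* Corrected macro: multiply first, divide last.
 * Note that f_cpu * us can exceed 32 bits for large us. */
static uint32_t count_multiply_first(uint32_t f_cpu, uint32_t us) {
    return (f_cpu * us) / 3000000u;
}
```

At 32 MHz and 3 µs, the first form yields 30 iterations and the second yields 32.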
dbuezas commented 4 years ago

I'll add the new delayMicroseconds version you posted first in the next version. Thanks for that!

dbuezas commented 4 years ago

I forgot about this one in the last release. I'll put it in the next one.

jayzakk commented 4 years ago

(If anyone read the comment I just deleted: it's just too hot and I made a mistake ^^)

LaZsolt commented 4 years ago

If you want to try it at 32 MHz, you must use my delayMicroseconds code. I wrote two different versions of delayMicroseconds. Which one did you choose?

jayzakk commented 4 years ago

@LaZsolt, I used the macro version, but accidentally put it into wiring.c instead of Arduino.h.

LaZsolt commented 4 years ago

When the macro version of delayMicroseconds() is used with larger values, the compile-time multiplication F_CPU * us overflows 32 bits, so the timing is wrong. I have an idea for solving this problem, but I need to sleep now.

LaZsolt commented 4 years ago

I compiled the new code several times. There is no 32-bit overflow within the parameter limits.

#define delayMicroseconds(us)                                                  \
(__extension__({                                                               \
  uint16_t _dly_cnt = (uint16_t) (( (F_CPU/1000) * (uint32_t)(us) ) / 3000L); \
  __asm__ __volatile__ (                                                       \
    "usL_%=:" "sbiw %0,1" "\n\t"                                               \
              "brne usL_%="                                                    \
              : "+w" (_dly_cnt)  /* the loop modifies the counter */           \
  );                                                                           \
}))
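Here is where the overflow starts. At 32 MHz, F_CPU * us exceeds 2^32 once us reaches 135, and the wrapped product gives a tiny count. Dividing F_CPU by 1000 first, and 3000000 by 1000 to match, keeps the product within 32 bits for every parameter within the limits. Below is a host-side sketch comparing the two expressions. The helper names are hypothetical, and it assumes F_CPU is a multiple of 1000, as all the supported clocks are.

```c
#include <stdint.h>

/* Previous macro: F_CPU * us is computed in 32 bits and can wrap. */
static uint16_t count_full_product(uint32_t f_cpu, uint32_t us) {
    return (uint16_t)((f_cpu * us) / 3000000u);
}

/* New macro: scale both sides down by 1000 before multiplying. */
static uint16_t count_scaled(uint32_t f_cpu, uint32_t us) {
    return (uint16_t)(((f_cpu / 1000u) * us) / 3000u);
}
```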

Macro parameter limitations:

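| Freq | Min. delay (µs) | Max. delay (µs) | Delay becomes max.+1 µs when the macro parameter is |
|---|---|---|---|
| 32 MHz | 1 | 6143 | 0 |
| 16 MHz | 1 | 12287 | 0 |
| 8 MHz | 1 | 24575 | 0 |
| 4 MHz | 1 | 49151 | 0 |
| 2 MHz | 2 | 98303 | 0 or 1 |
| 1 MHz | 3 | 196607 | 0, 1 or 2 |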

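Delay in µs at 1, 2 and 4 MHz for each parameter value:

| Freq / parameter | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 4 MHz | 1 | 1.75 | 3.25 | 4 | 4.75 | 6.25 | 7 | 7.75 | 9.25 |
| 2 MHz | 98304 | 2 | 3.5 | 3.5 | 5 | 6.5 | 6.5 | 8 | 9.5 |
| 1 MHz | 196608 | 196608 | 4 | 4 | 4 | 7 | 7 | 7 | 10 |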
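The limits in the first table follow from the 16-bit counter: the count must be at least 1 and at most 65535. The host-side sketch below derives the minimum and maximum parameter from the scaled formula used in the macro, and its results match the table. The helper names are hypothetical.

```c
#include <stdint.h>

/* Smallest parameter whose count ((F_CPU/1000)*us)/3000 is at least 1. */
static uint32_t min_param(uint32_t f_cpu) {
    uint32_t khz = f_cpu / 1000u;
    return (3000u + khz - 1u) / khz;                    /* ceil(3000 / khz) */
}

/* Largest parameter whose count is still below 65536 (no wrap to 0). */
static uint32_t max_param(uint32_t f_cpu) {
    uint32_t khz = f_cpu / 1000u;
    uint64_t limit = 65536ull * 3000u;                  /* khz * us reaching this gives count 65536 */
    return (uint32_t)((limit + khz - 1u) / khz) - 1u;   /* ceil(limit / khz) - 1 */
}
```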
LaZsolt commented 4 years ago

@dbuezas, I found a library (Adafruit DHT temperature sensors) that uses delayMicroseconds() with a variable parameter. This means the macro version of delayMicroseconds() is not fully compatible, so don't put the macro version in the next release. But I have a new idea using the old-style function: void __attribute__((noinline)) delayMicroseconds() { ... } I am still testing it.

dbuezas commented 4 years ago

with #51 merged, this can be closed, right?

LaZsolt commented 4 years ago

Not yet. I'd like to keep posting my ideas here for a while.

XGIACOMO commented 4 years ago

Ciao, well done on your work!! I'm using the LGT8F with your libraries, but I've run into problems with the lack of microsecond precision when bit-banging (this is the bus I'd like to use with the LGT8F: https://www.pjon.org/ ). Are you planning to fix the microsecond issues, or are you no longer pursuing it? Thank you very much, Giacomo

LaZsolt commented 4 years ago

@XGIACOMO What type of precision problems have you faced?

Anyway, delayMicroseconds() is not modified in release v1.0.6. The current version of delayMicroseconds() from this branch will be in the next release.

If you want to use my better delayMicroseconds() now, you need to copy these two files

https://github.com/dbuezas/lgt8fx/blob/master/lgt8f/cores/lgt8f/Arduino.h
https://github.com/dbuezas/lgt8fx/blob/master/lgt8f/cores/lgt8f/wiring.c

into this directory on your hard drive:

C:\Users\__yourusername__\AppData\Local\Arduino15\packages\LGT8fx Boards\hardware\avr\1.0.5\cores\lgt8f\

XGIACOMO commented 4 years ago

Hi LaZsolt, thank you for your answer! This is what I got with a simple delayMicroseconds(50) sketch: the values are far from what they should be! I'll try your new libraries. Thank you, I'll keep you posted!

[Scope captures: 50 µs @ 16 MHz, 50 µs @ 32 MHz]

LaZsolt commented 4 years ago

Be aware that digitalWrite() takes 2 to 5 microseconds. Setting or clearing a port bit directly with bitSet() or bitClear() is much faster than digitalWrite(), but you still need to account for its execution time when calculating the pulse width.

dbuezas commented 4 years ago

If you want to be as efficient, precise, and fast as possible at making something happen at regular intervals, there is a trick you can do with the hardware counters. It is the same as using interrupts, but you just busy-wait for the counter to reach its target instead of spending the 50+ cycles of the whole interrupt prologue, return & stuff.

So let's count clock cycles at run time:

// setting up timer 1
// secPerSample means "seconds per sample"
// with prescaler 1 the 16-bit timer covers at most 65536 / F_CPU seconds,
// i.e. a max wait of about 2ms at 32 MHz (secPerSample=0.002)
void startCPUCounter(float secPerSample) {
  TCCR1A = 0;
  TCCR1B = (1 << WGM13) | (1 << WGM12)   // CTC mode, counts to ICR1
        | (1 << CS10); // prescaler set to 1
  ICR1 = secPerSample * F_CPU - 1;

  TCNT1 = 0;
  bitSet(TIFR1, OCF1A);  // clear the compare-match flag (timer flags are cleared by writing 1)
}

__attribute__((always_inline)) inline void myDelay(){
  loop_until_bit_is_set(TIFR1, OCF1A); // this can be off by at most 3 clock cycles, but error won't accumulate because the timer keeps counting
  TIFR1 = 255;  // bitSet(TIFR1, OCF1A); is actually enough, but clearing all flags at once is quicker and I'm not using the other timer flags anyway.
}

And then you busy-wait for a very precise timing that doesn't accumulate error:

void setup(){
    startCPUCounter(1.0/1000000); // 1us cycle

}
void loop(){
  noInterrupts(); // if you don't do this, it will be a bit off some times, but error won't accumulate.
  for (int i = 0; i< 10000;i++){
    doTheThingThatNeedsToHappenAtVeryPreciseAndShortIntervals();
    myDelay();
  }
  interrupts();
}

It is the trick I ended up using in the oscilloscope project to make the oscilloscope here run very fast, even while handling multiple channels and checking trigger conditions. There, though, I use Timer3 and adjust the prescaler to reach longer wait intervals when necessary. It's obvious in hindsight, but I did feel really clever about it.

The good thing about all this nonsense is that the counter keeps track of time while you are doing something else, so error never accumulates.
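The reload value and the 2 ms limit can be checked with a little arithmetic. With prescaler 1, the timer period is (ICR1 + 1) / F_CPU, so the longest period of the 16-bit timer is 65536 / F_CPU: 2.048 ms at 32 MHz and 4.096 ms at 16 MHz. The host-side C sketch below computes the value that startCPUCounter() writes to ICR1. The helper names are hypothetical. Unlike the original line, it rounds to the nearest tick instead of truncating, so a product such as 1e-6 * 16e6 cannot lose a tick if floating-point error lands just below the integer.

```c
#include <stdint.h>

/* ICR1 value for a CTC period of sec_per_sample at f_cpu (Hz), prescaler 1.
 * The timer counts 0..ICR1, i.e. ICR1 + 1 ticks per period.
 * Rounded to the nearest tick; the period must be at least one tick. */
static uint32_t icr1_for(double sec_per_sample, double f_cpu) {
    return (uint32_t)(sec_per_sample * f_cpu + 0.5) - 1u;
}

/* Longest period the 16-bit timer can produce with prescaler 1. */
static double max_period_s(double f_cpu) {
    return 65536.0 / f_cpu;
}
```

At 16 MHz, a 1 µs period leaves only 16 cycles per iteration for the work and the busy-wait together.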

XGIACOMO commented 4 years ago

Hi dbuezas, thank you for your answer!! In my caravan I'm building a battery charger with a 15 V power supply, and I'm working on the feedback regulation to create the target charge stages. I find the LGT8F very powerful thanks to its integrated DAC and its differential amplifier with 32x gain: with this chip I don't need any other external components, and it works very well!! Unfortunately, I need to communicate with the central unit using a bit-bang protocol, https://www.pjon.org/, the same one I use to connect all the devices in my caravan. But I can't make the LGT8F compatible with the protocol because of the different bit lengths. In the coming days I'll try LaZsolt's new libraries. I hope to succeed! Thank you again

XGIACOMO commented 4 years ago

great job @LaZsolt!!!!!! much better than before ;-)

[Scope capture: 50 µs on LGT8F @ 32 MHz]

dbuezas commented 4 years ago

It's about time we make a new release including all these improvements from @Lazsolt et al.

LaZsolt commented 4 years ago

@XGIACOMO I am also happy with the better result.

LaZsolt commented 4 years ago

@dbuezas I am still working on a clock-tick-precise version of delayMicroseconds(), although in most cases clock-tick precision isn't needed. At lower frequencies like 1 MHz, though, clock-tick-precise timing could be useful. This task is more complex than I thought at first, and the code has become complex too. (I found posts about correcting delayMicroseconds() that are more than 15 years old.) I have ideas, half-finished code, and test code, but none of them are good enough to publish yet.

dbuezas commented 4 years ago

@LaZsolt that's awesome! Should I wait then?

LaZsolt commented 4 years ago

@dbuezas Only a few days. I think we need to discuss the delayMicroseconds() code before we reject it. ;)

SuperUserNameMan commented 4 years ago

@LaZsolt: Since you're still working on this, I'll throw in an idea that has been bugging me since I discovered LGT8Fx boards. Do you think it would be possible to make the clock speed software-defined? Then the board's speed could be changed at runtime, and the timings of delay(), delayMicroseconds(), millis(), micros(), Serial, I2C, SPI etc. would adapt themselves at runtime too.

LaZsolt commented 4 years ago

@SuperUserNameMan

do you think it would be possible to make clock speed software defined ... ?

I might be able to write code for runtime-variable clock speeds, but the delay-count calculations would become more complex, so short delays would be less accurate. A better idea is for the caller to do the calculation before calling delayMicroseconds(), or to duplicate the bit-banging routines for two (or more) different clock speeds. delay() is already independent of the clock speed, because it calls micros(), which reads the timer. If the timer is set up correctly, delay() will measure the correct milliseconds.

LaZsolt commented 4 years ago

@dbuezas

#define delayMicroseconds(us)           \
    do {                                \
        if (__builtin_constant_p(us)) { \
            delayMicroseconds_c(us);    \
        } else {                        \
            delayMicroseconds_v(us);    \
        }                               \
    } while (0)   // do/while(0): expands to a single statement, safe in an unbraced if/else
#define delayMicroseconds_c(us) lgt8fx_delay_cycles((double)(us)*F_CPU/1000000)    // for constant case
 void   delayMicroseconds_v(unsigned int us) __attribute__ ((noinline));         // for variable case which is same as earlier

More sources in the coming days.
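The constant/variable dispatch can be tried on a PC with GCC or Clang, which both provide __builtin_constant_p. In this sketch, the two paths are hypothetical stand-ins that only record which branch ran. The do { } while (0) wrapper keeps the macro a single statement, so it also works in an unbraced if/else.

```c
/* Hypothetical stand-ins for delayMicroseconds_c / delayMicroseconds_v:
 * they only record which path the dispatch chose. */
static int last_path;                      /* 1 = constant, 2 = variable */

static void delay_const(unsigned int us) { (void)us; last_path = 1; }
static void delay_var(unsigned int us)   { (void)us; last_path = 2; }

/* __builtin_constant_p(us) is 1 when the compiler knows the value at
 * compile time (e.g. a literal), so the constant path can be fully
 * precomputed; otherwise the out-of-line variable path is called. */
#define DELAY_US(us)                      \
    do {                                  \
        if (__builtin_constant_p(us))     \
            delay_const(us);              \
        else                              \
            delay_var(us);                \
    } while (0)
```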

LaZsolt commented 4 years ago

Just finished, but not tested yet: https://github.com/LaZsolt/delayMicroseconds/tree/master/for_LGT8F

LaZsolt commented 4 years ago

See my new README: https://github.com/LaZsolt/delayMicroseconds/tree/master/for_LGT8F

SuperUserNameMan commented 4 years ago

@LaZsolt : I've seen your code includes a trick to prevent the linker from discarding a function you want to make available for debugging.

I don't know if it could be useful to you, but if you want to protect some of your functions from compiler optimization, you can wrap them this way:

#pragma GCC push_options
#pragma GCC optimize ("keep-static-functions") 
static void foo( int a )
{
  // code i want to protect from compiler optimization
}
#pragma GCC pop_options

You can also specify a level of optimization O0, O1, O2, O3, Ofast, or Os this way #pragma GCC optimize ("O0", "keep-static-functions").

More info here :

LaZsolt commented 4 years ago

Finished, tested.

LaZsolt commented 3 years ago

An interesting trick:

As we know, this code waits x*4 - 1 clock cycles.

__asm__ __volatile__ (
    "1: sbiw %0,1   \n\t"          // 1 cycle in LGT
    "   nop         \n\t"          // 1 cycle
    "   brne 1b"                   // 2 cycles ( 1 cycle when the counter reaches 0 )
    : "=w" (x)                     // output operand: tells the compiler the register is modified
    : "0"  (x)                     // input: x starts in that same register
);



But this code waits exactly x*4 clock cycles.

__asm__ __volatile__ (
    "1: sbiw %0,1   \n\t"          // 1 cycle in LGT
    "   breq .+0    \n\t"          // 1 cycle ( 2 cycles when the counter reaches 0 )
    "   brne 1b"                   // 2 cycles ( 1 cycle when the counter reaches 0 )
    : "=w" (x)                     // output operand: tells the compiler the register is modified
    : "0"  (x)                     // input: x starts in that same register
);

This little trick, at the same code size, makes the delay calculations inside delayMicroseconds_v() slightly more readable.
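Both cycle counts can be checked by stepping through the loops with a counter. The host-side C sketch below uses hypothetical helpers and the LGT8Fx costs noted in the comments. SBIW and NOP take 1 cycle each, and BRNE takes 2 cycles when taken and 1 otherwise. `breq .+0` takes 1 cycle when it falls through and 2 when taken, which only happens on the last pass. The sketch requires x ≥ 1.

```c
#include <stdint.h>

/* SBIW / NOP / BRNE loop: every pass costs 4, except the last,
 * where BRNE falls through (3). Total: 4x - 1. */
static uint32_t nop_loop_cycles(uint16_t x) {
    uint32_t cycles = 0;
    uint16_t r = x;
    do {
        r = (uint16_t)(r - 1);
        cycles += 1;                  /* sbiw */
        cycles += 1;                  /* nop */
        cycles += (r != 0) ? 2 : 1;   /* brne: taken while r != 0 */
    } while (r != 0);
    return cycles;
}

/* SBIW / BREQ .+0 / BRNE loop: on the last pass BREQ is taken (2 cycles),
 * which makes up for BRNE falling through. Total: exactly 4x. */
static uint32_t breq_loop_cycles(uint16_t x) {
    uint32_t cycles = 0;
    uint16_t r = x;
    do {
        r = (uint16_t)(r - 1);
        cycles += 1;                  /* sbiw */
        cycles += (r == 0) ? 2 : 1;   /* breq .+0: taken only when r == 0 */
        cycles += (r != 0) ? 2 : 1;   /* brne */
    } while (r != 0);
    return cycles;
}
```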

LaZsolt commented 3 years ago

@SuperUserNameMan

do you think it would be possible to make clock speed software defined so it would be possible to change the speed of the board at runtime ... ?

Yes, I found a solution for delayMicroseconds(): the compiler can select the best code for each clock speed. The source is not uploaded yet.

The other thing is that the delay calculations and the delay loop are all written in assembly language, so they are not affected by compiler or linker optimizations.

SuperUserNameMan commented 3 years ago

@LaZsolt : sounds great !

LaZsolt commented 3 years ago

When I tested, the compiler handled the different speeds.

namespace CPU32 {
  #undef  F_CPU
  #define F_CPU 32000000
  #include "delayMicroseconds.h"
}
namespace CPU16 {
  #undef  F_CPU
  #define F_CPU 16000000
  #include "delayMicroseconds.h"
}
namespace CPU4 {
  #undef  F_CPU
  #define F_CPU 4000000
  #include "delayMicroseconds.h"
}


But after I put the final source into the core (without a namespace), I get redefinition errors in every namespace.

SuperUserNameMan commented 3 years ago

Oh! I've just noticed that I misread your message in my first answer, which I've just deleted.

When I tested, the compiler handled the different speeds. But after I put the final source into the core (without a namespace), I get redefinition errors in every namespace.

So the namespace trick works if the delayMicroseconds() function is stored in a separate header. If that's just a matter of copy/pasting, I don't think it's a problem.

Your namespace trick is more elegant than what I proposed in the answer I deleted.

Thanks for the idea !

LaZsolt commented 3 years ago

A brand-new delayMicroseconds() is finished and tested again. This version is more accurate than before. The lowest frequency for clock-tick-accurate delay is:

You may use it with different clock frequencies in one source. This example shows how to use it with the menu-selected frequency and other frequencies together:

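#pragma push_macro("F_CPU")   // save the menu-selected F_CPU (a "#define FCPUSAVE F_CPU" copy would store only the name, not the value)
namespace CPU32 {
  #undef  F_CPU
  #define F_CPU 32000000
  #include "delayus.h"
}
namespace CPU1 {
  #undef  F_CPU
  #define F_CPU 1000000
  #include "delayus.h"
}
#pragma pop_macro("F_CPU")    // restore the menu-selected F_CPU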

void setup() {
  // put your setup code here, to run once:
}

void loop() {
  delayMicroseconds(3);            // Default speed from Arduino IDE
  // set speed to 32 MHz
  CPU32 :: delayMicroseconds(3);   // 32 MHz speed
  // set speed to 1 MHz
  CPU1  :: delayMicroseconds(3);   // 1 MHz speed
  // set speed back to the basic speed
}

The source is here: https://github.com/LaZsolt/delayMicroseconds/tree/master/for_LGT8F How to install:



The previous attempt failed because I wanted to define delayMicroseconds() as a macro, but the compiler found it too complex when I tried to compile it in different namespaces. Perhaps I ran into a compiler issue. The bad macro is the one I mentioned earlier: https://github.com/dbuezas/lgt8fx/issues/18#issuecomment-721539223
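A note on saving F_CPU around such namespace blocks: an object-like macro stores tokens, not values. A copy made with `#define SAVE F_CPU` therefore still expands to the name F_CPU and cannot bring the old value back after `#undef F_CPU`. GCC and Clang support `#pragma push_macro` and `#pragma pop_macro`, which save and restore the definition itself. Below is a host-side sketch in which the 16 MHz value stands in for the IDE-selected F_CPU.

```c
/* Stand-in for the IDE-supplied -DF_CPU=16000000UL. */
#define F_CPU 16000000UL

#pragma push_macro("F_CPU")          /* save the current definition */
#undef  F_CPU
#define F_CPU 32000000UL             /* temporary value, e.g. for CPU32 */
static const unsigned long fast_clock = F_CPU;
#pragma pop_macro("F_CPU")           /* restore 16000000UL */

static const unsigned long default_clock = F_CPU;
```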

DurandA commented 3 years ago

@LaZsolt sorry to hijack this issue. I am trying to port waiting functions from the AVR Transistortester (in wait.S) to the LGT8F328P. Do you know if these figures are still valid with the LGT8F328P?

SuperUserNameMan commented 3 years ago

@DurandA: according to the Chinese datasheet, rcall needs 1 cycle and ret needs 2 cycles.

See page 252 : https://raw.githubusercontent.com/dbuezas/lgt8fx/master/docs/LGT8FX8P_databook_v1.0.4.ch.pdf
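When porting cycle-counted wait routines, the practical difference shows up in every call/return pair. On a classic AVR with a 16-bit program counter, such as the ATmega328P, RCALL takes 3 cycles and RET takes 4 (per the AVR instruction set manual). With the figures above, the LGT8F328P needs 3 cycles for the pair instead of 7. Below is a small host-side helper, with hypothetical names, for redoing such counts.

```c
#include <stdint.h>

/* Cycle costs of one RCALL + RET pair. */
struct call_cost { uint8_t rcall, ret; };

/* Classic AVR with 16-bit PC (e.g. ATmega328P): RCALL 3, RET 4. */
static const struct call_cost avr_328p = { 3, 4 };
/* LGT8F328P, per the datasheet page cited above: RCALL 1, RET 2. */
static const struct call_cost lgt8f328p = { 1, 2 };

/* Total cycles spent in `calls` call/return pairs. */
static uint32_t call_ret_cycles(struct call_cost c, uint32_t calls) {
    return calls * (uint32_t)(c.rcall + c.ret);
}
```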

DurandA commented 3 years ago

@SuperUserNameMan Thank you very much. I missed it in the translated datasheet.