Closed LaZsolt closed 3 years ago
I think the best solution for LGT MCUs is to put a NOP between SBIW and BRNE.
Edit: I found that this code is not precise, because the compiler uses only the integer part of F_CPU/3000000L. The corrected code is in a later comment.
If you never need a zero-length delay and every delay is a constant (not a variable), this is the most compact and precise LGT8Fx-specific code. (It must be in Arduino.h.) Link time optimization (LTO) does not matter.
#define delayMicroseconds(us) \
(__extension__({ \
__asm__ __volatile__ ( \
"usL_%=:" "sbiw %0,1" "\n\t" \
"brne usL_%=" \
: /* no outputs */ \
:"w" ((uint16_t) ((F_CPU/3000000L) * (uint16_t)us)) \
); \
}))
Would this work for any selected speed? I'm asking because of the F_CPU/3000000L. And what do you mean by the timing being constant (not variable)? Sorry if these are stupid questions.
Yes, I think so. I compiled it for several selected speeds and disassembled the code. The timing values seem correct, but I can't measure them. Tested at 16 MHz with a DS18B20 temperature sensor.
By constant timing I mean that the compiler calculates (F_CPU/3000000L * microseconds) at compile time, as in this example at 16 MHz:
delayMicroseconds(3);
Compiled to:
2b8: 0f e0 ldi r24, 0x0F
2ba: 10 e0 ldi r25, 0x00
000002bc <usL_217>:
2bc: 01 97 sbiw r24, 0x01
2be: f1 f7 brne .-4 ; 0x2bc <usL_217>
If you call the delayMicroseconds() macro with a variable, the MCU has to do the calculation at run time. That takes longer than loading two registers, so the delay will not be precise. Don't use my macro like this:
uint16_t nnn=12;
for( int i = 0; i < 118; i++) {
delayMicroseconds(nnn); // nnn is a variable
nnn = 3.14*nnn+1;
}
All right, thank you for the explanation! Really appreciate it. :)
Now this code is really precise. If you never need a zero-length delay and every delay is a constant (not a variable), this is the most compact and precise LGT8Fx-specific code. (It must be in Arduino.h.) Link time optimization (LTO) does not matter. (3000000L is the LGT8Fx-specific value. The Atmel-specific value is 4000000L.)
#define delayMicroseconds(us) \
(__extension__({ \
__asm__ __volatile__ ( \
"usL_%=:" "sbiw %0,1" "\n\t" \
"brne usL_%=" \
: /* no outputs */ \
:"w" ( (uint16_t) ((F_CPU * (uint32_t) us) / 3000000L) ) \
); \
}))
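To see why the order of operations matters, here is a minimal host-side sketch (not part of the core) that compares the loop count of the first macro with the corrected one. The helper names count_divide_first and count_multiply_first are made up for illustration, and uint32_t stands in for the 32-bit long of avr-gcc.

```cpp
#include <cassert>
#include <cstdint>

// First macro: F_CPU/3000000L is truncated before the multiplication,
// so the fractional part of the ratio is lost for every microsecond.
uint16_t count_divide_first(uint32_t f_cpu, uint16_t us) {
    return (uint16_t)((f_cpu / 3000000UL) * us);
}

// Corrected macro: multiply first, divide afterwards,
// so only the final result is truncated.
uint16_t count_multiply_first(uint32_t f_cpu, uint16_t us) {
    return (uint16_t)((f_cpu * (uint32_t)us) / 3000000UL);
}
```

At 16 MHz and 3 microseconds the first formula gives 15 iterations (the 0x0F in the disassembly above) and the second gives 16. At 100 microseconds the gap grows to 500 versus 533 iterations.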
I'll add the new delayMicroseconds version you posted to the next version. Thanks for that!
I forgot about this one in the last release. I'll put it in in the next one
(If anyone read the comment I just deleted: it's just too hot and I made a mistake ^^)
If you want to try 32 MHz, you must use my delayMicroseconds code. I wrote two different types of delayMicroseconds. Which one did you choose?
@LaZsolt, I used the macro version, but accidentally put it into wiring.c instead of Arduino.h.
When the macro version of delayMicroseconds() is used with higher values, the compile-time multiplication overflows a 32-bit number, so the timing will not be correct. I have an idea how to solve this problem, but I need to sleep now.
I made several compilations of the new code. There is no 32-bit overflow within the parameter limitations.
#define delayMicroseconds(us) \
(__extension__({ \
__asm__ __volatile__ ( \
"usL_%=:" "sbiw %0,1" "\n\t" \
"brne usL_%=" \
: /* no outputs */ \
:"w" ( (uint16_t) (( (F_CPU/1000) * (uint32_t) us ) / 3000L) ) \
); \
}))
Macro parameter limitations:
Freq | min. delay (µs) | max. delay (µs) | parameter values that give max.+1 µs |
---|---|---|---|
32 MHz | 1 | 6143 | 0 |
16 MHz | 1 | 12287 | 0 |
8 MHz | 1 | 24575 | 0 |
4 MHz | 1 | 49151 | 0 |
2 MHz | 2 | 98303 | 0 or 1 |
1 MHz | 3 | 196607 | 0 or 1 or 2 |
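The limits in this table can be checked on a PC. The sketch below is only an illustration with made-up helper names (count_v2, count_v3, max_param); it models the 32-bit arithmetic of avr-gcc with uint32_t and shows both the wrap-around of the previous macro and the largest parameter that still fits into the 16-bit loop counter.

```cpp
#include <cassert>
#include <cstdint>

// Previous macro: the 32-bit product F_CPU * us can wrap around.
uint16_t count_v2(uint32_t f_cpu, uint32_t us) {
    return (uint16_t)((f_cpu * us) / 3000000UL);
}

// New macro: dividing F_CPU by 1000 first keeps the product small.
uint16_t count_v3(uint32_t f_cpu, uint32_t us) {
    return (uint16_t)(((f_cpu / 1000UL) * us) / 3000UL);
}

// Largest parameter whose loop count is still <= 65535,
// i.e. the largest us with (F_CPU/1000 * us) / 3000 <= 65535.
uint32_t max_param(uint32_t f_cpu) {
    return (65535UL * 3000UL + 2999UL) / (f_cpu / 1000UL);
}
```

For example, at 16 MHz a 300 microsecond delay needs 1600 iterations. count_v3 returns 1600, while count_v2 wraps and returns 168. max_param reproduces the "max. delay" column: 6143 at 32 MHz, 12287 at 16 MHz, down to 196607 at 1 MHz.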
Timing in microsec at 1, 2, 4 MHz, when parameter is:
- | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
4 MHz | 1 | 1.75 | 3.25 | 4 | 4.75 | 6.25 | 7 | 7.75 | 9.25 |
2 MHz | 98304 | 2 | 3.5 | 3.5 | 5 | 6.5 | 6.5 | 8 | 9.5 |
1 MHz | 196608 | 196608 | 4 | 4 | 4 | 7 | 7 | 7 | 10 |
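The timing table can be approximated with a small host-side model. This is a sketch under one assumption that is inferred from the table and not stated in the thread: the total is 3 cycles per iteration plus 1 cycle of overhead (two LDI cycles, minus the cycle saved by the final not-taken BRNE). The function name delay_cycles is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Estimated clock cycles spent in the macro for a given parameter.
// Assumption: 3 cycles per loop iteration plus 1 cycle of overhead.
uint32_t delay_cycles(uint32_t f_cpu, uint32_t us) {
    uint16_t count = (uint16_t)(((f_cpu / 1000UL) * us) / 3000UL);
    // A count of 0 wraps around in SBIW, so the loop runs 65536 times.
    uint32_t iterations = count ? count : 65536UL;
    return 3UL * iterations + 1UL;
}
```

Dividing the result by the clock frequency in MHz gives the delay in microseconds: 13 cycles at 4 MHz for a parameter of 3 is 3.25 microseconds, and 7 cycles at 2 MHz is 3.5 microseconds. For a count of 0 the model yields 196609 cycles, which is within one cycle of the long delays shown in the table.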
@dbuezas, I found a library (Adafruit DHT temperature sensors) that calls delayMicroseconds() with a variable parameter. This means the macro version of delayMicroseconds() is not fully compatible, so don't put the macro version in the next release. But I have a new idea using the old-style function: void __attribute__ ((noinline)) delayMicroseconds() { ... } I am still testing it.
with #51 merged, this can be closed, right?
Not yet. I would like to comment here my ideas for a while.
Ciao, well done on your work! I'm using the LGT8F with your libraries, but I ran into problems caused by the lack of microsecond precision in bit banging. (This is the bus I'd like to use with the LGT8F: https://www.pjon.org/) Are you planning to fix the microsecond issues, or are you no longer pursuing it? Thank you very much, Giacomo
@XGIACOMO What type of precision problems have you faced?
Anyway, delayMicroseconds() is not modified in release v1.0.6. The current version of delayMicroseconds() from this branch will be in the next release.
If you want to use my better delayMicroseconds() now, you need to copy these two files
https://github.com/dbuezas/lgt8fx/blob/master/lgt8f/cores/lgt8f/Arduino.h https://github.com/dbuezas/lgt8fx/blob/master/lgt8f/cores/lgt8f/wiring.c
to this directory on your hard drive:
C:\Users\__yourusername__\AppData\Local\Arduino15\packages\LGT8fx Boards\hardware\avr\1.0.5\cores\lgt8f\
Hi LaZsolt, thank you for your answer! This is what I got with a simple delayMicroseconds(50) sketch: the values are far from what they should be! I'll try your new libraries. Thank you, I'll keep you posted!
Be aware that digitalWrite() takes 2 to 5 microseconds. Port operations with bitSet() and bitClear() are much faster than digitalWrite(), but their execution time still has to be included when calculating the pulse time.
If you want to get as efficient, precise, and fast as possible at having something happening at regular intervals, there is a trick you can do using counters. It is the same as using interrupts but you just busy-wait for the counter to reach its target instead of consuming the 50+ cycles of the whole interrupt prelude, return & stuff.
So let's count clock cycles at run time:
// setting up timer 1
// secPerSample means "seconds per sample"
// this will work up to a max wait of 2ms (i.e secPerSample=0.002)
void startCPUCounter(float secPerSample) {
TCCR1A = 0;
TCCR1B = (1 << WGM13) | (1 << WGM12) // CTC mode, counts to ICR1
| (1 << CS10); // prescaler set to 1
ICR1 = secPerSample * F_CPU - 1;
TCNT1 = 0;
setBit(TIFR1, OCF1A); // clear overflow bit
}
__attribute__((always_inline)) inline void myDelay(){
loop_until_bit_is_set(TIFR1, OCF1A); // this can be off by at most 3 clock cycles, but error won't accumulate because the timer will keep counting
TIFR1 = 255; // setBit(TIFR1, OCF1A); is actually enough, but clearing all flags at once is quicker and I'm not using the other timer flags anyway.
}
And then you busy-wait for a very precise timing that doesn't accumulate error:
void setup(){
startCPUCounter(1.0/1000000); // 1us cycle
}
void loop(){
noInterrupts(); // if you don't do this, it will be a bit off some times, but error won't accumulate.
for (int i = 0; i< 10000;i++){
doTheThingThatNeedsToHappenAtVeryPreciseAndShortIntervals();
myDelay();
}
interrupts();
}
It is the trick I ended up using in the oscilloscope project to get the oscilloscope here to go very fast even while handling multiple channels and checking for triggering conditions. Only there I'm using Timer3 and fiddling with prescalers to get longer waiting intervals when necessary. Obvious in hindsight, but I did feel really clever about this.
The good thing about all this nonsense, is that the counter will keep track of time while you are doing something else, so error never accumulates.
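The compare value and the roughly 2 ms upper bound mentioned in the code comments can be worked out on a PC. This is a minimal sketch with made-up helper names (ticks_for_ns, fits_16bit_timer, timer_top) that assumes prescaler 1 and a 16-bit timer; it uses integer arithmetic so that no question about float rounding arises.

```cpp
#include <cassert>
#include <cstdint>

// Timer ticks (at prescaler 1) for a period given in nanoseconds.
uint32_t ticks_for_ns(uint32_t f_cpu, uint32_t period_ns) {
    return (uint32_t)(((uint64_t)f_cpu * period_ns) / 1000000000ULL);
}

// A 16-bit timer can count between 1 and 65536 ticks per period.
bool fits_16bit_timer(uint32_t f_cpu, uint32_t period_ns) {
    uint32_t ticks = ticks_for_ns(f_cpu, period_ns);
    return ticks >= 1 && ticks <= 65536UL;
}

// Value for the TOP register: the timer counts 0..TOP, so TOP = ticks - 1.
uint16_t timer_top(uint32_t f_cpu, uint32_t period_ns) {
    assert(fits_16bit_timer(f_cpu, period_ns));
    return (uint16_t)(ticks_for_ns(f_cpu, period_ns) - 1);
}
```

For a 1 microsecond period this gives a TOP of 15 at 16 MHz and 31 at 32 MHz. At 32 MHz a period of 2 ms (64000 ticks) still fits, while 2.1 ms (67200 ticks) does not, which matches the "max wait of 2ms" comment.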
Hi dbuezas, thank you for your answer! In my caravan I'm building a battery charger with a 15 V power supply, where I work on the feedback regulation to create the target charge stages. I find the LGT8F very powerful thanks to the integrated DAC and the differential amplifier with 32x gain: with this chip I don't need any other external components, and it works very well! Unfortunately I need to communicate with the central unit using a bit-bang protocol, https://www.pjon.org/, the same one I use to connect all the devices in my caravan, but I'm not able to make the LGT8F compatible with the protocol because of the different bit lengths. In the coming days I'll try the new LaZsolt libraries. I hope to succeed! Thank you again.
great job @LaZsolt!!!!!! much better than before ;-)
It's about time we make a new release including all these improvements from @Lazsolt et al.
@XGIACOMO I am also happy with the better result.
@dbuezas
I am still working on a clock-tick-precise version of delayMicroseconds(), but in most cases clock-tick precision is not needed. At lower frequencies, like 1 MHz, clock-tick-precise timing could be useful.
This task is more complex than I thought at first, and the code has become complex too. (I found posts about correcting delayMicroseconds() that are more than 15 years old.)
I have ideas, half-finished code, and a test program, but none of them are good enough to publish yet.
@LaZsolt that's awesome! Should I wait then?
@dbuezas only a few days.
I think we need to discuss the delayMicroseconds() code before we reject it. ;)
@LaZsolt : Since you're still working on this, just throwing an idea that is bugging me since I discovered LGT8Fx boards : do you think it would be possible to make clock speed software defined so it would be possible to change the speed of the board at runtime and have delay(), delayMicroseconds(), millis(), micros(), Serial(), I2C, SPI timings etc etc to adapt themselves at runtime too ?
@SuperUserNameMan
do you think it would be possible to make clock speed software defined ... ?
I could write code for run-time variable clock speeds, but the delay count calculation would become more complex, so short delays would be less accurate. A better idea is for the caller to do the calculation before calling delayMicroseconds(), or to duplicate the bit-banging routines for two (or more) different clock speeds.
delay() is already clock speed independent, because it calls micros(), which reads the timer. If the timer is set up correctly, delay() will measure the correct milliseconds.
@dbuezas
#define delayMicroseconds(us) \
if (__builtin_constant_p(us)) { \
delayMicroseconds_c(us); \
} else { \
delayMicroseconds_v(us); \
}
#define delayMicroseconds_c(us) lgt8fx_delay_cycles((double)us*F_CPU/1000000) // for constant case
void delayMicroseconds_v(unsigned int us) __attribute__ ((noinline)); // for variable case which is same as earlier
More sources in the coming days.
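The dispatch idea can be tried on a PC with GCC or Clang. The sketch below is only an illustration: the names DELAY_US_DEMO, delay_const_demo and delay_var_demo are hypothetical stand-ins for the real routines, the factor 16 assumes a 16 MHz clock, and the do { } while (0) wrapper is an addition that makes the macro safe inside an unbraced if/else.

```cpp
#include <cassert>

static char last_path = '?';  // records which branch ran (demo only)

// Hypothetical stand-ins for the constant and the variable implementation.
static inline void delay_const_demo(unsigned long cycles) { (void)cycles; last_path = 'c'; }
static void delay_var_demo(unsigned int us) { (void)us; last_path = 'v'; }

// __builtin_constant_p() is 1 when the argument is a compile-time constant.
#define DELAY_US_DEMO(us)                                   \
    do {                                                    \
        if (__builtin_constant_p(us)) {                     \
            delay_const_demo((unsigned long)(us) * 16UL);   \
        } else {                                            \
            delay_var_demo(us);                             \
        }                                                   \
    } while (0)

// A literal argument takes the constant path.
char path_for_literal() {
    DELAY_US_DEMO(3);
    return last_path;
}

// A volatile variable can never be treated as a constant.
char path_for_variable(unsigned int n) {
    volatile unsigned int us = n;
    DELAY_US_DEMO(us);
    return last_path;
}
```

Here path_for_literal() returns 'c' and path_for_variable(12) returns 'v'. In a real build the compiler removes the branch that is not taken, so only one of the two calls ends up in the binary.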
Just finished, but not tested yet: https://github.com/LaZsolt/delayMicroseconds/tree/master/for_LGT8F
Watch my new readme. https://github.com/LaZsolt/delayMicroseconds/tree/master/for_LGT8F
@LaZsolt : I've seen your code includes a trick to prevent the linker from discarding a function you want to make available for debugging.
I don't know if it could be useful to you, but if you want to protect some of your functions from compiler optimization, you can wrap them this way:
#pragma GCC push_options
#pragma GCC optimize ("keep-static-functions")
static void foo( int a )
{
// code i want to protect from compiler optimization
}
#pragma GCC pop_options
You can also specify an optimization level (O0, O1, O2, O3, Ofast, or Os) this way: #pragma GCC optimize ("O0", "keep-static-functions").
More info here :
Finished, tested.
An interesting trick:
This code waits x*4 - 1 clock cycles, as we know.
__asm__ __volatile__ (
"1: sbiw %0,1 \n\t" // 1 cycle on LGT
" nop \n\t" // 1 cycle
" brne 1b" // 2 cycles (1 cycle when the counter reaches 0)
: "=w" (x) // not a real output, but it tells the compiler that the register is modified
: "0" (x)
);
But this code waits exactly x*4 clock cycles.
__asm__ __volatile__ (
"1: sbiw %0,1 \n\t" // 1 cycle on LGT
" breq .+0 \n\t" // 1 cycle (2 cycles when the counter reaches 0)
" brne 1b" // 2 cycles (1 cycle when the counter reaches 0)
: "=w" (x) // not a real output, but it tells the compiler that the register is modified
: "0" (x)
);
This little trick, at the same code size, makes the delay calculations inside delayMicroseconds_v() a bit more readable.
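The two cycle counts can be verified with a small instruction-by-instruction model on a PC. This is a sketch that simply takes the per-instruction cycle counts from the comments above as given; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Loop with NOP: SBIW (1) + NOP (1) + BRNE (2 if taken, 1 if not).
uint32_t cycles_with_nop(uint16_t x) {
    uint32_t cycles = 0;
    do {
        --x;                          // sbiw: 16-bit decrement, 0 wraps to 65535
        cycles += 1;                  // sbiw
        cycles += 1;                  // nop
        cycles += (x != 0) ? 2 : 1;   // brne 1b
    } while (x != 0);
    return cycles;
}

// Loop with BREQ .+0: the extra cycle of the taken BREQ on the last pass
// compensates for the cycle saved by the not-taken BRNE.
uint32_t cycles_with_breq(uint16_t x) {
    uint32_t cycles = 0;
    do {
        --x;
        cycles += 1;                  // sbiw
        cycles += (x == 0) ? 2 : 1;   // breq .+0
        cycles += (x != 0) ? 2 : 1;   // brne 1b
    } while (x != 0);
    return cycles;
}
```

For x = 5 the model returns 19 cycles for the NOP loop and 20 for the BREQ loop, so the second variant is exactly x*4 with no correction term.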
@SuperUserNameMan
do you think it would be possible to make clock speed software defined so it would be possible to change the speed of the board at runtime ... ?
Yes, I found a solution for delayMicroseconds(). The compiler can select the best code for different clock speeds. The source is not uploaded yet.
The other thing is that the delay calculation and the delay loop are all written in assembly language, so they are not affected by compiler or linker optimizations.
@LaZsolt : sounds great !
When I tested, the compiler handled the different speeds.
namespace CPU32 {
#undef F_CPU
#define F_CPU 32000000
#include "delayMicroseconds.h"
}
namespace CPU16 {
#undef F_CPU
#define F_CPU 16000000
#include "delayMicroseconds.h"
}
namespace CPU4 {
#undef F_CPU
#define F_CPU 4000000
#include "delayMicroseconds.h"
}
But after I put the final source into the core (without a namespace), I got redefinition errors in every namespace.
Oh ! I've just noticed I misread your message in my first answer that i've just deleted.
When I tested, the compiler handled the different speeds. But after I put the final source into the core (without namespace) I getting redefinition errors in every namespaces.
So the namespace trick works if the delayMicroseconds() function is stored in a separate header. If that's just a matter of copy/pasting, I don't think it's a problem.
Your namespace trick is more elegant than what I proposed in the answer I deleted.
Thanks for the idea !
A brand-new delayMicroseconds() is finished and tested again. This version has better accuracy than before. The lowest frequency for clock tick accurate delay is:
You can use it with different clock frequencies in one source file. This example shows how to use it with the menu-selected frequency and other frequencies together:
#define FCPUSAVE F_CPU
namespace CPU32 {
#undef F_CPU
#define F_CPU 32000000
#include "delayus.h"
}
namespace CPU1 {
#undef F_CPU
#define F_CPU 1000000
#include "delayus.h"
}
#undef F_CPU
#define F_CPU FCPUSAVE
void setup() {
// put your setup code here, to run once:
}
void loop() {
delayMicroseconds(3); // Default speed from Arduino IDE
// set speed to 32 MHz
CPU32 :: delayMicroseconds(3); // 32 MHz speed
// set speed to 1 MHz
CPU1 :: delayMicroseconds(3); // 1 MHz speed
// set speed back to the basic speed
}
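One caveat with the save and restore lines in this example: #define FCPUSAVE F_CPU does not store a number, because macros are expanded only when they are used. After #define F_CPU FCPUSAVE the two names refer to each other, so code that uses F_CPU further down in the same file may no longer see a number. A possible alternative, assuming GCC (which avr-gcc is), is the push_macro/pop_macro pragma pair. In this sketch the constant cycles_per_us merely stands in for the contents of delayus.h, and the 16 MHz fallback is an assumption for building on a PC.

```cpp
#include <cassert>

#ifndef F_CPU
#define F_CPU 16000000UL   // assumption: normally supplied by the Arduino build
#endif

#pragma push_macro("F_CPU")   // remember the menu-selected definition

namespace CPU32 {
#undef F_CPU
#define F_CPU 32000000UL
// stand-in for: #include "delayus.h"
constexpr unsigned long cycles_per_us = F_CPU / 1000000UL;
}

namespace CPU1 {
#undef F_CPU
#define F_CPU 1000000UL
// stand-in for: #include "delayus.h"
constexpr unsigned long cycles_per_us = F_CPU / 1000000UL;
}

#pragma pop_macro("F_CPU")    // restore the menu-selected definition

// F_CPU expands to the original number again.
constexpr unsigned long default_cycles_per_us = F_CPU / 1000000UL;
```

With the 16 MHz fallback, CPU32::cycles_per_us is 32, CPU1::cycles_per_us is 1, and default_cycles_per_us is 16 again after the pop.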
The source is here: https://github.com/LaZsolt/delayMicroseconds/tree/master/for_LGT8F
How to install:
delayMicroseconds() definition in Arduino.h in your directory C:\Users\__yourusername__\AppData\Local\Arduino15\packages\LGT8fx Boards\hardware\avr\1.0.5\cores\lgt8f\ with all my Arduino.h source.
delayMicroseconds() function.
The previous attempt failed because I wanted to define delayMicroseconds() with a macro, but the compiler found it too complex when I tried to compile it in different namespaces. Perhaps I ran into a compiler issue. The bad macro is the one I mentioned earlier: https://github.com/dbuezas/lgt8fx/issues/18#issuecomment-721539223
@LaZsolt sorry to hijack this issue. I am trying to port waiting functions from the AVR Transistortester (in wait.S) to the LGT8F328P. Do you know if these figures are still valid with the LGT8F328P?
@DurandA : according to the Chinese datasheet, rcall needs 1 cycle and ret 2 cycles.
See page 252 : https://raw.githubusercontent.com/dbuezas/lgt8fx/master/docs/LGT8FX8P_databook_v1.0.4.ch.pdf
@SuperUserNameMan Thank you very much. I missed it in the translated datasheet.
In wiring.c, the delayMicroseconds() function uses the SBIW and BRNE instructions for timing. On AVR MCUs this loop usually takes 4 clock cycles per iteration, while on LGT MCUs it usually takes 3. Therefore this function delays less than expected. I found a better version on a Chinese site. The author corrected only the 16 and 32 MHz cases. https://www.geek-workshop.com/thread-38486-1-1.html
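The size of the error follows directly from the two cycle counts. As a rough sketch that ignores call overhead and uses a made-up helper name, the delay actually produced by a loop tuned for 4 cycles per iteration but running at 3 can be estimated like this:

```cpp
#include <cassert>
#include <cstdint>

// Delay in nanoseconds when the loop count was calculated for
// `assumed_cycles` per iteration but each iteration really takes
// `actual_cycles`. Function call overhead is ignored.
uint64_t actual_delay_ns(uint32_t requested_us,
                         uint32_t assumed_cycles,
                         uint32_t actual_cycles) {
    return (uint64_t)requested_us * 1000ULL * actual_cycles / assumed_cycles;
}
```

With these numbers delayMicroseconds(50) lasts about 37.5 microseconds on an LGT MCU, that is 25% too short. Padding the loop back to 4 cycles, for example with a NOP between SBIW and BRNE, brings it back to 50 microseconds.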