SpenceKonde / megaTinyCore

Arduino core for the tinyAVR 0/1/2-series. Ones digit of the part number is 2, 4, 5, or 7 (pin count: 8, 14, 20, or 24), tens digit is 0, 1, or 2 (feature set), preceded by flash size in KB. Library maintainers: porting help available!

delayMicroseconds at 10 MHz not accurate #981

Closed Unreal-Dan closed 11 months ago

Unreal-Dan commented 1 year ago

I have a fairly robust system that transmits binary data by blinking LEDs; the delays are in the thousands of microseconds. When I switched to 10 MHz, I observed all of the timings being delivered at 2x their requested length. When I sent half of the timings, the system worked.

When I looked into delayMicroseconds, I found that removing this line from the 10 MHz case solved the issue:

https://github.com/SpenceKonde/megaTinyCore/blob/master/megaavr/cores/megatinycore/wiring.c#L1233C1-L1243C1

us = us << 1; // x2 us, = 2 cycles

Simply removing it entirely fixed the problem, and delayMicroseconds at 10 MHz waited the correct amount of time again.

I'm not sure if I'm missing something, but this is the behaviour I noticed. Let me know if there is anything I can do.

p.s. Spence, if you added me on Discord, send me a message and say hi. A few people added me and I said hi, but they didn't reply.

SpenceKonde commented 1 year ago

Yeah, it looks like the 20 MHz code was copy/pasted and not adjusted for 10 MHz.

and I added you, but you didn't accept.

Unreal-Dan commented 1 year ago

Perhaps you added the wrong ID or name; Discord changed how it does names right around the time you were attempting to add me. Before that I was Daniel#8135, but after the change they removed the numbers and now I'm "daniel8135".

(To clarify: two other people added me around the same time, but one never replied -- I assumed that was you.)

Glad I could find this issue

SpenceKonde commented 11 months ago

This bug does not reproduce in the core itself, only within the context of this user's code, which has picked and chosen pieces of the core. It is not surprising to the maintainer that this approach was unsuccessful; the core works much better as a complete whole. Considerable discussion with the OP indicated that the issue was not with delayMicroseconds() but with micros(), and occurred in a context where pieces of the core had been pulled out to provide functions to a separate tool. It appears that his code was doing 16 MHz math while the chip was clocked at 10 MHz, producing the observed difference.

Users are advised to use the core with the Arduino IDE or another third-party IDE, and not as construction material for some other project. There are an incredible number of internal dependencies, as well as distinctions between codepaths that would surprise most people: delay, millis, and delayMicroseconds all use different methods to track time. delay() uses micros() internally, so it is influenced by sub-millisecond timekeeping artifacts (which are believed to be negligible now) as well as the call overhead of micros(), which may not be negligible despite extreme effort devoted toward that goal. micros() contains the majority of the inline assembly in the core, for example, to squeeze out every cycle we can, because by definition people care about how fast the function they use to measure microseconds runs, and have complained when it took longer in the past.

This is particularly important for the TCBs, where the math is usually implemented in hand-optimized assembly to minimize this overhead, because the compiler does a particularly bad job with shift-and-sum operations on multi-byte datatypes. If m is a uint32_t and you write m += (m >> 2) - (m >> 5) + (m >> 6) - (m >> 7), the compiler is often not smart enough to do this the obvious way (make one copy, right-shift it twice and add, right-shift it three more times and subtract, shift once more and add, shift once more and subtract; around 30 clocks). Often it will NOT reuse the intermediates: it will make a copy of the original, right-shift that copy twice and add it, then recopy the pristine original, right-shift it five more times and subtract, and so on. That can take as long as 100 clock cycles, which is pathetic. The overhead is dependent on the timer and the clock speed; it is fastest for the TCBs and slowest for TCD, but quantitative information is available only for TCA and TCB, and may predate several performance enhancements.

millis() uses only the millisecond count, without resorting to reading the timer (for non-RTC timers, of course), so problems there are quite unlikely, especially if a TCB is used.

delayMicroseconds() is implemented in two radically different ways depending on whether the argument passed to it is compile-time constant or not. avr-libc's algorithm is perfect for compile-time-known cases (ours is wrong for short delays), so we use theirs when we can; but theirs is hopelessly wrong when the argument is not known at compile time, so there is a separate codepath to handle that case. The two split off from each other at the very first opportunity: delayMicroseconds is an always_inline stub that calls either _delay_us() (from avr-libc's delay.h, for the constant case) or _delayMicroseconds() (supplied by the core, for the variable case).

I'm saying all this to illustrate my point that timekeeping, PWM, and timers are the hardest part of a core, and they are in no way designed to be separable from the core.
Timekeeping and the shepherding of timers is, broadly speaking, the hardest part of the core. Your code is in more contact with the timers than with any other peripheral, and they sit in a very visible area of functionality, so it is no surprise that problems were encountered here.