Open focalintent opened 7 years ago
yeah i also added a delaynanoseconds now so it isnt all nops. but im having difficulties telling the compiler to not unroll the loop when the loopcount is low, and i want to avoid using asm or macros. feels cleaner to me that way. so currently it expands into nops till the loop doesnt get unrolled anymore
i didnt mean to close that D:
Yeah - I went the asm route because I didn't want to be at the mercy of the compiler deciding to optimize things, or a compiler rev doing something like changing the function prolog, etc...
thats why i added all that inline and always inline crap, to force the compiler to do stuff. sadly i cant find anything to tell the compiler to not unroll a loop, even if it only runs once. but i guess i really have to resort to asm for the loop. i guess i will copy your syntax. couldnt wrap my head around the syntax with inputs and outputs yet.
as you wrote in your code the brne uses 2 cycles when branching and 1 when not. in my code i will add an additional nop after the loop, so its like the brne also uses 2 cycles when exiting. that makes the cycle count for a given number of loops easier to calculate
That’s actually accounted for by the one cycle to load the count into the register, so no need to nop afterwards.
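To make the cycle accounting above concrete, here is a small compile-time sketch. It assumes the loop is the three-instruction load / `dec` / `brne` sequence described in this thread; the function name `avr_loop_cycles` is made up for illustration and is not part of FastLED.

```cpp
// Cycle count for an AVR delay loop of the form
//      LDI  r, n     ; 1 cycle, executed once
//   L: DEC  r        ; 1 cycle per iteration
//      BRNE L        ; 2 cycles when taken, 1 cycle on the final pass
// Valid for n >= 1 (n == 0 would wrap the 8-bit register on real hardware).
constexpr unsigned long avr_loop_cycles(unsigned long n) {
    return 1              // LDI
         + n              // one DEC per iteration
         + 2 * (n - 1)    // BRNE taken on every pass except the last
         + 1;             // BRNE falling through on exit
}

// The single LDI cycle makes up for the cheaper final BRNE,
// so the total is exactly 3 cycles per iteration.
static_assert(avr_loop_cycles(1) == 3, "one pass costs 3 cycles");
static_assert(avr_loop_cycles(10) == 30, "ten passes cost 30 cycles");
```

With that balance, n passes cost 3n cycles and no trailing nop is needed to keep the arithmetic simple.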
Also - a lot of those things are merely hints to the compiler, and generally not good to rely on, as the compiler can change its mind about what it does and how. And if the compiler changes the function prolog? There aren’t many flags you can use to control what it does there. (I used to work on a decompiler - the things compilers do to your code can be baffling sometimes :)
A good question to ask yourself is do you want to get this working just this one time for this one version of the compiler, or do you want to have it be more durable -- not requiring any maintenance even if the compiler radically changes how it generates/optimizes code. Most projects start with "just make it work for now"; some also get more ambitious about portability across compiler revisions.
im using the attribute always_inline https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute and i guess that will not change with new compiler versions. the only thing that might most likely change is the current solution for the loop thing. still searching for a way to force the compiler to do what i want without putting assembly in there. if i want this library to also work on other architectures (like the arduino dues or esps) i dont want to write everything in assembly all the time.
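One option that may help with the unrolling question, as a sketch only: newer GCC releases (GCC 8 and later, as far as I know) document a `#pragma GCC unroll n` loop pragma, where a factor of 0 or 1 blocks unrolling. The helper `spin` below is hypothetical and not from this thread or from FastLED. The nop itself still needs a one-line `asm` statement, and like the attributes discussed above this is a request to one compiler family, so it is worth checking the disassembly before relying on it.

```cpp
#include <cassert>   // only used by callers that want to sanity-check spin()
#include <cstdint>

// Hypothetical helper: executes `count` loop iterations, one nop each,
// and returns the number of iterations it ran.
inline uint16_t spin(uint16_t count) {
    uint16_t done = 0;
    // Ask GCC not to unroll this loop. Compilers that do not know the
    // pragma ignore it (possibly with a warning).
#pragma GCC unroll 1
    for (uint16_t i = 0; i < count; ++i) {
        // volatile asm: the compiler has to keep one nop per iteration.
        __asm__ __volatile__("nop");
        ++done;
    }
    return done;
}
```

Unlike the hand-written asm loop, the cycles per iteration here depend on what the compiler generates for the counter and the branch, so the timing still has to be measured per target.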
currently the delaynanoseconds isnt finished and will most likely not do what it should, except for values up to ~3000, cause it just creates nops till then and then switches to the loop
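For a sense of scale, a nanosecond delay has to be converted into CPU cycles before choosing between nops and a loop. The sketch below assumes a 16 MHz clock (typical for AVR-based Arduino boards, but not stated in this thread) and that the ~3000 above is in nanoseconds; `kCpuHz` and `ns_to_cycles` are illustrative names, not the library's actual code.

```cpp
#include <cstdint>

// Assumed clock frequency: 16 MHz, i.e. 62.5 ns per cycle.
constexpr uint32_t kCpuHz = 16000000UL;

// Convert nanoseconds to cycles, rounding up so the delay is never
// shorter than requested. 64-bit intermediate avoids overflow.
constexpr uint32_t ns_to_cycles(uint32_t ns) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(ns) * kCpuHz + 999999999ULL) / 1000000000ULL);
}

static_assert(ns_to_cycles(0) == 0, "no delay, no cycles");
static_assert(ns_to_cycles(62) == 1, "less than one cycle rounds up to 1");
static_assert(ns_to_cycles(63) == 2, "just over one cycle rounds up to 2");
static_assert(ns_to_cycles(3000) == 48, "3000 ns is 48 cycles at 16 MHz");
```

Under these assumptions a 3000 ns delay is 48 cycles, so unrolling into nops up to that point means up to 48 nop instructions at each call site.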
for the "things compilers do to your code can be baffling", before using the attribute always inline, the compiler put out shit like this:
0x90 nop
0x92 ret
.......
0x1xx call 0x90
0x1xx nop
0x1xx nop
0x1xx nop
for nop<4> XD
Take a look at what I did in here for delaycycles to get a nop template that doesn't simply unroll into a set of nops - it basically sets up a loop in asm to spin the cpu for the number of cycles you want:
https://github.com/FastLED/FastLED/blob/master/fastled_delay.h