MCUdude / MCUdude_corefiles

Subtree for Arduino core files used with MightyCore, MegaCore, MiniCore and MajorCore
19 stars 4 forks source link

Replace macros with "proper" functions #23

Closed MCUdude closed 3 years ago

MCUdude commented 4 years ago

The macros below should be replaced by proper functions, or perhaps template functions for ultimate flexibility.

#define min(a,b) ((a)<(b)?(a):(b))
#define max(a,b) ((a)>(b)?(a):(b))
#define abs(x) ((x)>0?(x):-(x))
#define constrain(amt,low,high) ((amt)<(low)?(low):((amt)>(high)?(high):(amt)))
#define round(x)     ((x)>=0?(long)((x)+0.5):(long)((x)-0.5))
#define sq(x) ((x)*(x))

More info about why these macros are terrible and should never have been added in the first place can be found here: https://forum.arduino.cc/index.php?topic=84364.0

A few solutions are provided here: https://github.com/arduino/Arduino/issues/2069

@SpenceKonde this is relevant for your cores as well

MCUdude commented 3 years ago

The abs macro bit me in the rear today. I was incrementing a variable inside abs, and it ended up incrementing twice. I'll have to do something about this.

@nerdralph suggested to use __builtin_abs() for the abs macro. Do you know other replacements that will work and isn't performance hogs?

#define abs(x) __builtin_abs(x)

Oh, I found these examples of the Arduino forums as well. Not sure if they are any "safer", faster or slower though:

#define max(a,b) \
       ({ typeof (a) _a = (a); \
           typeof (b) _b = (b); \
         _a > _b ? _a : _b; })

#define min(a,b) \
       ({ typeof (a) _a = (a); \
           typeof (b) _b = (b); \
         _a < _b ? _a : _b; })
MCUdude commented 3 years ago

Apparently, The abs macro below occupies less flash than __builtin_abs:

#define abs(a) \
       ({ typeof (a) _a = (a); \
         _a > 0 ? _a : -_a; })
nerdralph commented 3 years ago

It always uses less flash, or only when the arguments are compile-time constants?

On Fri, Oct 30, 2020 at 2:30 PM Hans notifications@github.com wrote:

Apparently, The abs macro below occupies less flash than __builtin_abs:

define newest_abs(a) \

   ({ typeof (a) _a = (a); \
     _a > 0 ? _a : -_a; })

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MCUdude/MCUdude_corefiles/issues/23#issuecomment-719691838, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKNZ6UVUO5SGYWKB7DOENDSNLZ4ZANCNFSM4RAKXSTQ .

MCUdude commented 3 years ago

It turns out I was wrong. It's the other way around! __builtin_abs occupies less flash, but only when LTO is disabled. Do you know where I can find the source code for __builtin_abs ?

I tested it with the program below:

#define new_abs(a) \
       ({ typeof (a) _a = (a); \
         _a > 0 ? _a : -_a; })

#define new_abs1(x) __builtin_abs(x)

int32_t val;

void setup() {
  pinMode(0, OUTPUT);
}

void loop() {
  digitalWrite(0, HIGH);
  delay(new_abs1(val));
  digitalWrite(0, LOW); 
  delay(new_abs1(val));
  val--;
}

Compiled size for ATmega1284P when LTO is off and new_abs is used: 1252 bytes (13 bytes of RAM)
Compiled size for ATmega1284P when LTO is off and new_abs1 is used: 1236 bytes (13 bytes of RAM)

SpenceKonde commented 3 years ago

According to the docs, it's a "builtin function of gcc"

As far as I can see, there isnt source for it? Though you can always compile it and examine the assembly listing if you care... I also suspect it matters how many times you call it...

Also.... I think you want to declare val volatile for that test, instead of changing it like you are doing...

MCUdude commented 3 years ago

As far as I can see, there isnt source for it? Though you can always compile it and examine the assembly listing if you care... I also suspect it matters how many times you call it...

Couldn't find it either...

Also.... I think you want to declare val volatile for that test, instead of changing it like you are doing...

The program works regardless. Care to explain why you think it's necessary in this case?

Anyways, I've come up with these. They are still macros, but they all work as expected:

#define abs(x)       __builtin_abs(x)
#define sq(x)        ({ typeof (x) _x = (x); _x * _x; })
#define min(a,b)     ({ typeof (a) _a = (a); typeof (b) _b = (b); _a < _b ? _a : _b;  })
#define max(a,b)     ({ typeof (a) _a = (a); typeof (b) _b = (b); _a > _b ? _a : _b;  })
#define round(x)     ({ typeof (x) _x = (x);  _x >= 0 ? (long)x + 0.5 : (long)x - 0.5 })
#define radians(deg) ((deg) * DEG_TO_RAD)
#define degrees(rad) ((rad) * RAD_TO_DEG)
#define constrain(x,low,high)   ({ \
  typeof (x) _x = (x);             \
  typeof (low) _l = (l);           \
  typeof (high) _h = (h);          \
  _x < _l ? _l : _x > _h ? _h :_x })

BTW it's very strange that this hasn't been fixed in any of the official Arduino cores yet. The fix is so simple, and the current implementation brings no benefit compared to the "new" one.

On the Arduino reference page they have even documented its "unexpected behavior":

Because of the way the abs() function is implemented, avoid using other functions inside the brackets, it may lead to incorrect results. abs(a++); // avoid this - yields incorrect results // use this instead: abs(a); a++; // keep other math outside the function

SpenceKonde commented 3 years ago

IIRC the rules that the C standard defines that dictate what the optimizer is allowed to do can occasionally get weird and perverse... because even though, in reality, when you roll an integer over, it wraps around, the C standard doesn't guarantee that, and the compiler is allowed to shit on that... I can't find where I put it, but I remember being so flabbergasted by when I encountered it once that I saved it...I thought I put it in github, but can't find it.

It was something like

byte i = 1; while (i++) {...}

and under certain circumstances (usually not, but occasionally), the compiler would optimize out the test entirely, because i was always going to be 1 because it's never subtracted from, only added to, and it starts out as 1, so it can never be zero... I was stunned that it was allowed to do that! (IIRC I ran into at least one other wacky unexpected optimization there). And like, it wouldn't do that under normal circumstances, either - make anything slightly more complex, and it wouldn't do that... but in the very simple minimal test case, like one writes for understanding things like flash usage like that...... yeah... wtf...

Since then I always just make things volatile when I want to make sure the compiler doesn't pull some wacky optimization out of the twilight zone to throw my test off the mark.

For example, in your test, new_abs1(val) is only calculated once per iteration of loop... this takes the same flash as your version of loop:

void loop() {
  digitalWrite(0, HIGH);
  int32_t test=new_abs1(val);
  delay(test);
  digitalWrite(0, LOW); 
  delay(test);
  val--;
}

I don't think that was what you meant was it?

On a t1614, just quickly... As you wrote, or as I noted above, 872. (ie, it's only calculated once) volatile val as you have it, 914 (ie, calculated twice) volatile val as I wrote, 884 (ie, calculated once) volatile val, as I wrote, but switched to the new_abs instead of new_abs1, 924 (oof!) volatile val, as you have it, switched to new_abs: 1010 (dear lord!)

Why is new_abs made of lose and fail? I may be wrong, but I am suspicious that typeof() is copying the volatility too, because this:

void loop() {
  digitalWrite(0, HIGH);
  int32_t test=val; //can't optimize this away
  delay(new_abs(test));
  digitalWrite(0, LOW); 
  test=val; // or this, so two accesses
  delay(new_abs(test)); //and have to recalculate - compiler can't assume what val, hence test is
  val--;
}

or

void loop() {
  digitalWrite(0, HIGH);
  delay(new_abs((int32_t )val));
  digitalWrite(0, LOW); 
  delay(new_abs((int32_t )val));
  val--;
}

both compile to 906....

And this one:

void loop() {
  digitalWrite(0, HIGH);
  delay(abs(val));
  digitalWrite(0, LOW); 
  delay(abs(val));
  val--;
}

with volatile val, 914, 884 if you use a temp variable like I did above...

But yeah - always declare things volatile if you want to make sure the compiler isn't taking shortcuts you don't want it to... but I guess you need to cast away the volatility right when you pass it to a macro, heh... I wasn't expecting that bit...

SpenceKonde commented 3 years ago

Oh, wait, the abs() that megaTinyCore is using isn't the same as the one the official core uses, it looks like.... lol... okay, I'm done with this for now, have other things I need to get done this weekend, but I can't find any definition of abs() in megaTinyCore, so it's using the one from stdlib.h... which is __builtin_abs()

But yeah - in this case I think I agree with your conclusion: __builtin_abs() it is, but your method doesn't actually prove it

nerdralph commented 3 years ago

gcc will sometimes break valid code if you have integer overflow anywhere in your program unless you use -fwrapv.

On Sun, Nov 1, 2020, 17:09 Spence Konde (aka Dr. Azzy) < notifications@github.com> wrote:

IIRC the rules that the C standard defines that dictate what the optimizer is allowed to do can occasionally get weird and perverse... because even though, in reality, when you roll an integer over, it wraps around, the C standard doesn't guarantee that, and the compiler is allowed to shit on that... I can't find where I put it, but I remember being so flabbergasted by when I encountered it once that I saved it...I thought I put it in github, but can't find it.

It was something like

byte i = 1; while (i++) {...}

and under certain circumstances (usually not, but occasionally), the compiler would optimize out the test entirely, because i was always going to be 1 because it's never subtracted from, only added to, and it starts out as 1, so it can never be zero... I was stunned that it was allowed to do that! (IIRC I ran into at least one other wacky unexpected optimization there). And like, it wouldn't do that under normal circumstances, either - make anything slightly more complex, and it wouldn't do that... but in the very simple minimal test case, like one writes for understanding things like flash usage like that...... yeah... wtf...

Since then I always just make things volatile when I want to make sure the compiler doesn't pull some wacky optimization out of the twilight zone to throw my test off the mark.

For example, in your test, new_abs1(val) is only calculated once per iteration of loop... this takes the same flash as your version of loop:

void loop() { digitalWrite(0, HIGH); int32_t test=new_abs1(val); delay(test); digitalWrite(0, LOW); delay(test); val--; }

I don't think that was what you meant was it?

On a t1614, just quickly... As you wrote, or as I noted above, 872. (ie, it's only calculated once) volatile val as you have it, 914 (ie, calculated twice) volatile val as I wrote, 884 (ie, calculated once) volatile val, as I wrote, but switched to the new_abs instead of new_abs1, 924 (oof!) volatile val, as you have it, switched to new_abs: 1010 (dear lord!)

Why is new_abs made of lose and fail? I may be wrong, but I am suspicious that typeof() is copying the volatility too, because this:

void loop() { digitalWrite(0, HIGH); int32_t test=val; //can't optimize this away delay(new_abs(test)); digitalWrite(0, LOW); test=val; // or this, so two accesses delay(new_abs(test)); //and have to recalculate - compiler can't assume what val, hence test is val--; }

or

void loop() { digitalWrite(0, HIGH); delay(new_abs((int32_t )val)); digitalWrite(0, LOW); delay(new_abs((int32_t )val)); val--; }

both compile to 906....

And this one:

void loop() { digitalWrite(0, HIGH); delay(abs(val)); digitalWrite(0, LOW); delay(abs(val)); val--; }

with volatile val, 914, 884 if you use a temp variable like I did above...

But yeah - always declare things volatile if you want to make sure the compiler isn't taking shortcuts you don't want it to... but I guess you need to cast away the volatility right when you pass it to a macro, heh... I wasn't expecting that bit...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MCUdude/MCUdude_corefiles/issues/23#issuecomment-720151817, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKNZ6RBNS3LSIRGOYWP5CLSNXFBRANCNFSM4RAKXSTQ .

SpenceKonde commented 3 years ago

Hmm - should we be using that flag for our cores, or does it cause other problems?

On Sun, Nov 1, 2020 at 4:37 PM Ralph Doncaster notifications@github.com wrote:

gcc will sometimes break valid code if you have integer overflow anywhere in your program unless you use -fwrapv.

On Sun, Nov 1, 2020, 17:09 Spence Konde (aka Dr. Azzy) < notifications@github.com> wrote:

IIRC the rules that the C standard defines that dictate what the optimizer is allowed to do can occasionally get weird and perverse... because even though, in reality, when you roll an integer over, it wraps around, the C standard doesn't guarantee that, and the compiler is allowed to shit on that... I can't find where I put it, but I remember being so flabbergasted by when I encountered it once that I saved it...I thought I put it in github, but can't find it.

It was something like

byte i = 1; while (i++) {...}

and under certain circumstances (usually not, but occasionally), the compiler would optimize out the test entirely, because i was always going to be 1 because it's never subtracted from, only added to, and it starts out as 1, so it can never be zero... I was stunned that it was allowed to do that! (IIRC I ran into at least one other wacky unexpected optimization there). And like, it wouldn't do that under normal circumstances, either

make anything slightly more complex, and it wouldn't do that... but in the very simple minimal test case, like one writes for understanding things like flash usage like that...... yeah... wtf...

Since then I always just make things volatile when I want to make sure the compiler doesn't pull some wacky optimization out of the twilight zone to throw my test off the mark.

For example, in your test, new_abs1(val) is only calculated once per iteration of loop... this takes the same flash as your version of loop:

void loop() { digitalWrite(0, HIGH); int32_t test=new_abs1(val); delay(test); digitalWrite(0, LOW); delay(test); val--; }

I don't think that was what you meant was it?

On a t1614, just quickly... As you wrote, or as I noted above, 872. (ie, it's only calculated once) volatile val as you have it, 914 (ie, calculated twice) volatile val as I wrote, 884 (ie, calculated once) volatile val, as I wrote, but switched to the new_abs instead of new_abs1, 924 (oof!) volatile val, as you have it, switched to new_abs: 1010 (dear lord!)

Why is new_abs made of lose and fail? I may be wrong, but I am suspicious that typeof() is copying the volatility too, because this:

void loop() { digitalWrite(0, HIGH); int32_t test=val; //can't optimize this away delay(new_abs(test)); digitalWrite(0, LOW); test=val; // or this, so two accesses delay(new_abs(test)); //and have to recalculate - compiler can't assume what val, hence test is val--; }

or

void loop() { digitalWrite(0, HIGH); delay(new_abs((int32_t )val)); digitalWrite(0, LOW); delay(new_abs((int32_t )val)); val--; }

both compile to 906....

And this one:

void loop() { digitalWrite(0, HIGH); delay(abs(val)); digitalWrite(0, LOW); delay(abs(val)); val--; }

with volatile val, 914, 884 if you use a temp variable like I did above...

But yeah - always declare things volatile if you want to make sure the compiler isn't taking shortcuts you don't want it to... but I guess you need to cast away the volatility right when you pass it to a macro, heh... I wasn't expecting that bit...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/MCUdude/MCUdude_corefiles/issues/23#issuecomment-720151817 , or unsubscribe < https://github.com/notifications/unsubscribe-auth/ABKNZ6RBNS3LSIRGOYWP5CLSNXFBRANCNFSM4RAKXSTQ

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MCUdude/MCUdude_corefiles/issues/23#issuecomment-720155564, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTXEWYRDUBLOSNRK5FB36TSNXIINANCNFSM4RAKXSTQ .

--


Spence Konde Azzy’S Electronics

New products! Check them out at tindie.com/stores/DrAzzy GitHub: github.com/SpenceKonde ATTinyCore https://github.com/SpenceKonde/ATTinyCore: Arduino support for all pre-2016 tinyAVR with >2k flash! megaTinyCore https://github.com/SpenceKonde/megaTinyCore: Arduino support for all post-2016 tinyAVR parts! DxCore https://github.com/SpenceKonde/DxCore: Arduino support for the AVR Dx-series parts, the latest and greatest from Microchip! Contact: spencekonde@gmail.com

nerdralph commented 3 years ago

Hmm - should we be using that flag for our cores, or does it cause other problems?

I only recently became aware of how stupid gcc can be with integer overflow, so I haven't done much testing with -fwrapv. https://www.avrfreaks.net/forum/failure-truncate-16bits-when-flto