Closed henrygab closed 2 years ago
Hi, thanks - I'm a bit confused though. Can you provide any code where the C++17 standard support makes a relevant difference? I'm only competent wrt straight C stuff and inline assembly (and that may be giving me too much credit...), not all that wack shit they added in C++. You could say I'm not a class-y programmer ;) At the start you say that it makes it easier to put constants into progmem. But then at the bottom you say it generates identical hex files, which would mean that it has the same behavior as before and we're just shuffling deck chairs, in which case I shouldn't merge it.
Can you, in general, make it a bit clearer what cases this would be expected to change behavior in?
(as an aside, I don't think I would ever describe a word I have said to the compiler as "polite". Like I said, I'm not a classy programmer)
OK, I can definitely try. It's late, so I'll likely have some mistakes below....
First, my mention of getting the "same" results above related to compilation of existing examples both with and without this change. In that case, you really want the binaries to be identical ... it means the compilation had no regressions. Code that is written to use C++14 or C++17 features would not compile without the change, so doesn't have that baseline.
I will use expandable/collapsible sections to make it easier to parse my attempt at this, trying to focus on some features of particular interest to PROGMEM optimization.
Thanks for that - If I'm being honest, most of those points went over my head... it's becoming increasingly clear (and initially was not) that while I write a significant amount of code that is compiled as C++... basically the only thing I do that isn't compatible with straight C is operator overloading. I struggle with classes and namespaces, and templates are black magic to me. But it sounds like, for those who dabble in such sorcery (sourcery?) it would be a valuable addition.
I will put those changes into megaTinyCore first (there is a release I need to get out this weekend for megaTinyCore; I've improved the upload speed by something like a factor of 8-to-1 when using one of the programmer options... and I want to get that into the hands of users).
I put C++17 mode into megaTinyCore so I could run the CI. There are indeed a few cases where it resulted in a difference in the compiled code - something about the servo library gets handled ever so slightly more efficiently (I mean, I looked at the listing, and saw where it was and all, but it was deep enough in that I couldn't figure out what the heck they were doing). It uses 2 words less there, and 1 word more on all the tinyNeoPixel ones. But yeah, this needs testing with CI running here before I can put something like that in. We will see if anyone squawks about problems on megaTinyCore, as well...
[admittedly, low priority, but fascinating!]
> There are indeed a few cases where it resulted in a difference in the compiled code - something about the servo library gets handled ever so slightly more efficiently (I mean, I looked at the listing, and saw where it was and all, but it was deep enough in that I couldn't figure out what the heck they were doing). It uses 2 words less there, and 1 word more on all the tinyNeoPixel ones.
I admit to extreme curiosity. I tried looking through the action logs in megaTinyCore to find how you determined the size changed (due to build changes). However, I didn't see anything that saved the build artifacts from the automated builds.
Did you do this manually somehow? If you have a ZIP of the old vs. new build artifacts, I'd love to take a look. Or, if you have specific example sketches that you could point to, I could build locally?
https://github.com/SpenceKonde/megaTinyCore/pull/424
That's how I figured out that there were differences. I downloaded the full report CSV from the link there.
Opened it in Sublime Text, copied the first line to a new file, searched with a regex to grab (I think) the names of the example sketches, then find-all, cut, and paste into another scratch document, and find/replace the `<br/>\n` with `,,,,` (each sketch has 4 columns).
That became the first line of the original CSV file; the second scratch document, I think, needed some trivial cleanup with regex, then became the second line. Saved, opened in Excel, looked for the changes. Then I manually compiled one of each of the two, with and without the change, to compare.
The examples for the tinyNeoPixel library gain 1 instruction word in size. The examples for Servo lose 2.
I had to use WinMerge on the assembly listings to find the difference, but ofc it was very annoying that a bunch of addresses were off by 2 or 4.
Thinking about it, if I had to do that again, I'd make a copy of them, then regex away all the line addresses and jmp targets and compare those, then go back and search for the scenery I now knew was around the site with the differences.
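A rough sketch of that "regex away the addresses" idea, in C++ with `std::regex`. It assumes listing lines shaped like `avr-objdump -d` output (address, raw opcode bytes, mnemonic, operands); the helper name `normalize_listing_line` and the `ADDR` / `.REL` placeholders are made up for illustration, and the patterns would likely need tuning for a real `.lst` file.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: strips the parts of a disassembly line that shift
// whenever code moves, so two listings can be diffed on instructions alone.
// Assumed line shape: "  1a4:\t0e 94 5b 00 \tcall\t0xb6\t; 0xb6 <main>"
std::string normalize_listing_line(const std::string& line) {
    // Leading address plus the raw opcode bytes (which encode jump targets).
    static const std::regex prefix(R"(^\s*[0-9a-f]+:\s+(?:[0-9a-f]{2}\s)+\s*)");
    // Absolute addresses such as 0xb6, in operands and in comments.
    static const std::regex absolute(R"(0x[0-9a-f]+)");
    // Relative branch offsets such as .+4 or .-12.
    static const std::regex relative(R"(\.[+-][0-9]+)");

    std::string out = std::regex_replace(line, prefix, "");
    out = std::regex_replace(out, absolute, "ADDR");
    out = std::regex_replace(out, relative, ".REL");
    return out;
}
```

Running both listings through this line by line and then diffing the results should leave only the real instruction changes. Lines that don't match the assumed shape (labels, section headers) pass through with only their `0x...` values masked.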
And, by the way, over in megatinycore, this caused https://github.com/SpenceKonde/megaTinyCore/issues/528
Thanks for update, I find that G++ change interesting.
Yes, both C++14 and C++17 added some new allocation and deallocation functions.
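To make that concrete: since C++14, a `delete` expression may call `operator delete(void*, std::size_t)` instead of the one-argument form, which is what the `-Wsized-deallocation` warning complains about when a core supplies only the latter. Below is a minimal host-side sketch of a malloc-backed set of replacements. It is not the core's actual `new.cpp`, and the `g_plain_deletes` / `g_sized_deletes` counters are assumptions added only so the calls can be observed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Illustration-only counters (not part of any real core).
static int g_plain_deletes = 0;
static int g_sized_deletes = 0;

// Replacement allocation function backed by malloc.
void* operator new(std::size_t size) {
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();  // host-only; AVR builds have no exceptions
    return p;
}

// The unsized form, the kind of definition the warning points at.
void operator delete(void* ptr) noexcept {
    ++g_plain_deletes;
    std::free(ptr);
}

// The C++14 sized form; free() has no use for the size, so it is ignored.
void operator delete(void* ptr, std::size_t /*size*/) noexcept {
    ++g_sized_deletes;
    std::free(ptr);
}
```

Either defining the sized overload like this or disabling the feature with the compiler flag should silence the warning; which one a core prefers is a design choice.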
> Moving data into `PROGMEM` requires that the data be evaluated at compilation time, so the data can be placed in ROM
@henrygab I don't follow your remark regarding the `PROGMEM` and `constexpr` issue. `constexpr` simply says that a statement/function can be evaluated at compile time and therefore doesn't have any representation in binary (compiled) form (placed in RAM or ROM) corresponding to the statement/function logic. For example, a `constexpr` function is reduced to a single `const` value in the compiled code, and the value need not have any memory associated with it (that is, it is a prvalue).
EDIT: I probably grasp your idea by paraphrasing: make as much data `const` as possible and use `constexpr` funcs/statements to handle it, moving as much logic as possible to compile time to reduce the binary footprint.
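A small sketch of that idea: a lookup table computed entirely by `constexpr` functions, so only the finished bytes end up in the binary. The loops need at least C++14's relaxed `constexpr` rules. Everything named here (`Crc8Table`, `make_crc8_table`, `kCrc8Table`, the CRC-8 polynomial 0x07) is an assumption chosen for illustration, and `PROGMEM` is defined as a no-op so the snippet also compiles on a desktop; on AVR it would come from `<avr/pgmspace.h>` and reads would go through `pgm_read_byte()`.

```cpp
#include <stdint.h>

#ifndef PROGMEM
#define PROGMEM  // host build: no-op stand-in for the avr-libc attribute
#endif

// Plain struct wrapper so the whole table can be returned from a function.
struct Crc8Table { uint8_t v[256]; };

// One table entry: run 8 shift/xor steps of CRC-8 (polynomial 0x07).
constexpr uint8_t crc8_entry(uint8_t byte) {
    uint8_t crc = byte;
    for (int bit = 0; bit < 8; ++bit) {
        crc = (crc & 0x80) ? static_cast<uint8_t>((crc << 1) ^ 0x07)
                           : static_cast<uint8_t>(crc << 1);
    }
    return crc;
}

// Builds all 256 entries during compilation.
constexpr Crc8Table make_crc8_table() {
    Crc8Table t{};
    for (int i = 0; i < 256; ++i) {
        t.v[i] = crc8_entry(static_cast<uint8_t>(i));
    }
    return t;
}

// Because the initializer is a constant expression, no startup code runs
// and the data can be emitted directly into flash.
constexpr Crc8Table kCrc8Table PROGMEM = make_crc8_table();

// Checked by the compiler, not at run time.
static_assert(kCrc8Table.v[1] == 0x07, "CRC-8 table entry 1");
```

Without `constexpr` loops, the same table would have to be pasted in as 256 literals or filled in at startup into RAM.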
Do you know what the value is being used for __STDCPP_DEFAULT_NEW_ALIGNMENT__ in megatinycore?
Not a clue!
> Do you know what the value is being used for `__STDCPP_DEFAULT_NEW_ALIGNMENT__` in megatinycore?
`__STDCPP_DEFAULT_NEW_ALIGNMENT__` = `alignof(max_align_t)` = 1 and `sizeof(max_align_t)` = 12
In other words this MCU doesn't have any alignment requirements.
> this MCU doesn't have any alignment requirements.
Well, no it wouldn't - how does `alignof(max_align_t) == 1` and `sizeof(max_align_t) == 12` tell you that? The cppreference entry doesn't really help me on that.
I can imagine user code that would want overaligned structures in memory. It's harder to imagine cases where that would be required for objects or anything allocated dynamically... The examples I'm thinking of are all where there's a buffer that would have been statically allocated, that gets accessed with some unholy snippet of inline assembly... (which is probably located in an ISR, quite possibly a naked one, otherwise you wouldn't be looking at such desperate measures to make it run a fraction of a microsecond faster). I'm not sure if there are ways to improve performance significantly that require such overalignment and don't involve inline assembly.
You are right. On AVR, likely no such thing is ever going to be required.
Sorry, maybe I was not so precise saying "any", but I didn't expect this sentence would be taken so seriously ;)
In this context I meant memory alignment for any (basic, primitive) object type, as defined for C `malloc()` here. Fundamental alignment as specified by `alignof(max_align_t)` = 1 implies a basic object doesn't require any alignment on this MCU (e.g. `alignof(int)` = `alignof(void*)` = `alignof(size_t)` = 1). I don't exclude that there could be some (system) objects that would require it (DMA, for example).
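For anyone who wants to check these numbers on their own toolchain, here is a minimal sketch that collects the relevant values. The `AlignmentReport` struct and `alignment_report()` function are hypothetical names. Per the values quoted above, every field would come out as 1 on this AVR target, while a desktop compiler will typically report larger alignments.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical summary struct, only for illustration.
struct AlignmentReport {
    size_t of_int;
    size_t of_pointer;
    size_t of_max_align;
    size_t default_new;  // 0 when the compiler does not define the macro
};

constexpr AlignmentReport alignment_report() {
    return AlignmentReport{
        alignof(int),
        alignof(void*),
        alignof(max_align_t),
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
        __STDCPP_DEFAULT_NEW_ALIGNMENT__
#else
        0
#endif
    };
}

// max_align_t is at least as strictly aligned as every scalar type, so if
// its alignment is 1, no basic type can require more than 1 either.
static_assert(alignof(max_align_t) >= alignof(long long), "scalar bound");
static_assert(alignof(max_align_t) >= alignof(void*), "pointer bound");
```

The two `static_assert`s spell out the reasoning in the comment above: the fundamental alignment is an upper bound for the basic types.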
As an example: Intel has quite relaxed requirements for memory alignment of basic types (integers, pointers), but unaligned access (as far as I recall) is less efficient than aligned access. An unaligned access can span more than a single memory access cycle, so the CPU needs to lock the bus to guarantee atomicity. For this efficiency reason `alignof(uint32_t)` is 4, not 1.
On the other hand, ESP8266 (Xtensa) has very strict alignment requirements and raises a `LoadStoreAlignmentCause` exception in case of unaligned access (I painfully found out about this recently by this bug report).
BTW I wonder if the AVR MCU guarantees memory access atomicity for innate multi-byte objects (e.g. `uint16_t`). For example, when storing a 2-byte integer (address) in memory, is it possible for an ISR to be triggered after storing the 1st byte but before the 2nd one?
AVR does not guarantee atomic access; you have to briefly disable interrupts while reading a variable written by an ISR or writing one read by an ISR, if the code could be called when the ISR is enabled. This was one of the many bugs that could cause time travel in early versions of this core (brief jaunts a millisecond or so backwards or forwards in time, returning to the present on the next call). These have all been fixed, and while cleaning up code in wiring.c around them, I wrote about it yesterday.
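A minimal sketch of the "briefly disable interrupts" pattern, not the core's actual `wiring.c` code. The names `irq_save`, `irq_restore`, `g_tick_count` and `read_tick_count` are assumptions for illustration, and the non-AVR branch holds do-nothing stand-ins so the snippet compiles on a desktop. avr-libc's `<util/atomic.h>` offers `ATOMIC_BLOCK` as a ready-made alternative.

```cpp
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
// Save SREG (which holds the global interrupt flag), then disable interrupts.
static inline uint8_t irq_save() { uint8_t sreg = SREG; cli(); return sreg; }
// Restoring SREG re-enables interrupts only if they were enabled before.
static inline void irq_restore(uint8_t sreg) { SREG = sreg; }
#else
// Host stand-ins so the sketch compiles off-target; they do nothing.
static inline uint8_t irq_save() { return 0; }
static inline void irq_restore(uint8_t) {}
#endif

// Hypothetical counter that an ISR would increment.
volatile uint32_t g_tick_count = 0;

// Copies the counter with interrupts masked, so an ISR cannot update it
// between the individual byte loads.
uint32_t read_tick_count() {
    const uint8_t saved = irq_save();
    const uint32_t copy = g_tick_count;  // several byte-wide loads on AVR
    irq_restore(saved);
    return copy;
}
```

Restoring the saved status register rather than unconditionally re-enabling interrupts keeps the function safe to call from code that already runs with interrupts off.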
DMA is not present on any AVRs that are not branded as XMega.
Yeah - I think we're all on the same page with what the actual alignment requirements of the MCU are - I've gone as far as writing out theoretical ISRs that could perform lightning fast operations with a 256 entry lookup table but required it to be aligned to 256 bytes. (28-30 clock cycles including all interrupt overhead) for say, outputting a sinewave on the DAC at high resolution, where you're controlling the rate at which interrupts are fired to set the frequency, while also doing something else. If the high byte of the address includes the interrupt flag you need to clear, that saves one, and if the table is in flash instead of ram, that adds one clock. 400ksps that the tiny 1-series datasheet claims would give like, 60% of the CPU time entering, in, or exiting the ISR at 20 MHz - painful but viable - while dropping back to 140 ksps like the Dx claims as max leaves the fraction of time spent in the ISR looking almost sane.
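For the 256-byte-aligned lookup table, a sketch of how the alignment could be requested from the compiler rather than arranged by hand. `g_wave_table` and `wave_lookup` are hypothetical names, the table lives in RAM here, and whether a given linker script honors the request on a small part is something to verify rather than assume.

```cpp
#include <stdint.h>

// Hypothetical 256-entry waveform table, aligned to a 256-byte boundary so
// that the low byte of an entry's address equals its index.
alignas(256) static uint8_t g_wave_table[256];

// Portable spelling of the trick: because the base address has a zero low
// byte, OR-ing in the index is the same as adding it. Hand-written AVR
// assembly could simply load the index into the low byte of a pointer
// register and skip the 16-bit add.
static inline uint8_t wave_lookup(uint8_t index) {
    const uintptr_t base = reinterpret_cast<uintptr_t>(g_wave_table);
    return *reinterpret_cast<const uint8_t*>(base | index);
}
```

The saving only materializes in the assembly version; the C++ function is there to show why the alignment makes the index arithmetic trivial.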
> This was one of the many bugs that could cause time travel in early versions of this core (brief jaunts a millisecond or so backwards or forwards in time, returning to the present on the next call).
I'd be interested to get to know more about it if possible.
I checked in changes just now over on megaTinyCore's wiring.c.
Before it, as well as in and around it, there is a ton of stuff about `millis()`, `micros()` and `delay()` implementation pitfalls and considerations. The biggest problem with micros isn't that so much as the fact that you need to do division, but division is way too slow. Avoid division like the plague on 8-bit AVRs.
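To illustrate the "avoid division" point, here is a sketch of replacing a divide with shifts and adds. The divisor 3 is picked arbitrarily (say, 3 timer ticks per microsecond) and `div3_approx` is a made-up name; this is not the core's actual `micros()` code, which has to deal with many clock speeds.

```cpp
#include <stdint.h>

// Hypothetical example: approximate ticks / 3 with shifts and adds only.
// 1/4 + 1/16 + 1/64 + 1/256 = 85/256, which is slightly below 1/3, so the
// result never exceeds the exact quotient and undershoots a little.
constexpr uint16_t div3_approx(uint16_t ticks) {
    return static_cast<uint16_t>((ticks >> 2) + (ticks >> 4) +
                                 (ticks >> 6) + (ticks >> 8));
}
```

For inputs below 768 the result is at most 4 counts under the true quotient; adding more terms of the series tightens that at the cost of a few more instructions. The trade is accuracy for speed, which is often acceptable for a timestamp but worth documenting.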
Changes will be in 1.3.7 DxCore release too. Platform.txt is updated
This is in 2.0.0-dev branch and that is now available for public testing.
Problem: RAM and PROGMEM (click to expand)

ATTiny devices are extremely constrained in RAM. This makes it critical to get as much data into `PROGMEM` as possible. However, `PROGMEM` requires use of special accessor functions to get data into RAM for immediate use (else the user is forced to write tons more code for all use of the data). Moving data into `PROGMEM` requires that the data be evaluated at compilation time, so the data can be placed in ROM. Thus, the more robust the compiler's support for `constexpr`, the more data can be determined at compile time, saving precious RAM.

Why C++17

Trivial changes to enable (click to expand)

C++17 not only greatly improves `constexpr` capabilities, but also relaxes template restrictions, allowing the simpler metaprogramming that is sometimes necessary to (politely) explain to the compiler that the data really and truly is a compile-time constant.

Platform.txt change summary (click to expand)

`avr-gcc` doesn't support `-std=c17`, but it does not matter. Per [docs](https://gcc.gnu.org/onlinedocs/gcc/Standards.html#C-Language), `-std=c11` is treated identically to `-std=c17`. So, no update to `CFLAGS`.

> A version with corrections integrated was prepared in 2017 and published in 2018 as ISO/IEC 9899:2018; it is known as C17 and is supported with -std=c17 or -std=iso9899:2017; the corrections are also applied with -std=c11, and the only difference between the options is the value of __STDC_VERSION__.

`CPPFLAGS` has two changes: First, the change from `-std=gnu11` to `-std=c++17`, for the stated reasons. Second is the addition of `-fno-sized-deallocation`, since it's not a relevant feature for ATTiny and addresses the following compiler warning:

```
...\hardware\avr\1.5.2\cores\tiny\new.cpp:29:6: warning: the program should also define 'void operator delete(void*, unsigned int)' [-Wsized-deallocation]
 void operator delete(void * ptr) {
      ^~~~~~~~
```

The good news: Initial compilations for ATTiny85 appear to generate identical assembly listings and .HEX files. Please consider updating to support C++17 ... it will make seeding data in `PROGMEM` much easier. Thank you!