Closed SpenceKonde closed 3 years ago
Any thoughts on this? https://github.com/arduino/ArduinoCore-megaavr/pull/87
It is incompatible with existing libraries, see discussion in https://github.com/SpenceKonde/megaTinyCore/issues/216
Is it though? Because with this patch, I'm able to compile the MFRC522 examples that previously didn't compile in #216. And it makes sense too.
Lost track of this thread - and yeah, it is a problem. It works with the MFRC522, but it breaks with other code, where the library provides a method that takes a const __FlashStringHelper*
and one that takes a const char*
- this is an error because now there are two methods with the same signature.
I can't see a way short of copying the avrlibc pgmspace library and replacing the pgmread macros, which I think would still be less efficient :-/ I think before doing that, it would be good to get an idea of how much worse the performance/flash use is when we just stubbornly stick to the old way.... If we're chasing a tiny reward, it's not a good use of time...
Performance can be tested with just declaring a big array with and without PROGMEM, and a matching volatile buffer in ram, turn on TCB clocked off CLK_PER, copy every value in the array to the buffer, stop TCB and check the count..... that simulates the dominant use case, printing over serial. I'm curious about just how much slower it will be. Reading that assemby is beyond me, but the fact that it's not smart enough to use the lpm Z+ instruction doesn't leave me optimistic about the overall performance.
The flash overhead is only ~60 bytes (if you have any printing of __FlashStringHelpers), with no additional penalty using F() - so the flash usage does not justify additional development effort.
To clarify above waffling on whether it impacts DxCore.... This impacts DxCore for AVR32DA, AVR32DB, AVR32DD, AVR16DD with the same severity as the tinyAVR 0/1/2-series and megaAVR 0-series parts (as well as any future part where AVR_ARCH == 103). It does not impact the AVR64DA, AVR64DB, AVR64DD, (AVR_ARCH==102) AVR128DA, or AVR128DB (AVR_ARCH__ == 104) since only portions of the flash are mapped to the memory address space; if those are being utilized, they should be declared with the MAPPED_PROGMEM attribute (or on platforms other than DxCore, by defining a section starting at that address with a section_start directive passed to the linker coupled with a function attribute specifying that it is to go in the section so defined. If that is done, they're just treated like any variable and the function won't have a clue that it was actually pulling the data from flash instead of ram.
The one thing that's rather annoying here is that one could solve it totally transparently for all cases, except when library authors tried to get clever - in which case it seeems nearly insoluble.
The F() case: for (uint8_t i=0; i < 255; i++) { dest[i]=pgm_read_byte_near(&sourcePGM[i]); }
63a: fc 01 movw r30, r24
63c: e4 58 subi r30, 0x84 ; 132
63e: ff 4f sbci r31, 0xFF ; 255
640: e4 91 lpm r30, Z
642: dc 01 movw r26, r24
644: ae 5f subi r26, 0xFE ; 254
646: b7 4c sbci r27, 0xC7 ; 199
648: ec 93 st X, r30
64a: 01 96 adiw r24, 0x01 ; 1
64c: 8f 3f cpi r24, 0xFF ; 255
64e: 91 05 cpc r25, r1
650: a1 f7 brne .-24
The memory-mapped global variable case: for (uint8_t i=0; i < 255; i++) { dest[i]=sourceRO[i]; }
6c2: 21 91 ld r18, Z+
6c4: dc 01 movw r26, r24
6c6: ae 5f subi r26, 0xFE ; 254
6c8: b7 4c sbci r27, 0xC7 ; 199
6ca: 2c 93 st X, r18
6cc: 01 96 adiw r24, 0x01 ; 1
6ce: 8f 3f cpi r24, 0xFF ; 255
6d0: 91 05 cpc r25, r1
6d2: b9 f7 brne .-18 ; 0
Runtimes were: PGM flash readss were 4008 clocks + overhead do 255 bytes. reads of memory mapped flash were 3320 clocks for same.
Screw it, conssidering the difficuty of the solutions, getting charachters into the serial buffer 20% faster? not worth it .... if it was 2:1? Okay then I'd be interested, but this is a piddling benefit considering what we were rtalking abougt doing!
olnh
With 2.1.0 - F() once again turns the text it wraps into a PSTR cast to a __FlashStringHelper - It made me sad.... but we needed the library compatibility.... Now, how can we get that performance back?
One thing that comes to my mind (since research shows that libraries do exist in the wild that have their own functions that call pgm_read_byte() on the characters of a __FlashStringHelper) would be a hacked version of avr/pgmspace.h that took advantage of the memory mapped flash - could we rely on our version being used? If so - is this the golden ticket? Would have the benefit of also making the life of anyone who was using PROGMEM variables better too...