arduino / ArduinoCore-API

Hardware independent layer of the Arduino cores defining the official API
https://www.arduino.cc/reference/en/
GNU Lesser General Public License v2.1
215 stars 120 forks source link

String move() and String(String &&rval) breaks operation of reserve() #161

Open drmpf opened 2 years ago

drmpf commented 2 years ago

When #if cplusplus >= 201103L || defined(GXX_EXPERIMENTAL_CXX0X__) operator = uses move() to just update the buffer pointer of the destination This ignores any reserve() the user has made to ensure the memory is not unnecessarily fragmented. String(String &&rval) has a similar problem

move() should first check the capacity of the destination and if there is sufficient space copy the source to the destination String(String &&rval) should use move()

A suggested move() is

void String::move(String &rhs) {
    if (this != &rhs) {
           if (capacity > rhs.size) {
               copy(rhs.buffer,rhs.size);
             } else {
        free(buffer);
        buffer = rhs.buffer;
        len = rhs.len;
        capacity = rhs.capacity;
             }
        rhs.buffer = NULL;
        rhs.len = 0;
        rhs.capacity = 0;
    }
}
matthijskooijman commented 2 years ago

Interesting perspective. Normally, I would say that string moves should try to minimize copying for maximum speed, and if a move is available, just updating the pointer instead of copying data is the obvious best way.

But I can see your usecase where you reserve space for strings so you never need to realloc them and prevent fragmentation, which would ineed be broken by string moves.

However, these two things are conflicting. If your suggestion would be applied, then in cases where fragmentation is not a specific consideration (which I think is most of the time), your suggestion can significantly hurt performance, which is IMHO not what we'd want.

One compromise might be to apply the copying only when capacity > size && capcity > rhs.size, so if there is extra room available, which suggests that an explicit reserve() has happened and fragmentation might be a consideration. I'm not sure if that's enough, though.

Regarding your proposed implementation, I think there is an error in there. The rhs.buffer = NULL; and subsequent lines should, I think, be inside the else, since now this leaks memory be unsetting rhs.buffer without freeing it.

drmpf commented 2 years ago

Yes there is memory leak in the proposed code that needs to be fixed, say

void String::move(String &rhs) {
    if (this != &rhs) {
           if (capacity > rhs.size) {
               copy(rhs.buffer,rhs.size);
             } else {
        free(buffer);
        buffer = rhs.buffer;
        len = rhs.len;
        capacity = rhs.capacity;
        rhs.buffer = NULL; // prevents invalidate() trying to free rhs.buffer which is now the destination buffer
             }
           rhs.invalidate();
    }
}

I did wonder if the original code had been 'optimized' for large assignments, as you might get when dealing with webpage Strings on ESP32 etc. On the other hand the ESP32 etc have much higher clock speeds so the copies will be faster. In any case the previous code will impact the careful use of Strings on AVR boards where reserve() makes a difference.

"cases where fragmentation is not a specific consideration (which I think is most of the time)"

I don't disagree with you, but I have continual fights on Arduino forum whenever I propose Strings as the simple way to do things. The pervasive response always is "but you will get fragmentation and there is not enough memory to waste any and you will get program errors" and the forum advice is almost always "you should never be using Strings"

I have looked into using Strings extensively in Taming Arduino Strings, Avoiding Fragementation and Out-of-Memory Issues and the solution for small memory uC's , if you have memory problems, depends on reserve() working.

Your heuristic 'capacity > size && capcity > rhs.size' seems a bit hit and miss to me. Once a String's capacity grows in use it is never reduce even is the size is reduced.

I did think perhaps someone would propose that the move code could be conditional on the processor, so that 'small' memory processors (Uno/Mega etc) would copy, if capacity allowed, while those processors with larger memories (ESP8266/ESP32) would use move().

matthijskooijman commented 2 years ago

Your heuristic 'capacity > size && capcity > rhs.size' seems a bit hit and miss to me. Once a String's capacity grows in use it is never reduce even is the size is reduced.

Yeah, it's probably not good enough.

Would an alternative be to remember whether reserve() has been called and to copy into those strings? Still a bit heuristically, of course...

I did think perhaps someone would propose that the move code could be conditional on the processor, so that 'small' memory processors (Uno/Mega etc) would copy, if capacity allowed, while those processors with larger memories (ESP8266/ESP32) would use move().

I wonder if small vs large memory is really the distinguishing factor here. Rather, I guess you really need to switch between both behaviors based on the expectation of the programmer (i.e. whether they wrote their code to expect copies to prevent fragmentation or not). This could be done using a board option selected at build time (a platform option would be better, if that is implemented). Or, maybe (given that libraries might also have different expectations) you'd need more fine-grained control (e.g. mark specific strings is non-move, maybe implied like a call to reserve() as suggested above).

Another way to look at this, is to ask in which cases do you need the move behavior instead of copying bytes? I guess this is mostly cases where the destination string has no buffer allocated at all yet (i.e. move constructor, not move assignment operator). Another case could be when you completely replace a string and both are local variables, e.g. something like:

String value = "none";
if (some_condition)
    value = String("x=") + String(int_value)

In this case, your suggested move operator would copy bytes, while it would definitely be better to just move the pointer. Doing the byte copy only after reserve() was called would also catch this case correctly.

I have looked into using Strings extensively in Taming Arduino Strings, Avoiding Fragementation and Out-of-Memory Issues and the solution for small memory uC's , if you have memory problems, depends on reserve() working.

Nice article you wrote, excellent and detailed investigation. One smalle remark: You say "due to the build in __malloc_margin of 128 bytes.", but AFAICS (based on docs and source), it seems to be 32 by default, not 128?

Also nice to see your SafeString class. I have been thinking about (but never finding time for anything else) generalizing the official String class (or maybe introducing a new class) that can wrap either an external char* buffer, a PROGMEM string, or dynamically allocated buffers and offer the same API on all of them (maybe split into a read-only base class and read-write subclass), so code using strings can be written generically. But that digresses greatly from this issue, and might deserve an issue on its own...

drmpf commented 2 years ago

an alternative be to remember whether reserve() has been called and to copy into those strings?

I like that idea. Simple for the user. Most of the time Strings "just work" When you need to "fix" String fragmentation, reserve() is the tool provided.

Re __malloc_margin

void setup() {
  Serial.begin(9600);
  Serial.print("extern size_t __malloc_margin:"); Serial.println(__malloc_margin);
}
void loop() {
}

prints extern size_t __malloc_margin:128 on UNO compiled with Arduino 1.8.19 from this patch (I think) https://github.com/arduino/toolchain-avr/blob/master/avr-libc-patches/01-arduino-malloc_margin.patch