cesanta / elk

A low footprint JavaScript engine for embedded systems
Other
1.65k stars 81 forks source link

Garbage collection doesn't fixup object references to upper and deletes used entities from memory. #52

Closed maxint-rd closed 2 years ago

maxint-rd commented 2 years ago

Hello again. While further testing Elk on my ESP project and after incorporation of my previous fix, I encountered another issue with the garbage collection. It only occurs in certain circumstances and produces unpredictable behavior.

In my project it was very reproducible and failed every time I ran this minimized test script: let fn=function(x) { let i=0; while(i<x) {res=res+"!"; i++; }; }; let res="123"; fn(3); In my ESP-sketch this script results in a WDT-reset. Different variants of this script often wouldn't show the issue the same way. The proper functioning should be that global variable res is set to the value of "123!!!".

After lengthy debugging by inserting debug dumps and debug printfs in js_stmt() and in js_delete_marked_entities() I think I found the issue. Long story short it appears that js_fixup_offsets() doesn't fixup the offset to upper for objects. Having an incorrect reference in upper can result in incorrect deletions in subsequent garbage collections, giving further issues, During my debug sessions I saw the upper reference remained the same after in-between T_STR values for res were deleted by CG. It think this issue floated to the surface now because of the more frequent garbage collection.

After first trying to understand your code, by analyzing debug output and based on that adding comments, I managed to make the above script function work without generating a WDT reset. This is the modified version of js_fixup_offsets():

static void js_fixup_offsets(struct js *js, jsoff_t start, jsoff_t size) {
  // fixup all entities prior to moving data at start for deleting the start entity (which is marked for deletion) 
  for (jsoff_t n, v, off = 0; off < js->brk; off += n) {  // start from 0!
    v = loadoff(js, off);       // load 4-byte offset value of item at off. 
                                         // For objects v is only the offset of its first property. For properties it is the offset of the next item (0 if first)
    n = esize(v & ~GCMASK);
    if (v & GCMASK) continue;  // To be deleted, don't bother
    if ((v & 3) != T_OBJ && (v & 3) != T_PROP) continue;      // T_STR doesn't need further fixup

    // Fixup first/next offset for values beyond starting point 
    if (v > start) saveoff(js, off, v - size);      // save new offset (v-size) for first/next at off  

    // MMOLE 220216: proper fixup of upper offset for objects that don't fall under global scope (upper@0)
    if ((v & 3) == T_OBJ) {
      jsoff_t  u = loadoff(js, off+sizeof(jsoff_t));       // load 4-byte offset value of upper
      if(u > start)
        saveoff(js, off+sizeof(jsoff_t), u - size); // fixup upper, checked for negatives
    }

    // fixup property, key and value
    if ((v & 3) == T_PROP) {

      // fixup key offset
      jsoff_t koff = loadoff(js, off + sizeof(off));
      if (koff > start) saveoff(js, off + sizeof(off), koff - size);

      // fixup value data offset
      jsval_t val = loadval(js, off + sizeof(off) + sizeof(off));
      if (is_mem_entity(vtype(val)) && vdata(val) > start) {
        // printf("MV %u %lu -> %lu\n", off, vdata(val), vdata(val) - size);
        saveval(js, off + sizeof(off) + sizeof(off), mkval(vtype(val), vdata(val) - size));
      }
    }
  }

  // fixup callbacks
  for (jsoff_t i = 0; i < js->ncbs; i++) {
    jsoff_t base = js->size + i * 3 * sizeof(jsoff_t) + sizeof(jsoff_t);
    jsoff_t o1 = loadoff(js, base), o2 = loadoff(js, base + sizeof(o1));
    if (o1 > start) saveoff(js, base, o1 - size);
    if (o2 > start) saveoff(js, base + sizeof(jsoff_t), o2 - size);
  }
  // Fixup js->scope
  jsoff_t off = vdata(js->scope);
  if (off > start) js->scope = mkval(T_OBJ, off - size);
}

Disclaimer: Please note that I only tried to fix this issue as I encountered it. Using my code is at your own risk. No guaranties given and no liability accepted. Unfortunately I am not able to provide support....

maxint-rd commented 2 years ago

As js_gc is called at every statement, this may affect currently executed functions. When executing function code we can't simply delete items that are in memory before the currently executed code (eg. global variables), as that can mess up our token-position, resulting in the next token to be different.

This is the test script that failed: let res2="aaa"; let fn=function(x) { let i=0; while(i<x) {res2=res2+"!"; i++;}; }; fn(10); Note that this time, the variable that gets a longer value is located before the function definition. At the first iteration the STR value is reallocated, making the old allocation candidate for GC. If the current js->pos points to an affected function, it is no longer valid after the memory was moved.

To resolve this various resolutions are possible. We could take other measures to prevent moving the currently executed code, such as:

These are alternative solutions I've thought of, some even tested, but will not work:

Do you have suggestions on how this issue should be fixed best?

maxint-rd commented 2 years ago

After some considerations I appended the js_delete_marked_entities() function to move the code pointer along with the moved memory in specific conditions. Perhaps these conditions need to be made a bit more general to cover more impacted cases, but at the moment I only encountered the issue when executing function code. After making this change, the test script mentioned before works properly: let res2="aaa"; let fn=function(x) { let i=0; while(i<x) {res2=res2+"!"; i++;}; }; fn(10);

Changed code (with debug statements commented out and added code comment):

static void js_delete_marked_entities(struct js *js) {
  for (jsoff_t n, v, off = 0; off < js->brk; off += n) {
    v = loadoff(js, off);
    n = esize(v & ~GCMASK);
    if (v & GCMASK) {  // This entity is marked for deletion, remove it
//printf("DEL: %4u %d %x\n", off, v & 3, n);
//printf("DEL: %4u %s%d %d\n", off, typestr(v&3), v & 3, n);
      // assert(off + n <= js->brk);
      js_fixup_offsets(js, off, n);
      memmove(&js->mem[off], &js->mem[off + n], js->brk - off - n);   // move all data beyond deleted entity to location of entity
      js->brk -= n;  // Shrink brk boundary by the size of deleted entity

      // MMOLE 220217: JS Functions are executed from code that is in the js->mem buffer.
      // When executing function code, deleting items may move the currently executed function code.
      // This can mess up our code-position, resulting in the next token to be read incorrectly.
      // Some statements store js->pos on the stack, such as the returning point of a while loop.
      // For that reason we don't adjust js->pos, but since code was moved, we change the code pointer accordingly.
      // js->pos is relative to the start of js->code, so we dont need to adjust js->pos.
      if(off<js->scope &&  ((js->flags & F_CALL) == F_CALL) && js->code>js->mem && js->code<js->mem+js->size)
      { // We're in a function call, code is within js->mem and the deleted item was before the current scope.
        // Final test is to see if the deleted item was located before our code
        if(&js->mem[off]<js->code)
        {
//printf("DEL@FUN: %4u %s%d %d\n", off, typestr(v&3), v & 3, n);
//printf("DEL@FUN pre: [s%u] '%.*s'@%u, post GC brk: %u\n", (jsoff_t) vdata(js->scope), js->clen-js->pos, js->code+js->pos, js->pos, js->brk);
          js->code-=n;
//printf("DEL@FUN post: [s%u] '%.*s'@%u, post GC brk: %u\n", (jsoff_t) vdata(js->scope), js->clen-js->pos, js->code+js->pos, js->pos, js->brk);
        }
      }
      n = 0;         // We shifted data, next iteration stay on this offset
    }
  }
}
cpq commented 2 years ago

Could you make a PR with your change please?

maxint-rd commented 2 years ago

Hello, unfortunately I don't have a version available that can be submitted as PR. In the past months I've been experimenting with ELK for the purpose of my ESP8266 project and my current version no longer adheres to your design principles of being small and portable (and naked code isn't really my thing, so it has a lot of comments and old code left for cleanup). Just to give you an idea: amongst others I've been working on alternative memory usage by means of supporting PROGMEM for imported functions and keys. It gave me plenty resets along the way... (Debugging on ESP8266 isn't ideal. GDB crashed too often and every printf requires a new compilation).

However as my version becomes more useful for my personal project, I feel I should give back to the community and support your work. Once I can find some free time I will try to make another version based on your current release, incorporate this suggested fix (and perhaps some other associated fixes) and submit that version as a PR along with some unit-tests. Unfortunately I'm quite busy at the moment, so it might take a while before I do so.

cpq commented 2 years ago

@maxint-rd sounds good, thank you. Could you elaborate a little on your project please?

maxint-rd commented 2 years ago

Sure... I've been looking for a light weight scripting engine to use in my collection of ESP8266 boards. I've made a few small (hobby) projects such as a clock and a robot cart that use the ESP. You can find a video of them on my Youtube channel.

I'm looking for a common setup to share among various projects, in which the ESP8266 features a FileManager and website, an MQTT connection and some project specific control and behavior. For ease of programming I thought Javascript could be useful, sharing some compatibility between the embedded website and the JS driven hardware. While keeping some features hardcoded in firmware, some others can then be changed on the whim by JS, possibly even distributing changed code via MQTT.

I have looked at some alternatives such as MicroPython, Basic and WebAssembly but for now prefer Javascript if I can get it working according my wishes. Of course using a larger MCU (such as ESP32) would have been easier, but that will not put my current MCU collection to work.

While testing ELK I encountered the issues mentioned earlier and some more, as I tried to get more features working. While available memory is larger on the ESP than on most Atmels, stable networking requires a lot of free RAM. This limits the size of the JS memory area and also the size of available stack. Using PROGMEM allows for more features in less memory. (Even though any reasonable script still easily fills it).

Hope this gives you a bit more context to my experiments. As you might have read, it's more about the challenge of limitations than about practicality and real world use.

pavvmagic commented 2 years ago

With your fixes to the js_fixup_offsets and js_delete_marked_entities functions, my stress test worked: `

assert(ev(jsp,
  "let tmr1cb;os.tmr_init(1,function(){tmr1cb();},null);"
  "let tmr2cb;os.tmr_init(2,function(){tmr2cb();},null);"
  "let tmr3cb;os.tmr_init(3,function(){tmr3cb();},null);"
  "let tmr4cb;os.tmr_init(4,function(){tmr4cb();},null);"
  "let setTimeout1=function(fn,ms){tmr1cb=fn;os.tmr_start(1,ms,false);};"
  "let setTimeout2=function(fn,ms){tmr2cb=fn;os.tmr_start(2,ms,false);};"
  "let setTimeout3=function(fn,ms){tmr3cb=fn;os.tmr_start(3,ms,false);};"
  "let setTimeout4=function(fn,ms){tmr4cb=fn;os.tmr_start(4,ms,false);};"
  "let setInterval1=function(fn,ms){tmr1cb=fn;os.tmr_start(1,ms,true);};"
  "let setInterval2=function(fn,ms){tmr2cb=fn;os.tmr_start(2,ms,true);};"
  "let setInterval3=function(fn,ms){tmr3cb=fn;os.tmr_start(3,ms,true);};"
  "let setInterval4=function(fn,ms){tmr4cb=fn;os.tmr_start(4,ms,true);};"
  " setTimeout1(function(){ "
  "  os.dbgs('#1'); "
  "  setInterval1(function(){ "
  "    os.dbgs('&1'); "
  "  },100) "
  " },1000); "
  " setTimeout2(function(){ "
  "  os.dbgs('#2'); "
  "  setInterval2(function(){ "
  "    os.dbgs('&2'); "
  "  },150) "
  " },2000); "
  " setTimeout3(function(){ "
  "  os.dbgs('#3'); "
  "  setInterval3(function(){ "
  "    os.dbgs('&3'); "
  "  },125) "
  " },3000); "
  " setTimeout4(function(){ "
  "  os.dbgs('#4'); "
  "  setInterval4(function(){ "
  "    os.dbgs('&4'); "
  "  },115) "
  " },4000); "
  " 0", "0"));`
cpq commented 2 years ago

@maxint-rd your change is integrated, thank you!