espruino / Espruino

The Espruino JavaScript interpreter - Official Repo
http://www.espruino.com/
Other
2.74k stars 741 forks source link

jswrapper.c: wasting ~2 bytes per function decl, could save ~1.5kb #2405

Closed gfwilliams closed 4 months ago

gfwilliams commented 11 months ago

Currently we define each symbol as:

typedef struct {
  unsigned short strOffset;
  unsigned short functionSpec; // JsnArgumentType
  void (*functionPtr)(void);
} PACKED_JSW_SYM JswSymPtr; // 8 bytes

but on embedded platforms the function pointer is max 20 bits, and we're using 32. So we could do:

typedef struct {
  unsigned short functionSpec; // JsnArgumentType
  unsigned int functionPtr : 20; // up to 1MB flash
  unsigned int strOffset : 12; // size of longest JswSymList.sumbolChars string
} PACKED_JSW_SYM JswSymPtr; // 6 bytes

Same for jswSymbolTables and other places we reference pointers

The problem is, when defining functionPtr we cast to an unsigned int and get an error:

gen/jswrapper.c:1310:34: error: initializer element is not computable at load time
   {JSWAT_INT32 | JSWAT_THIS_ARG, (unsigned int)jswrap_graphics_getBPP, 236},

When what we do now with:

   {236, JSWAT_INT32 | JSWAT_THIS_ARG,(void (*)(void))jswrap_graphics_getBPP},

Is totally fine. This appears to be GCC enforcing the spec when it's not 100% required: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38354

So potentially we could swap jswrapper.c to jswrapper.cpp and compile it with arm-none-eabi-g++ instead of arm-none-eabi-gcc and apart from fixing a few other minor things that appear to be different in C++ vs C99 I think we should be good.

Any thoughts? Better C-only solutions? I guess for 1.5kb this might not be worth it, but it annoys be looking at binary dumps and seeing unused zeros everywhere!

gfwilliams commented 5 months ago

Just to add we're now using std=gnu11 and this is still an issue.

Maybe worth checking after https://github.com/espruino/Espruino/issues/2455 to see if that will handle it

Otherwise, it's very hacky, but as mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38354 if we actually compiled everything as:

typedef struct {
  unsigned short functionSpec; // JsnArgumentType
  void (*functionPtr)(void); // top bits contain strOffset;
} PACKED_JSW_SYM JswSymPtr; // 8 bytes

then after everything has linked we could get the address in the binary of the array, and manually patch it up to use the top bits of functionPtr for strOffset

... but I think that's probably a step too far - one of those things that will cause too much trouble in the future!

fanoush commented 5 months ago

then after everything has linked we could get the address in the binary of the array, and manually patch it up to use the top bits of functionPtr for strOffset

the gcc bug links this https://www.polyomino.org.uk/computer/c/const-exprs-gnu.txt

(b) Addresses of functions plus or minus constants are accepted as
address constants.

so this actually compiles and the data is there

#include <stdio.h>

#define PACKED_JSW_SYM

typedef struct {
  unsigned short functionSpec; // JsnArgumentType
union {
  void (*functionPtr)(void);
  unsigned int strOffset; // size of longest JswSymList.sumbolChars string
};
} PACKED_JSW_SYM JswSymPtr; // 6 bytes

void f(){
return ;
}
JswSymPtr p = { 0, f + (1<<30) };

int main(){
printf("%x %x %x\n", p.functionPtr, p.strOffset,&f);
}
fanoush commented 5 months ago

But of course when reading both it should be masked away. Or also this works but not sure it is worth it(?)

#include <stdio.h>

#define PACKED_JSW_SYM

typedef struct {
  unsigned short functionSpec; // JsnArgumentType
  union {
    void (*functionPtr)(void);
    struct {
      unsigned int functionInt : 20; // up to 1MB flash
      unsigned int strOffset : 12; // size of longest JswSymList.sumbolChars string
    };
  };
} PACKED_JSW_SYM JswSymPtr; // 6 bytes

void f(){
return ;
}
JswSymPtr p = { 0, f + (1<<20) };

int main(){
printf("%x %x %x %x\n", p.functionPtr, p.functionInt, p.strOffset,&f);
}

on intel when compiled I get b8d23135 23135 b8d b8c23135 so it looks ok except the function address does not fit into 20 bits on intel.

fanoush commented 5 months ago

actually it is not so bad and when renamed it is maybe what you wanted, the real pointer type would be unused except initialization

  union {
    void (*ptrWithOffset)(void);
    struct {
      unsigned int functionPtr : 20; // up to 1MB flash
      unsigned int strOffset : 12; // size of longest JswSymList.sumbolChars string
    };
  };
gfwilliams commented 5 months ago

so this actually compiles and the data is there

Oh wow. That's amazing - what a spot! Thank you!

I actually started fiddling before I saw your other posts, but this appears to build and run on nRF52:

diff --git a/scripts/build_jswrapper.py b/scripts/build_jswrapper.py
index 2698fe727..9c4d7877c 100755
--- a/scripts/build_jswrapper.py
+++ b/scripts/build_jswrapper.py
@@ -212,11 +212,14 @@ def codeOutSymbolTable(builtin):
     if builtin["name"]=="global" and symName in libraries:
       continue # don't include libraries on global namespace
     if "generate" in sym:
-      listSymbols.append("{"+", ".join([str(strLen), getArgumentSpecifier(sym), "(void (*)(void))"+sym["generate"]])+"}")
+      listSymbols.append("{"+", ".join([getArgumentSpecifier(sym), "{(void (*)(void))((size_t)"+sym["generate"]+" + ("+str(strLen)+"<<JSWSYMPTR_SHIFT))}" ])+"}")
       listChars = listChars + symName + "\\0";
       strLen = strLen + len(symName) + 1
     else:
       print (codeName + "." + symName+" not included in Symbol Table because no 'generate'")
+  if strLen > 4096*1024:
+    print("ERROR: strOffset is too long for packing into the function pointer")
+    exit(1)
   builtin["symbolTableChars"] = "\""+listChars+"\"";
   builtin["symbolTableCount"] = str(len(listSymbols));
   codeOut("static const JswSymPtr jswSymbols_"+codeName+"[] FLASH_SECT = {\n  "+",\n  ".join(listSymbols)+"\n};");
@@ -412,12 +415,12 @@ JsVar *jswBinarySearch(const JswSymList *symbolsPtr, JsVar *parent, const char *
   while (searchMin <= searchMax) {
     int idx = (searchMin+searchMax) >> 1;
     const JswSymPtr *sym = &symbolsPtr->symbols[idx];
-    int cmp = FLASH_STRCMP(name, &symbolsPtr->symbolChars[sym->strOffset]);
+    int cmp = FLASH_STRCMP(name, &symbolsPtr->symbolChars[sym->strOffset >> JSWSYMPTR_SHIFT]);
     if (cmp==0) {
       unsigned short functionSpec = sym->functionSpec;
       if ((functionSpec & JSWAT_EXECUTE_IMMEDIATELY_MASK) == JSWAT_EXECUTE_IMMEDIATELY)
-        return jsnCallFunction(sym->functionPtr, functionSpec, parent, 0, 0);
-      return jsvNewNativeFunction(sym->functionPtr, functionSpec);
+        return jsnCallFunction((void (*)(void))((size_t)sym->functionPtr & JSWSYMPTR_MASK), functionSpec, parent, 0, 0);
+      return jsvNewNativeFunction((void (*)(void))((size_t)sym->functionPtr & JSWSYMPTR_MASK), functionSpec);
     } else {
       if (cmp<0) {
         // searchMin is the same
diff --git a/src/jswrap_object.c b/src/jswrap_object.c
index 582da0897..511d2af7c 100644
--- a/src/jswrap_object.c
+++ b/src/jswrap_object.c
@@ -183,7 +183,7 @@ static void _jswrap_object_keys_or_property_names_iterator(
   unsigned int i;
   unsigned char symbolCount = READ_FLASH_UINT8(&symbols->symbolCount);
   for (i=0;i<symbolCount;i++) {
-    unsigned short strOffset = READ_FLASH_UINT16(&symbols->symbols[i].strOffset);
+    unsigned short strOffset = symbols->symbols[i].strOffset >> JSWSYMPTR_SHIFT;
 #ifndef USE_FLASH_MEMORY
     JsVar *name = jsvNewFromString(&symbols->symbolChars[strOffset]);
 #else
diff --git a/src/jswrapper.h b/src/jswrapper.h
index c60a55069..c7ca4ba04 100644
--- a/src/jswrapper.h
+++ b/src/jswrapper.h
@@ -75,11 +75,16 @@ typedef enum {

 /// Structure for each symbol in the list of built-in symbols
 typedef struct {
-  unsigned short strOffset;
   unsigned short functionSpec; // JsnArgumentType
-  void (*functionPtr)(void);
+  union {
+    void (*functionPtr)(void); // top 12 bits are used for strOffset - must mask with JSWSYMPTR_MASK
+    uint32_t strOffset; // stored in top 12 bits, shift right with JSWSYMPTR_SHIFT
+  };
 } PACKED_JSW_SYM JswSymPtr;
 // This should be a multiple of 2 in length or jswBinarySearch will need READ_FLASH_UINT16
+#define JSWSYMPTR_MASK 0xFFFFF // bottom 20 bits for function pointer
+#define JSWSYMPTR_SHIFT 20 // top 12 bits used for String offset
+

 /// Information for each list of built-in symbols
 typedef struct {

I guess maybe we would need build_jswrapper to revert to the old style for the linux builds though...

fanoush commented 5 months ago

Nice :-) Maybe bitfields just hide too much so the simple shifting/masking is more clear. OTOH with bitfields the compiler will do it for you so you cannot forget the masking/shifting somewhere, your choice really.

gfwilliams commented 5 months ago

Actually in functionSpec there's quite a lot of duplication too. I just checked and we have only 123 different functionSpec for Bangle.js 2 so we could store them in an array that we then reference with a single byte (not Uint16) - that would save maybe another 400b (but maybe that's for later!)

Yes, the bitfields are a little nicer - and we do use them in JsVar so it's something we do already... as you say that's probably best.

I'm away next week so I'd rather not put this in just before I leave in case it breaks something I didn't spot, but this'll be a great addition for later. Thanks for figuring this one out - it's nice to remove a few zeros from the binary!

fanoush commented 4 months ago

I'm away next week so I'd rather not put this in just before I leave in case it breaks something

maybe you just forgot about this? no hurry, just that it is seem like easy one?

MaBecker commented 4 months ago

Nice finding

gfwilliams commented 4 months ago

Yes, sorry - I did forget :) I'll try and look at it this week - I'll still have to make sure we can disable it for linux builds though.

I think it likely won't work on ESP8266/ESP32 as-is either as it looks like they may use the high bits of the address

gfwilliams commented 4 months ago

Sorted! Managed to do it with some defines which I think is tidier. Right now you'll have to enable it per-device (it's on Bangle.js, Micro:bit 1 and Original boards where flash is at a premium) because I figure it'll be marginally faster without it if we have the memory available