espruino / Espruino

The Espruino JavaScript interpreter - Official Repo
http://www.espruino.com/
Other
2.76k stars 741 forks source link

Constant strings (and other data) moved to flash memory #697

Closed nkolban closed 2 years ago

nkolban commented 8 years ago

With the port of Espruino to the ESP8266, an interesting characteristic has been uncovered. It appears that on some architectures, constant strings (which are strings that are defined as constants ... usually between double quotes) are placed in precious data RAM storage as opposed to a potential alternative which is that they be held in flash memory. There have been a number of discussions in this topic area in general. This issue is being raised to examine what opportunities might exist within the Espruino code base to take advantage of this.

Here are some links to potentially related materials:

From a notional perspective, it would be ideal if the Espruino code could be compiled or linked in some fashion that would "just" place the constant data in flash. However, that may not be possible. If it weren't possible, the next question would be "If a preparation could be performed on String constants in the code base, would this issue warrant the rework of the code base to achieve that?".

For example, if instead of coding:

if (strcmp("Hello", myData) == 0)

we might have to code:

if (strcmp(F("Hello"), myData) == 0)

would such an undertaking to work through as much of the source code as we could be worth it? Obviously a default macro would result in zero change to the data ... so the cost would be in the dimensions of "time" and "benefit". It is suspected that one will trade app execution speed for reduction in RAM usage.

gfwilliams commented 8 years ago

There were some good ideas here: https://github.com/SuperHouse/esp-open-rtos/issues/11

I can across a semi-similar thing when looking at porting to Arduino, and in the end I gave up. IMO it's just too much effort to wrap every function call this way (not to mention it's a massive source of potential bugs). It's kind of frustrating the toolchain can't deal with it in a better way.

Since Espruino has its own implementation of the string functions (on properly embedded systems quite a few of the built-in functions pull in stuff that really bloats the binary) perhaps you could just reimplement those functions so that they do aligned 32 bit reads, then all strings could go into Flash and you'd be fine?

gfwilliams commented 8 years ago

... just to add that as far as I can gather, since the ESP8266 isn't harvard architecture you don't have to worry about code meant for strings being used for something else because it will 'just work'. It actually makes things a lot easier.

As that post you found says, you could just make every read an aligned 32 bit read and everything would be fine, if slow and a bit bloated.

For now, there's some 'low hanging fruit':

Those could be put in flash pretty easily and aligned reads could be performed on them, since the only stuff that reads them is in those files.

nkolban commented 8 years ago

Are we sure that the ESP8266 isn't a Harvard Architecture? I believe that the processor core in an ESP8266 is an Xtensa-106 ... and this business wire about its announcement says:


Using a traditional Harvard architecture, it features separate local, tightly coupled, instruction and data RAMs to eliminate memory contention and provide fast performance on performance-critical code and interrupt handling routines. RAM size is user selectable up to 128K bytes.


gfwilliams commented 8 years ago

Interesting... I only said that because that's what @tve had said in Gitter IIRC?

Some processors (like ARM) effectively have separate memories (Flash + RAM) on (I think) different busses, but have a kind of 'bus matrix' that allows them both to be accessed in the same address space. I guess the Xtensa may do something similar.

It's actually pretty impressive that Espruino works as well as it does since it accesses flash memory so much for the symbol tables - I guess the extra RAM really helps.

gfwilliams commented 8 years ago

@tve already did a whole bunch here: https://github.com/espruino/Espruino/commit/294f3ffc6f2d18aae31934ae1cb96df2ef2d4a32

IMO jswrapper.c (the majority of which isn't actually strings but is still const data) could maybe be moved over too.

tve commented 8 years ago

At this point I am labeling this as an optimization because the amount of free RAM is quite fine and not a top-priority. The open work items I believe are:

gfwilliams commented 8 years ago

I'd been vaguely wondering about moving all the error messages/warnings to a separate file: https://github.com/espruino/Espruino/issues/50

It'd help with localisation, might help to reduce duplication (the compiler may merge the same things, but some differ only by a character or two), and as these things don't have to be accessed quickly, we could potentially store them compressed, decompress the whole thing on demand and just pull out the relevant bits.

Not an issue on ESP8266, but if you've got 128kB of flash and 20kB of it is strings, it could help a lot.

wilberforce commented 8 years ago

@tve I'm looking to free up more heap space and want to ensure I'm on the right track here:

https://github.com/espruino/Espruino/compare/master...wilberforce:master

targets/esp8266/esp8266_board_utils.c

These strings are using in macros that use os_printf

Am I correct in assuming that anything that uses os_printf is ok to from FLASH_STR, so it does not need to be copied to ram first?

If this is the case, I would like to work with the following list and see if I can increase the available heap space.

The change above yields 144 bytes, it does not seem like much but I can do a few modules it will add up...

Next target was the string in ota.c.

Here is the output from topstrings:

765 ./src/jsinteractive.o 542 ./libs/network/socketserver.o 519 ./gen/jswrapper.o 438 ./libs/network/esp8266/jswrap_esp8266_network.o 387 ./libs/network/esp8266/ota.o 295 ./src/jslex.o 245 ./src/jsvar.o 225 ./src/jsparse.o 216 ./libs/network/telnet/jswrap_telnet.o 214 ./src/jswrap_process.o 207 ./targets/esp8266/esp8266_board_utils.o 207 ./src/jswrap_json.o 196 ./src/jswrap_io.o 196 ./libs/network/socketerrors.o 176 ./libs/network/esp8266/network_esp8266.o

(in the makefile I had to change .rodata.str1.4 to .rodata.str1.1) - should I push this as an update - I'm not sure of the cause of this

tve commented 8 years ago

This ought to crash 'cause os_printf doesn't copy every arg to ram, only the first one.

gfwilliams commented 8 years ago

Make sure you compile with RELEASE=1, or you'll be wasting effort moving stuff that won't even be in releases.

I'd really recommend using the exception handling code that got linked somewhere else recently. It'd make life a million times easier, especially for the os_printf case where speed is not an issue.

It's also avoid completely screwing up the Espruino codebase for the sake of ESP8266 - I won't accept PRs that make code unreadable for the sake of one platform :(

By the way: I'm still not convinced about topstrings' accuracy - jswrapper.c contrains LOADS of const data that is presumably put into RAM - I'm pretty sure it's many kilobytes rather than just 542B. I'm not really sure why it's not getting reported.

tve commented 8 years ago

Thanks for pointing out jswrapper.o. Taking a look at it, some strings end up in .rodata.str1.1 and some in .rodata. I believe this has to do with string constant merging. So far I don't understand why strings end up in one or the other section, for example const char[] and const char* seem to behave differently...

Overall I see four options for reducing the size:

I have to admit I like a combination of 1 and 3 the best, although 1 and 4 are also appealing. I have a hard time believing that 2 can work reasonably (hey, but someone can prove me wrong :-).

gfwilliams commented 8 years ago

I think #1 is a really good start (maybe with #2 as a fallback for non-critical places) - Object.getOwnProperties uses it, but that could be handled via the exception handler without a huge hit. Given jswrapper is auto-generated anyway, doing it in a non-messy (or not any more messy than it is already!) way should be pretty easy.

3 is more difficult - E.setBootCode will cause everything to be run from flash already (and the Web IDE will use it with an option), but it doesn't help you much. The worst offenders (unless you have huge code) are the variable names themselves, which would be very difficult to move into flash (at that stage, #4 looks more appealing to me at least)

tve commented 8 years ago

Looks like moving the string constants in jswSymbolTables to flash can save 2820 bytes of RAM :-). It's a pretty simple change in build_jswrapper.py and jswrap_object.c:

@@ -313,10 +313,13 @@ for b in builtins:
 codeOut('');
 codeOut('');

+for b in builtins:
+  builtin = builtins[b]
+  codeOut("FLASH_STR(jswSymbols_"+builtin["name"]+"_str, " + builtin["symbolTableChars"] +");");
 codeOut('const JswSymList jswSymbolTables[] = {');
 for b in builtins:
   builtin = builtins[b]
-  codeOut("  {"+", ".join(["jswSymbols_"+builtin["name"], builtin["symbolTableCount"], builtin["sym
+  codeOut("  {"+", ".join(["jswSymbols_"+builtin["name"], builtin["symbolTableCount"], "jswSymbols_
 codeOut('};');

 codeOut('');

This brings rodata from 17312 to 14492 bytes, so that's how much room is left for improvement :-)

tve commented 8 years ago

Ah, I hadn't seen e.setBootCode, I'll have to give that a try. The 2800 bytes potentially saved by jswSymbols are also very attractive...

tve commented 8 years ago

Mhh, another approach for jswrapper.c could be to change JswSymPtr and JswSymList to only have word-size & aligned fields and then move all the static data in jswrapper.c into flash. That could easily yield 5KB of total savings, if not more... It might also reduce the changeset.

gfwilliams commented 8 years ago

Ah, I hadn't seen e.setBootCode

It's pretty recent (this week I think?). There is some room for re-using substrings in functions but that's a bit dodgy if you can't ensure that the memory area will stay around...

That could easily yield 5KB of total savings, if not more...

Sounds good :) Most of the changes should be inside jswrapper.c... Since most stuff that accesses the tables is in jswrapper, you could almost skip the alignment too.

wilberforce commented 8 years ago

Looks like moving the string constants in jswSymbolTables to flash can save 2820 bytes of RAM :-). It's a pretty simple change in build_jswrapper.py and jswrap_object.c:

Won't the code that accesses the strings also need to be modified?

These could also go into flash:

static const JswSymPtr jswSymbols_Object[] = {
  {0, (void (*)(void))jswrap_object_create, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1))},
  {7, (void (*)(void))jswrap_object_defineProperties, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1)) | (JSWAT_JSVAR << (JSWAT_BITS*2))},
  {24, (void (*)(void))jswrap_object_defineProperty, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1)) | (JSWAT_JSVAR << (JSWAT_BITS*2)) | (JSWAT_JSVAR << (JSWAT_BITS*3))},
  {39, (void (*)(void))jswrap_object_getOwnPropertyDescriptor, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1)) | (JSWAT_JSVAR << (JSWAT_BITS*2))},
  {64, (void (*)(void))gen_jswrap_Object_getOwnPropertyNames, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1))},
  {84, (void (*)(void))gen_jswrap_Object_keys, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1))}
};

Are the alignments of the JswSymPtr structure all be on 4 byte boundaries and are the elements 4 bytes long?

wilberforce commented 8 years ago

This kind of function call is used although the code base:

  jsvObjectSetChildAndUnLock(jsDetails, "authMode", jsvNewFromString(wifiAuth[config.authmode]));
  jsvObjectSetChildAndUnLock(jsDetails, "hidden", jsvNewFromBool(config.ssid_hidden));
  jsvObjectSetChildAndUnLock(jsDetails, "maxConn", jsvNewFromInteger(config.max_connection));

I've made a wrapper for jsvObjectSetChildAndUnLock that declares the string literal in flash memory, and then copies to a static buffer before the call to the real jsvObjectSetChildAndUnLock.

It's also avoid completely screwing up the Espruino codebase for the sake of ESP8266 - I won't accept PRs that make code unreadable for the sake of one platform :(

@gfwilliams it doesn't corrupt the code base too much... you might be able to suggest something that will clean this up further:

#define READ_FLASH_UINT8(ptr) ({ uint32_t __p = (uint32_t)(char*)(ptr); volatile uint32_t __d = *(uint32_t*)(__p & (uint32_t)~3); ((uint8_t*)&__d)[__p & 3]; })

#define __concat(o,l) __##o##l
#define jsvObjectSetChildAndUnLock(obj,name,child) jsvObjectSetChildAndUnLock_FLASH_CTR(obj,__COUNTER__,name,child)
#define jsvObjectSetChildAndUnLock_FLASH_CTR(obj,lbl,name,child) FLASH_STR(__concat(obj,lbl),name);\
jsvObjectSetChildAndUnLock_flash(obj,__concat(obj,lbl), child )

#define NOT_FLASH_LITERAL /* no flash literal */

#else

/** Read a uint8_t from this pointer, which could be in RAM or Flash.
    On ARM this is just a standard read, it's different on ESP8266 */
#define READ_FLASH_UINT8(ptr) (*(uint8_t*)(ptr))

#define NOT_FLASH_LITERAL

#endif

This is using the __COUNTER___ macro, so for the code example: jsvObjectSetChildAndUnLock(jsDetails, "authMode", jsvNewFromString(wifiAuth[config.authmode]));

becomes:

FLASH_STR( __jsDetails0,"authMode" )
jsvObjectSetChildAndUnLock_flash(jsDetails, __jsDetails0, jsvNewFromString(wifiAuth[config.authmode]));

and the next line would __jsDetails1 etc...

This take the headache of having to declare all the strings before hand.

Where this doesn't work is when it's part of an if statement, or the 2nd arg is not a string literal - so I had to introduce a macro to stop substitution from occurring:

jsvObjectSetChildAndUnLock NOT_FLASH_LITERAL(funcs, buf, func);

if (arrayBuffer2) jsvObjectSetChildAndUnLock NOT_FLASH_LITERAL(waveform, "buffer2", arrayBuffer2);

This is the part that you might not like Gordon!

Anyway, I've applied through the code base: https://github.com/espruino/Espruino/compare/master...wilberforce:master

The saving is (without graphics with crypto):

Before: "freeHeap": 4576,

After: "freeHeap": 5184

So 608 bytes of heap freed up.

wilberforce commented 8 years ago

Looks like moving the string constants in jswSymbolTables to flash can save 2820 bytes of RAM :-). It's a pretty simple change in build_jswrapper.py and jswrap_object.c:

@tve this looks very promising....

What changes need to be made in jswrap_object.c to access the strings in flash?

Sorry - I don't know the code well enough to know where to start looking ;-(

wilberforce commented 8 years ago

@tve

Mhh, another approach for jswrapper.c could be to change JswSymPtr and JswSymList to only have word-size & aligned fields and then move all the static data in jswrapper.c into flash. That could easily yield 5KB of total savings, if not more... It might also reduce the changeset.

https://github.com/espruino/Espruino/blob/2d30d810baba1a00819a69e04df311f9e89b082f/src/jswrapper.h#L66

/// Structure for each symbol in the list of built-in symbols
typedef struct {
  unsigned short strOffset;
  void (*functionPtr)(void);
  unsigned short functionSpec; // JsnArgumentType
} PACKED_FLAGS JswSymPtr;

/// Information for each list of built-in symbols
typedef struct {
  const JswSymPtr *symbols;
  unsigned char symbolCount;
  const char *symbolChars;
} PACKED_FLAGS JswSymList;

https://github.com/espruino/Espruino/blob/db470b6bff59df1c4af17302713a9c873be421a3/src/jsutils.h#L248

#define PACKED_FLAGS  __attribute__ ((__packed__))
#if defined(ESP8266)
// Don't pack
#define PACKED_FLAGS
#else
/// Used when we have enums we want to squash down
#define PACKED_FLAGS  __attribute__ ((__packed__))
#endif

Will this be sufficient? Will the unsigned short strOffset;by padded to 4 bytes if the declaration uses attribute ((__aligned(4)))?

Will the unsigned short strOffset;by padded to 4 bytes?

I tried above (Without the aligned(4), however the elf is now too big:

LD espruino_esp8266_partial.o
LD espruino_esp8266_user2.elf
LD espruino_esp8266_user1.elf
/home/esp8266/esp-open-sdk/xtensa-lx106-elf/lib/gcc/xtensa-lx106-elf/4.8.2/../../../../xtensa-lx106-elf/bin/ld: address 0x3fffd0a8 of espruino_esp8266_user1.elf section `.bss' is not within region `dram0_0_seg'
/home/esp8266/esp-open-sdk/xtensa-lx106-elf/lib/gcc/xtensa-lx106-elf/4.8.2/../../../../xtensa-lx106-elf/bin/ld: address 0x3fffd0a8 of espruino_esp8266_user1.elf section `.bss' is not within region `dram0_0_seg'
wilberforce commented 8 years ago

taking out the PACKED_FLAGS, the structures above, and changing in jswrapper:

static const JswSymPtr jswSymbols_Wifi[] = { to static JswSymPtr jswSymbols_Wifi[] __attribute__((section(".irom"))) __attribute__((aligned(4))) = {

>require("Wifi").scan(function(x){console.log(x)});
Uncaught Error: Function "scan" not found!
 at line 1 col 17
require("Wifi").scan(function(x){console.log(x)});

So still no closer ;-(

gfwilliams commented 8 years ago

https://github.com/espruino/Espruino/compare/master...wilberforce:master

Yeah, it's not ideal - and honestly I doubt most people adding code in the future will fully appreciate why they need NOT_FLASH_LITERAL.

In fact, I don't understand why it's in places like this? https://github.com/espruino/Espruino/compare/master...wilberforce:master#diff-1e3672c0223d1668a1afdbb00fffe1f2R137

I really think it'd be worth trying to add the exception handler, and then changing the linker to just dump all strings into flash. My guess is performance wouldn't be hit anywhere near as badly as you think, and the few places where it was hit could be fixed, rather than having to change all code (and even then, we're still missing loads of strings that aren't used via jsvObjectSetChildAndUnLock).

Will #define PACKED_FLAGS this be sufficient?

That'd be a really bad idea :) PACKED_FLAGS is used all over the place and would break LOADS of stuff if it was changed. You'd have to tweak just jswrapper.c (as you did)

I'm not sure why changing it like you did above wouldn't work though - but the fact it doesn't reset is a good sign :) My guess is that unsigned char symbolCount/etc will still force a non-word read, even if it is aligned. try changing that (and others) to uint32_t.

wilberforce commented 8 years ago

It is places like this:

 if (addrStart>0)
-    jsvObjectSetChildAndUnLock(obj, "protocol", jsvNewFromStringVar(url, 0, (size_t)addrStart-1));
+    jsvObjectSetChildAndUnLock NOT_FLASH_LITERAL(obj, "protocol", jsvNewFromStringVar(url, 0, (size_t)addrStart-1));

Because the first part of the macro expands to FLASH_STR

Which declares the literal, and the second part is the function call.

The declaration does not work because the if statement is preceding. If the statement was rewritten to use a block, I guess it would work:

 if (addrStart>0) {
    jsvObjectSetChildAndUnLock(obj, "protocol", jsvNewFromStringVar(url, 0, (size_t)addrStart-1));
}

.. So if I put the { } in the macro expansion, then it would be transparent.

gfwilliams commented 8 years ago

Oh wow, that's super nasty :) You can always do stuff like do { ... } while (0) to avoid that I guess.

But to me this just seems like madness - the code's being made a lot less readable just to save 600 bytes - and over time, jsvObjectGetChild gets added, jsvObjectSetChild does, and more and more, until it's just this insane mess.

wilberforce commented 8 years ago

Here is the reference to the exception handler https://github.com/espruino/Espruino/issues/807#issuecomment-202147139

I'm not sure how to add this into the code - is there a vector in the link script to add this function too?

From what I have read it can have performance hit of approx 6x.

wilberforce commented 8 years ago
  • the code's being made a lot less readable just to save 600 bytes

The trouble was that until I tried this, I didn't know what the saving would be!

I think the jswrapper strings and other constants are an easier target!

gfwilliams commented 8 years ago

I think you literally just add the code, and make sure flash_emul_init gets added.

However: while we could give this a try, it's GPL licensed software and it can't be included in Espruino unless we get their permission to change the license (or we wrote our own). There do seem to be other implementations though, if you google for EXCCAUSE_LOAD_STORE_ERROR

6x performance hit is probably fine, because hopefully it won't get hit that often. If jswBinarySearch and maybe strcmp were changed, that's probably the majority of uses - the rest of the hits won't be that often.

As you say, if we can get a 5kB saving just by changing jswrapper, that'd solve a lot.

However if we could fix this with the exception handler, we could actually take out a lot of the ESP8266-specific hacks that's really tidy things up :)

wilberforce commented 8 years ago

3 is more difficult - E.setBootCode will cause everything to be run from flash already (and the Web IDE will use it with an option), but it doesn't help you much.

https://github.com/espruino/Espruino/issues/740

I found the reference above - still trying to get my head around this?

So by using E.setBootCode(E.MemoryArea(X)) the code is not copied to Ram, so leaving Ram free?

gfwilliams commented 8 years ago

You don't need the memoryArea in E.setBootCode(E.MemoryArea(X)) - E.setBootCode saves whatever string you give it into flash, and then executes it straight from flash. But that doesn't really save you much. For example say I did it with:

var a = "<html> <a href=\"...\">...</a> ... lots more stuff ... </html>";

function b(c) { ... }

so:

E.setBootCode("var a = \"<htm.....

The string a is still stored in RAM, because even though it was executed from flash, what is in flash isn't the string data (for instance the " character had to be escaped). b is also stored in RAM, and while its code could be left stored in flash, it's not currently because it doesn't save a massive amount of space for small functions, and if you fiddled with flash while your uploaded code was running, everything would stop working.

wilberforce commented 8 years ago

Ok. I understand your example. So what is the difference between save() and E.setBootCode()?

Edit: http://forum.espruino.com/conversations/275207/. Explains it.

Start up code that runs before and independently than save();

gfwilliams commented 8 years ago

Yes, save() is more like 'hibernate' - it saves the current state of the interpreter into flash.

wilberforce commented 8 years ago

Well the good news is I can get a running interpreter with:

#if defined(ESP8266)
/// Structure for each symbol in the list of built-in symbols
typedef struct {
  uint32_t strOffset;
  void (*functionPtr)(void);
  uint32_t functionSpec; // JsnArgumentType
} PACKED_FLAGS JswSymPtr;

/// Information for each list of built-in symbols
typedef struct {
  const JswSymPtr *symbols;
  uint32_t symbolCount;
  const char *symbolChars;
} PACKED_FLAGS JswSymList;
#else
/// Structure for each symbol in the list of built-in symbols
typedef struct {
  unsigned short strOffset;
  void (*functionPtr)(void);
  unsigned short functionSpec; // JsnArgumentType
} PACKED_FLAGS JswSymPtr;

/// Information for each list of built-in symbols
typedef struct {
  const JswSymPtr *symbols;
  unsigned char symbolCount;
  const char *symbolChars;
} PACKED_FLAGS JswSymList;
#endif

However, when I add this:

  codeOut("static JswSymPtr jswSymbols_"+codeName+"[] __attribute__((section(\".irom.literal\"))) __attribute__((aligned(4))) = {\n  "+",\n  ".join(listSymbols)+"\n};");
#  codeOut("static const JswSymPtr jswSymbols_"+codeName+"[] = {\n  "+",\n  ".join(listSymbols)+"\n};");

It crashes and restarts ;-(

It was ok when I just did the Wifi symbols, so clearly something is not correct with the alignment reads ;-(

wilberforce commented 8 years ago

This moves the constant tables (not the strings) to flash.

https://github.com/wilberforce/Espruino/commit/95f5d97bfaaecc5e1cbaf7ba92b354a5c8b75bff

Check out the free heap! "freeHeap": 9072

 _____                 _
|   __|___ ___ ___ _ _|_|___ ___
|   __|_ -| . |  _| | | |   | . |
|_____|___|  _|_| |___|_|_|_|___|
          |_| http://espruino.com
 1v85.39 Copyright 2016 G.Williams
Espruino is Open Source. Our work is supported
only by sales of official boards and donations:
http://espruino.com/Donate
Flash map 4MB:512/512, manuf 0xe0 chip 0x4016
>require("ESP8266").getState();
={
  "sdkVersion": "1.5.0",
  "cpuFrequency": 160, "freeHeap": 9072, "maxCon": 10,
  "flashMap": "4MB:512/512",
  "flashKB": 4096,
  "flashChip": "0xe0 0x4016"
 }

I'm not sure why the change the .ld files are so large - possibly a line ending issue.

@gfwilliams I would do a pull request - however there is one pending, and it might confusing things!

jeeps - with the strings in flash, we might have space for the AES stuff to use TLS on this board too!

MaBecker commented 8 years ago

Wow - that is great !

tve commented 8 years ago

Looks like we've been working on the same stuff :-) I now have symbol tables and strings in flash. The result is bumping the interpreter up to 1600 jsvars and starting with 11920 bytes of heap at boot, which reduces to 9312 after it starts the telnet service and I connect via telnet.

>process.memory()
={ "free": 1565, "usage": 35, "total": 1600, "history": 12 }
>require("ESP8266").getState();
={
  "sdkVersion": "1.5.0",
  "cpuFrequency": 160, "freeHeap": 9312, "maxCon": 10,
  "flashMap": "4MB:512/512",
  "flashKB": 4096,
  "flashChip": "0xe0 0x4016"
 }

The changes are a tad more than I had hoped for, but it's not awful. I'm gonna prepare a build and a PR, but I think this needs some serious testing. I have not figured out how to test jswGetSymbolListForObject and jswGetSymbolListForObjectProto yet...

The changeset is https://github.com/tve/Espruino/commit/1b4c8c691e3dc00b20068110f594862662a98a46 but it's not yet working with v1.85 (but that seems to add another 1KB of heap, yay!).

wilberforce commented 8 years ago

Great news!

Is it worth increasing the vars from1400 to 1600? When do you "hit the wall" with vars? Does this alternately control the programme size?

wilberforce commented 8 years ago

The changeset is tve@1b4c8c6 but it's not yet working with v1.85 (but that seems to add another 1KB of heap, yay!).

What is required to work with V1.85?

It doesn't look like any of your changes would not work with V1.85?

tve commented 8 years ago

FYI: http://forum.espruino.com/comments/12928402/ links to an experimental build. Seems to work for me so far. Needs some cleanup so it can be merged without breaking other builds... Code in https://github.com/tve/Espruino/commit/cee7141a90b3c2c5a9354bdd353240970ae60198

wilberforce commented 8 years ago

working through topstrings, ota.c has a few string constants, here is the patch:

https://github.com/espruino/Espruino/commit/682807245a4f21ffc5fd48502d84c22e23194ffd

wilberforce commented 8 years ago

@tve @gfwilliams thanks for the updates.

Using websockets, I'm still experiencing heap space issues, so are looking to free up more.

using the make topreadonly target, is showing that the crypto functions are using large chunks of rodata.

Top 20 from ./topreadonly:
3616 ./espruino_esp8266_partial.o
768 ./libs/crypto/mbedtls/library/sha512.o
594 ./src/jslex.o
320 ./libs/crypto/mbedtls/library/sha256.o

Gordon is not keen on changing the source by adding macros to put the constants into irom, so I've been looking at the linker script.

By moving the irom0.text section before the .rodata section, the lib can be matched:

  .irom0.text : ALIGN(4)
  {
    _irom0_text_start = ABSOLUTE(.);
    *(.irom0.literal .irom.literal .irom.literal2 .irom.text.literal .irom0.text .irom.text)
    *espruino_esp8266_partial.o(.rodata)
    _irom0_text_end = ABSOLUTE(.);
  } >irom0_0_seg :irom0_0_phdr  

However this puts all rodata in irom

targeting like this does not work: *espruino_esp8266_partial.o:*sha512.o

@tve is there a way in the Makefile to exclude the sha*.o from the partial, and include in the elf part, or add a linker script to the partial rule to move the sha*.o rodata?

tve commented 8 years ago

Are you running out of heap using plain websockets or secure websockets? If the former, I think you're better off understanding why it's using so much heap. A single connection should not use more than ~5KB of heap space for buffers. You can control some of that by telling the SDK what TCP window size to advertize. Do you run out of send buffers or receive buffers? Or are you trying to run multiple connections at the same time?

If I were you I'd experiment a bit more to understand things. For example, I increased the number of jsvars from 1400 to 1600. Those 200 extra jsvars use 200*16=3200 bytes of RAM. That's pretty much the amount you can possibly free up on the track you are given that all of espruino_esp8266_partial.o uses 3616 bytes of RAM. So maybe reducing the number of jsvars is a much more expedient direction for you?

Another issue is heap fragmentation. The cesanta folks have published a different version of malloc that deals better with fragmentation and I've heard from others that it helps them. See https://blog.cesanta.com/embedded-heap-behaviour-analysis-and-improvement

In terms of the Makefile and linking, you would need to put something like sha256_mangled.o into the dependencies and add a rule to produce sha256_mangled.o from sha256.o that does the switching of sections you're looking for.

gfwilliams commented 8 years ago

In terms of the Makefile and linking, you would need to put something like sha256_mangled.o into the dependencies and add a rule to produce sha256_mangled.o from sha256.o that does the switching of sections you're looking for.

That would be awesome - potentially it could be used on a whole bunch of similar stuff.

wilberforce commented 8 years ago

Are you running out of heap using plain websockets or secure websockets? I'm using plain websockets. The full crypto implementation won't fit, so it's basically the SHA1 function only that is used.

I'll fire up the app again and keep a track in the debug window to see if I can determine when heap is getting used.

tve commented 8 years ago

You are talking about heap space, not JSvars, right? How many sockets do you have open? You could also insert/enable some print statements into the network code to tell you when heap memory gets allocated/released. You can't see when the SDK allocates receive buffers and I'm not sure whether it allocates some buffers on the sending side, but you can see most of this. Could it be that you have some socket leakage in the websocket implementation?

wilberforce commented 8 years ago

Using a flash read safe version of memcpy, the following can be moved from rodata using:

xtensa-lx106-elf-objcopy --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha1.o
xtensa-lx106-elf-objcopy --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha256.o
xtensa-lx106-elf-objcopy --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha512.o

To do this the export DISABLE_LTO=1 has to be defined, as the rodata sections get stripped away.

The difference before the flag:

  4 -rwxrwxr-x 1 esp8266 esp8266   2188 Apr 29 09:54 eagle.app.v6.data.bin
424 -rwxrwxr-x 1 esp8266 esp8266 430276 Apr 29 09:54 eagle.app.v6.irom0text.bin
 12 -rwxrwxr-x 1 esp8266 esp8266  10664 Apr 29 09:54 eagle.app.v6.rodata.bin
 32 -rwxrwxr-x 1 esp8266 esp8266  31132 Apr 29 09:54 eagle.app.v6.text.bin
** user1.bin uses  474340 bytes of 491520 available

and with -lto disabled:

  4 -rwxrwxr-x 1 esp8266 esp8266   2204 Apr 29 09:57 eagle.app.v6.data.bin
424 -rwxrwxr-x 1 esp8266 esp8266 433668 Apr 29 09:57 eagle.app.v6.irom0text.bin
 12 -rwxrwxr-x 1 esp8266 esp8266  10744 Apr 29 09:57 eagle.app.v6.rodata.bin
 32 -rwxrwxr-x 1 esp8266 esp8266  31140 Apr 29 09:57 eagle.app.v6.text.bin
To disassemble: xtensa-lx106-elf-objdump -d -l -x espruino_esp8266_user2.elf

So the effect of the flag is not great...

So after the relocations:

xtensa-lx106-elf-objcopy --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha512.o
xtensa-lx106-elf-objcopy --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha1.o
xtensa-lx106-elf-objcopy --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha256.o

To disassemble: xtensa-lx106-elf-objdump -d -l -x espruino_esp8266_user1.elf
  4 -rwxrwxr-x 1 esp8266 esp8266   2204 Apr 29 09:58 eagle.app.v6.data.bin
428 -rwxrwxr-x 1 esp8266 esp8266 434852 Apr 29 09:58 eagle.app.v6.irom0text.bin
 12 -rwxrwxr-x 1 esp8266 esp8266   9592 Apr 29 09:58 eagle.app.v6.rodata.bin
 32 -rwxrwxr-x 1 esp8266 esp8266  31140 Apr 29 09:58 eagle.app.v6.text.bin

So rodata has gone from 10664 to 9592.

The next step is to use a linker script rather than objcopy to rename the sections.

updating the Makefile: PARTIAL = espruino_esp8266_partial.o to PARTIAL = espruino_esp8266_partial.a

and the linker file:

    *espruino_esp8266_partial.a:sha1.c(.rodata.* .rodata)
    *espruino_esp8266_partial.a:sha256.c(.rodata.* .rodata)
    *espruino_esp8266_partial.a:sha512.c(.rodata.* .rodata)

however this targeting of the source files does not work? Any clues?

wilberforce commented 8 years ago

You are talking about heap space, not JSvars, right?

yes -

260015> Thu Apr 28 23:10:16 2016, heap: 9208
264951> Wifi event: probe request from station 48:d2:24:1c:2e:21, rssi = -79

How many sockets do you have open? You could also insert/enable some print statements into the network code to tell you when heap memory gets allocated/released. You can't see when the SDK allocates receive buffers and I'm not sure whether it allocates some buffers on the sending side, but you can see most of this.

Just one browser client.

Could it be that you have some socket leakage in the websocket implementation? I'm using the standard Espruino library.

As you say more debug might be the clue.

it is worth noting require("Wifi").stopAP(); releases quite a bit of heap!

122487> pm open,type:2 0
122487> mode : sta(5c:cf:7f:00:d9:75)
122571> Wifi.stopAP: opmode=sta
145048> Thu Apr 28 23:13:16 2016, heap: 10824
tve commented 8 years ago

Just one browser client.

One browser client can open an arbitrary number of connections...

wilberforce commented 8 years ago

Here are the changes to move the SHAx.o constants to flash, without changing the source files (baring a #define in mbedtls/config.h )

https://github.com/espruino/Espruino/pull/850

MaBecker commented 7 years ago

@wilberforce and @gfwilliams is there more potential or can this be closed?

gfwilliams commented 7 years ago

I think there was a script or something for finding out?

Personally I'd still be a big fan of adjusting the compiler flags to dump all read-only data in flash. It looks like there's now a very interesting -mforce-l32 flag built in to the compiler:

https://github.com/jerryscript-project/jerryscript/pull/1233/files#diff-adf743afa04809d14f7ec911d592a527

(it seems to be built into my version anyway)

If we could rely on that then we could really improve memory usage, but could also tidy up some of the really nasty ESP8266-related hacks that are currently in Espruino.