espruino / Espruino

The Espruino JavaScript interpreter - Official Repo
http://www.espruino.com/
Other
2.76k stars 741 forks source link

Adjust name string #2329

Closed notEvil closed 1 year ago

notEvil commented 1 year ago

Currently, when converting strings to names, existing StringExt are replaced by new ones. This introduces gaps, e.g.

1: String 2: StringExt 3: first empty 4: second empty

may become either 1 -> 3 (new StringExt) and 2 (first empty) -> 4, or 1 -> 3 -> 4 and 2 -> 5.

This PR changes the algorithm to reuse the existing StringExt by increasing the length of the string and shifting the characters. On my Bangle ~50% of the strings enter this part, each adding ~0.15 gaps on average (~4000 in ~30min).

I don't expect this to hit master anytime soon because it doesn't solve an actual issue and its value is debatable. The only real gain is potentially more space for flat strings and less GC.

gfwilliams commented 1 year ago

I just started to have a look at this again and I thought: how does this get called usually?

It seems jsvMakeIntoVariableName(jsvNewFromString is pretty common (or jsvMakeIntoVariableName(jslGetTokenValueAsVar which is basically the same).

I'm not sure if you have any thoughts on this but I'm wondering if it makes sense to just skip out the middle man and add a jsvNewVariableName function? It would avoid this little shuffle, and should provide a nice speed boost too

notEvil commented 1 year ago

Not really, except that it would be nice to get rid of this part :) Performance wise I have no clue how it would pan out

gfwilliams commented 1 year ago

Ok, just done - but at the same time I can't easily get rid of jsvMakeIntoVariableName as it's used in a bunch of places.

I'd be interested to see what you find fragmentation-wise with master now - I'd imagine that for the majority of code now, jsvMakeIntoVariableName never gets called with a string

notEvil commented 1 year ago

On my Bangle 1, clock face, few widgets

```diff diff --git a/libs/banglejs/jswrap_bangle.c b/libs/banglejs/jswrap_bangle.c index bcca85e..3587011 100644 --- a/libs/banglejs/jswrap_bangle.c +++ b/libs/banglejs/jswrap_bangle.c @@ -4341,6 +4341,14 @@ JsVar *jswrap_banglejs_dbg() { jsvObjectSetChildAndUnLock(o,"accHistoryIdx",jsvNewFromInteger(accHistoryIdx)); jsvObjectSetChildAndUnLock(o,"accGestureCount",jsvNewFromInteger(accGestureCount)); jsvObjectSetChildAndUnLock(o,"accIdleCount",jsvNewFromInteger(accIdleCount)); + jsvObjectSetChildAndUnLock(o, "namc", jsvNewFromInteger(name_count)); + jsvObjectSetChildAndUnLock(o, "makc", jsvNewFromInteger(make_count)); + jsvObjectSetChildAndUnLock(o, "extc", jsvNewFromInteger(ext_count)); + jsvObjectSetChildAndUnLock(o, "gapc", jsvNewFromInteger(gap_count)); + name_count = 0; + make_count = 0; + ext_count = 0; + gap_count = 0; return o; } diff --git a/src/jsvar.c b/src/jsvar.c index 07c53cf..8493079 100644 --- a/src/jsvar.c +++ b/src/jsvar.c @@ -618,12 +618,14 @@ static void jsvFreePtrInternal(JsVar *var) { jshInterruptOn(); } -void jsvFreePtrStringExt(JsVar* var) { +int jsvFreePtrStringExt(JsVar* var) { + int count = 0; JsVarRef ref = jsvGetLastChild(var); - if (!ref) return; + if (!ref) return count; JsVar* ext = jsvGetAddressOf(ref); while (true) { ext->flags = JSV_UNUSED; + count++; ref = jsvGetLastChild(ext); if (!ref) break; jsvSetNextSibling(ext, ref); @@ -634,6 +636,7 @@ void jsvFreePtrStringExt(JsVar* var) { jsVarFirstEmpty = jsvGetLastChild(var); touchedFreeList = true; jshInterruptOn(); + return count; } ALWAYS_INLINE void jsvFreePtr(JsVar *var) { @@ -1023,7 +1026,10 @@ JsVar *jsvNewFromString(const char *str) { return jsvNewNameOrString(str, false/*isName*/); } +int name_count; + JsVar *jsvNewNameFromString(const char *str, JsVar *valueOrZero) { + name_count++; JsVar *var = jsvNewNameOrString(str, true/*isName*/); if (var && valueOrZero) jsvSetFirstChild(var, jsvGetRef(jsvRef(valueOrZero))); @@ -1175,6 +1181,10 @@ JsVar *jsvNewArrayBufferFromString(JsVar *str, unsigned int lengthOrZero) { return arr; } +int make_count; +int ext_count; +int gap_count; + JsVar *jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero) { if (!var) return 0; assert(jsvGetRefs(var)==0); // make sure it's unused @@ -1192,7 +1202,9 @@ JsVar *jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero) { } var->flags = (JsVarFlags)(var->flags & ~JSV_VARTYPEMASK) | t; } else if (varType>=_JSV_STRING_START && varType<=_JSV_STRING_END) { + make_count++; if (jsvGetCharactersInVar(var) > JSVAR_DATA_STRING_NAME_LEN) { + ext_count++; /* Argh. String is too large to fit in a JSV_NAME! We must make * new STRINGEXTs to put the data in */ @@ -1221,7 +1233,7 @@ JsVar *jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero) { } jsvSetCharactersInVar(var, JSVAR_DATA_STRING_NAME_LEN); // Free any old stringexts - jsvFreePtrStringExt(var); + gap_count += jsvFreePtrStringExt(var); // set up new stringexts jsvSetLastChild(var, jsvGetRef(startExt)); jsvSetNextSibling(var, 0); diff --git a/src/jsvar.h b/src/jsvar.h index 1d6fef5..1f750d3 100644 --- a/src/jsvar.h +++ b/src/jsvar.h @@ -779,4 +779,9 @@ void jsvFree(void *ptr); } \ } +extern int name_count; +extern int make_count; +extern int ext_count; +extern int gap_count; + #endif /* JSVAR_H_ */ ``` name_count = 10114 make_count = 15488 ext_count = 9745 gap_count = 1142
gfwilliams commented 1 year ago

Wow, thanks - that's really handy. Way more than I thought still using jsvMakeIntoVariableName

~60% are too big (previously 50%)

I'm not sure I understand this one? Or you mean that now 40% of the names are made using jsvNewNameFromString, now more of the calls to jsvMakeIntoVariableName are for oversize strings?

notEvil commented 1 year ago

It is ext_count / make_count, so how often is the string longer than 4 bytes given it is transformed by jsvMakeIntoVariableName.

gfwilliams commented 1 year ago

Ok, thanks - so it's not like my recent changes made it worse, it's just there are less calls in total but a slightly higher proportion are long?

I just pushed some changes - maybe try now? I started dumping the variable name and it seems it was getting called for most property accesses of builtins (we pass the name back so they're l-values). There was no need to convert those either so there should be a significant improvement now.

Where I do still see it is object definitions like {hello : 42} for instance which is a bit of a pain, but the amount of times MakeInto is called has dropped right off.

I think it's still worth merging this in though - but in general the less random shifting around of stuff we can do the better :)

notEvil commented 1 year ago

Ok, thanks - so it's not like my recent changes made it worse, it's just there are less calls in total but a slightly higher proportion are long?

yes

I just tried master with improved counting (individual make_count per line number)

```diff diff --git a/libs/banglejs/jswrap_bangle.c b/libs/banglejs/jswrap_bangle.c index ed9e86d..cb12722 100644 --- a/libs/banglejs/jswrap_bangle.c +++ b/libs/banglejs/jswrap_bangle.c @@ -4261,6 +4261,23 @@ JsVar *jswrap_banglejs_dbg() { jsvObjectSetChildAndUnLock(o,"accGestureCount",jsvNewFromInteger(accGestureCount)); jsvObjectSetChildAndUnLock(o,"accIdleCount",jsvNewFromInteger(accIdleCount)); jsvObjectSetChildAndUnLock(o,"pollInterval",jsvNewFromInteger(pollInterval)); + jsvObjectSetChildAndUnLock(o, "namc", jsvNewFromInteger(name_count)); + jsvObjectSetChildAndUnLock(o, "makc", jsvNewFromInteger(make_count)); + jsvObjectSetChildAndUnLock(o, "extc", jsvNewFromInteger(ext_count)); + jsvObjectSetChildAndUnLock(o, "gapc", jsvNewFromInteger(gap_count)); + name_count = 0; + make_count = 0; + ext_count = 0; + gap_count = 0; + JsVar *a = jsvNewEmptyArray(); + for (int index = 0; index < 30; index++) { + if (line_numbers[index] == 0) break; + jsvArrayPushAndUnLock(a, jsvNewFromInteger(line_numbers[index])); + jsvArrayPushAndUnLock(a, jsvNewFromInteger(counts[index])); + line_numbers[index] = 0; + counts[index] = 0; + } + jsvObjectSetChildAndUnLock(o, "cnts", a); return o; } diff --git a/src/jsvar.c b/src/jsvar.c index 5586ef3..e1afff5 100644 --- a/src/jsvar.c +++ b/src/jsvar.c @@ -1042,7 +1042,10 @@ JsVar *jsvNewFromString(const char *str) { return jsvNewNameOrString(str, false/*isName*/); } +int name_count; + JsVar *jsvNewNameFromString(const char *str) { + name_count++; return jsvNewNameOrString(str, true/*isName*/); } @@ -1195,7 +1198,13 @@ JsVar *jsvNewArrayBufferFromString(JsVar *str, unsigned int lengthOrZero) { return arr; } -JsVar *jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero) { +int make_count; +int ext_count; +int gap_count; +int line_numbers[30]; +int counts[30]; + +JsVar *_jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero) { if (!var) return 0; assert(jsvGetRefs(var)==0); // make sure it's unused assert(jsvIsSimpleInt(var) || jsvIsString(var)); @@ -1212,7 +1221,10 @@ JsVar *jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero) { } var->flags = (JsVarFlags)(var->flags & ~JSV_VARTYPEMASK) | t; } else if (varType>=_JSV_STRING_START && varType<=_JSV_STRING_END) { + make_count++; if (jsvGetCharactersInVar(var) > JSVAR_DATA_STRING_NAME_LEN) { + ext_count++; + if (jsvGetLastChild(var)) gap_count++; /* Argh. String is too large to fit in a JSV_NAME! We must chomp * new STRINGEXTs to put the data in */ @@ -1287,6 +1299,20 @@ JsVar *jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero) { return var; } +void _count(int line_number) { + for (int index = 0; index < 30; index++) { + if (line_numbers[index] == line_number) { + counts[index]++; + break; + } + if (line_numbers[index] == 0) { + line_numbers[index] = line_number; + counts[index] = 1; + break; + } + } +} + void jsvMakeFunctionParameter(JsVar *v) { assert(jsvIsString(v)); if (!jsvIsName(v)) jsvMakeIntoVariableName(v,0); diff --git a/src/jsvar.h b/src/jsvar.h index d325c46..bb9403d 100644 --- a/src/jsvar.h +++ b/src/jsvar.h @@ -326,7 +326,9 @@ JsVar *jsvNewFromPin(int pin); /// Turns var into a Variable name that links to the given value... No locking so no need to unlock var -JsVar *jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero); +JsVar *_jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero); +void _count(int line_number); +#define jsvMakeIntoVariableName(a, b) ({ if (jsvIsString(a)) _count(__LINE__); _jsvMakeIntoVariableName(a, b); }) /// Turns var into a 'function parameter' that the parser recognises when parsing a function void jsvMakeFunctionParameter(JsVar *v); /// Add a new function parameter to a function (name may be 0) - use this when binding function arguments This unlocks paramName if specified, but not value. @@ -788,6 +790,13 @@ void jsvFree(void *ptr); } \ } +extern int name_count; +extern int make_count; +extern int ext_count; +extern int gap_count; +extern int line_numbers[]; +extern int counts[]; + #endif /* JSVAR_H_ */ #if defined(JSVAR_MALLOC) && defined(ESPR_EMBED) ```

and the counts are

1318: 6509 (jsvMakeFunctionParameter) 2922: 1103 (jsvAsName first use) 144: 164 (not in jsvar.c) 2925: 39 (jsvAsName second use)

Unfortunately there is a serious memory leak now, running GC doesn't help. My previous experiments were done on 2v16 with some changes cherry picked (https://gitlab.com/notEvil/cyclometer/-/blob/master/Bangle.js/Espruino/build.py), so I don't know where this was introduced.

And fyi, I've reverted the branch so the build for my project wouldn't break. It should be fine since the PR is already merged. I've created a copy called adjust_string_merge which is where you left it.

gfwilliams commented 1 year ago

Ok, thanks. Do you have any more info about the memory leak and how to cause it? I'm not really seeing anything on my Bangle.js 2 now - just flipping between clock and launcher and I'm not seeing memory usage rising at all?

Also the command-line tests for Linux should really pick up any obvious leaks since it's one of the things they check for after every test

notEvil commented 1 year ago

Yes, its the patch above. Without it, everything is fine. Its not supposed to mess with JS memory but somehow does.

edit: and now I know why: the #define needs a temporary variable, otherwise it will execute a twice ... Sry for the distraction!

notEvil commented 1 year ago

now with a non-broken counting build, for future reference or improvement attempts:

name_count: 58471 make_count: 19679 (~25% of name_count + make_count) ext_count: 11716 (~60% of make_count) gap_count: 95 (~0.008 on average, if it were the old algorithm)

L1318: 16970 (jsvMakeFunctionParameter) L2922: 2381 (jsvAsName first use) L144: 315 (somewhere else) L2925: 13 (jsvAsName second use)

gfwilliams commented 1 year ago

Yes, its the patch above.

Ahh! you mean your counting patch? Not this PR itself? Sorry - got confused as the PR itself seems to work fine.

Thanks for those stats. L1318 is an odd one as it would appear that only applies for getters and setters which we don't actually use all that much from code in BangleApps (I thought!). Do you have any custom code you're using or is what you're debugging with pretty much standard?

Thanks again for your work on this - poking around in the internals and instrumenting things is super helpful. The interpreter itself is quite old now and there are definitely some non-optimal things happening. While when I get time I do try and take a look, as this shows there's plenty of stuff I miss!

notEvil commented 1 year ago

Ahh! you mean your counting patch? Not this PR itself?

Yes, should have stated this more clearly

Do you have any custom code you're using or is what you're debugging with pretty much standard?

Again for reference, I used https://github.com/espruino/Espruino/commit/95e8a321d24f3ad0d710c82c75678676bce0ac76 with the following patch (counting)

```diff diff --git a/libs/banglejs/jswrap_bangle.c b/libs/banglejs/jswrap_bangle.c index ed9e86d..bad2f20 100644 --- a/libs/banglejs/jswrap_bangle.c +++ b/libs/banglejs/jswrap_bangle.c @@ -4261,6 +4261,26 @@ JsVar *jswrap_banglejs_dbg() { jsvObjectSetChildAndUnLock(o,"accGestureCount",jsvNewFromInteger(accGestureCount)); jsvObjectSetChildAndUnLock(o,"accIdleCount",jsvNewFromInteger(accIdleCount)); jsvObjectSetChildAndUnLock(o,"pollInterval",jsvNewFromInteger(pollInterval)); + + jsvObjectSetChildAndUnLock(o, "namc", jsvNewFromInteger(name_count)); + jsvObjectSetChildAndUnLock(o, "makc", jsvNewFromInteger(make_count)); + jsvObjectSetChildAndUnLock(o, "extc", jsvNewFromInteger(ext_count)); + jsvObjectSetChildAndUnLock(o, "gapc", jsvNewFromInteger(gap_count)); + name_count = 0; + make_count = 0; + ext_count = 0; + gap_count = 0; + + JsVar *a = jsvNewEmptyArray(); + for (int index = 0; index < 20; index++) { + if (line_numbers[index] == 0) break; + jsvArrayPushAndUnLock(a, jsvNewFromInteger(line_numbers[index])); + jsvArrayPushAndUnLock(a, jsvNewFromInteger(make_counts[index])); + line_numbers[index] = 0; + make_counts[index] = 0; + } + jsvObjectSetChildAndUnLock(o, "cnts", a); + return o; } diff --git a/src/jsvar.c b/src/jsvar.c index 5586ef3..43c0303 100644 --- a/src/jsvar.c +++ b/src/jsvar.c @@ -1042,7 +1042,10 @@ JsVar *jsvNewFromString(const char *str) { return jsvNewNameOrString(str, false/*isName*/); } +int name_count; + JsVar *jsvNewNameFromString(const char *str) { + name_count++; return jsvNewNameOrString(str, true/*isName*/); } @@ -1195,7 +1198,13 @@ JsVar *jsvNewArrayBufferFromString(JsVar *str, unsigned int lengthOrZero) { return arr; } -JsVar *jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero) { +int make_count; +int ext_count; +int gap_count; +int line_numbers[20]; +int make_counts[20]; + +JsVar *_jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero) { if (!var) return 0; assert(jsvGetRefs(var)==0); // make sure it's unused assert(jsvIsSimpleInt(var) || jsvIsString(var)); @@ -1212,7 +1221,10 @@ JsVar *jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero) { } var->flags = (JsVarFlags)(var->flags & ~JSV_VARTYPEMASK) | t; } else if (varType>=_JSV_STRING_START && varType<=_JSV_STRING_END) { + make_count++; if (jsvGetCharactersInVar(var) > JSVAR_DATA_STRING_NAME_LEN) { + ext_count++; + if (jsvGetLastChild(var)) gap_count++; /* Argh. String is too large to fit in a JSV_NAME! We must chomp * new STRINGEXTs to put the data in */ @@ -1287,6 +1299,20 @@ JsVar *jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero) { return var; } +void _count(int line_number) { + for (int index = 0; index < 20; index++) { + if (line_numbers[index] == line_number) { + make_counts[index]++; + break; + } + if (line_numbers[index] == 0) { + line_numbers[index] = line_number; + make_counts[index] = 1; + break; + } + } +} + void jsvMakeFunctionParameter(JsVar *v) { assert(jsvIsString(v)); if (!jsvIsName(v)) jsvMakeIntoVariableName(v,0); diff --git a/src/jsvar.h b/src/jsvar.h index d325c46..79acab1 100644 --- a/src/jsvar.h +++ b/src/jsvar.h @@ -326,7 +326,9 @@ JsVar *jsvNewFromPin(int pin); /// Turns var into a Variable name that links to the given value... No locking so no need to unlock var -JsVar *jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero); +JsVar *_jsvMakeIntoVariableName(JsVar *var, JsVar *valueOrZero); +void _count(int line_number); +#define jsvMakeIntoVariableName(a, b) ({ JsVar* __v = a; if (jsvIsString(__v)) _count(__LINE__); _jsvMakeIntoVariableName(__v, b); }) /// Turns var into a 'function parameter' that the parser recognises when parsing a function void jsvMakeFunctionParameter(JsVar *v); /// Add a new function parameter to a function (name may be 0) - use this when binding function arguments This unlocks paramName if specified, but not value. @@ -788,6 +790,13 @@ void jsvFree(void *ptr); } \ } +extern int name_count; +extern int make_count; +extern int ext_count; +extern int gap_count; +extern int line_numbers[]; +extern int make_counts[]; + #endif /* JSVAR_H_ */ #if defined(JSVAR_MALLOC) && defined(ESPR_EMBED) ```

and the following list of apps which might ran in the background (didn't start any of them from the launcher)

about (0.14) accelrec (0.02) alarm (0.38) boldclk (0.07) boot (0.56) chargent (0.03) dtlaunch (0.22) fileman (0.03) files (0.08) health (0.22) hrm (0.11) notify (0.12) sched (0.21) setting (0.58) wid_edit (0.02) widalarm (0.01) widchime (0.02) widdevst (0.02) widminbat (0.02) widram (0.03)

So nothing out of the ordinary I'd say

Thanks again for your work on this - poking around in the internals and instrumenting things is super helpful. The interpreter itself is quite old now and there are definitely some non-optimal things happening. While when I get time I do try and take a look, as this shows there's plenty of stuff I miss!

You are welcome! I learned a lot about development on a constrained platform and the codebase is a pleasure to work with. Reliability beats optimality imo. What would be nice though is call site profiling :) But thats difficult to achieve I'd assume

gfwilliams commented 1 year ago

Thanks!

As far as call profiling, the best I've got really is compiling and running for Linux on a Raspberry Pi with gprof. Although that generates standard ARM code rather than Thumb (and the Pi has a more complicated processor) I thought it provided a good indication.

It looks like on Rasperry Pi 2+ it's possible to compile to thumb2 with "-mthumb" but it looks like they don't include the stdlib for that so it would actually be a bit more involved to get it building fully.

I have occasionally had runs through trying to find obvious time sinks, but that's all been with reasonably basic code like the stuff in the benchmarks directory. There would be a lot to be said for using the BANGLEJS2_LINUX Linux Bangle-emulation build (a bit like we do for emscripten) and then trying to run through a pre-programmed set of tasks and then using that as a basis for benchmarking.