*Open* — denis-migdal opened 9 months ago
Found a part of the code responsible for more than 34% of CPU time (66ms/163ms), even when the Brython code is empty:

```javascript
$B.unicode[gc].forEach(function(item){
    if(Array.isArray(item)){
        var step = item[2] || 1
        for(var i = 0, nb = item[1]; i < nb; i += 1){
            $B.unicode_tables[gc][item[0] + i * step] = true
        }
    }else{
        $B.unicode_tables[gc][item] = true
    }
})
```
Some things that might increase performance:

- `x += 1` should be replaced by `++x` (faster). A search for `+=1` and `+= 1` should help identify them.
- `$B.unicode_tables[gc]`: pre-compute it before the `forEach`?
- `for(var i = 0, max = item[1]*step; i < max; i += step){ ...[i] = true }`?
- Avoid `forEach()`, it seems slow.
- Avoid `Array.isArray`?

So something like (?):
```javascript
var src = $B.unicode[gc];          // source items (codepoints or ranges)
var ugc = $B.unicode_tables[gc];   // destination table
for(let x = 0; x < src.length; ++x) {
    var item = src[x];
    var max = item[1];
    if(max === undefined){ // not an array ?
        ugc[item] = true
    } else {
        var i = item[0]
        var step = item[2] || 1;
        max = max * step + i
        for( ; i < max; i += step)
            ugc[i] = true
    }
}
```
EDIT: I don't really get what this function is doing, but I guess precomputing everything and storing the result in brython.js isn't possible (brython.js would then be too big)? JavaScript has some functions to handle Unicode, such as `fromCharCode` and `charCodeAt`, but I assume that isn't what you are doing?
Brython function calls are converted to (according to the documentation: https://github.com/brython-dev/brython/wiki/How-Brython-works):

```javascript
f.apply(null, [1].concat(list(t)).concat([{$nat: "kw", kw: [{x: 2}, d]}]))
```

However, it is 48% slower compared to:

```javascript
f(1, ...list(t), {$nat: "kw", kw: [{x: 2}, d]});
```
EDIT: is there also a reason not to do `{$nat: "kw", kw: {x: 2, ...d}}`, or even directly `{$nat: {x: 2, ...d}}`?
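For illustration, a minimal sketch of the two call shapes being compared; `f`, `t` and `d` here are stand-ins I made up, not Brython's actual internals:

```javascript
// Minimal sketch of the two call shapes (NOT Brython's real code).
function f(...args){ return args.length }

const t = [10, 20]
const d = {y: 3}

// Shape currently generated (per the wiki): apply + concat
const viaApply = f.apply(null, [1].concat(t).concat([{$nat: "kw", kw: [{x: 2}, d]}]))

// Proposed shape: direct call with spread
const viaSpread = f(1, ...t, {$nat: "kw", kw: [{x: 2}, d]})

console.log(viaApply, viaSpread) // 4 4 — both shapes pass the same 4 arguments
```

Both shapes build the same argument list; the spread form just avoids the intermediate arrays created by `concat` and the `apply` indirection.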
Also, for potential execution speed gain, see also :
Tested the performance of `WeakMap`:

For a `jsobj2pyobj()` doing almost nothing (so not quite a fair comparison):

So quite a small overhead, considering that my `jsobj2pyobj()` is doing almost nothing.

Also for:

Imma test other things I saw too.
`jsobj.toString().indexOf('.') == -1` is 6% slower compared to `jsobj % 1 === 0`. Note that the latter is quicker AND more readable.
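A small sketch of my own comparing the two checks; note they can disagree on edge cases, which is worth verifying before a global replace:

```javascript
// The two integer checks being compared (illustration, not Brython code).
const isIntStr = x => x.toString().indexOf('.') == -1
const isIntMod = x => x % 1 === 0

console.log(isIntStr(3), isIntMod(3))     // true true
console.log(isIntStr(3.5), isIntMod(3.5)) // false false

// Edge cases where they disagree:
console.log(isIntStr(1.5e21), isIntMod(1.5e21))     // false true (exponential notation hides the '.')
console.log(isIntStr(Infinity), isIntMod(Infinity)) // true false ("Infinity" has no '.', but Infinity % 1 is NaN)
```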
Also, you can pre-allocate arrays (about 18% faster), both in `jsobj2pyobj()` and `pyobj2jsobj()` (and you seem to do that a lot). Instead of:
```javascript
var args = []
for(var i = 0, len = arguments.length; i < len; i++){
    args.push(pyobj2jsobj(arguments[i]))
}
```

prefer:

```javascript
var args = new Array(arguments.length)
for(var i = 0, len = arguments.length; i < len; i++){
    args[i] = pyobj2jsobj(arguments[i])
}
```
EDIT: if you don't know the final length, you can also do:

```javascript
var args = new Array(arguments.length)
var size = 0
for(var i = 0, len = arguments.length; i < len; i++){
    if(???) args[size++] = pyobj2jsobj(arguments[i])
}
args.length = size
```
Now that ES6 has been out since 2015, I think all browsers implement it. So would relying more on ES6 "new" classes also improve performance?
> Found a part of the code responsible for more than 34% of CPU time (66ms/163ms), even when the Brython code is empty :
> `$B.unicode[gc].forEach(function(item){if(Array.isArray(item)){var step=item[2]||1; for(var i=0,nb=item[1];i < nb;i+=1){$B.unicode_tables[gc][item[0]+i*step]=true}}else{$B.unicode_tables[gc][item]=true}})`
Wow, this is amazing ! How did you find it ?
In the commit referenced above I have modified similar code for the Unicode tables XID_Start and XID_Continue: I have completely removed these lines in unicode_data.js

```javascript
for(const key in $B.unicode_identifiers){
    $B.unicode_tables[key] = {}
    for(const item of $B.unicode_identifiers[key]){
        if(Array.isArray(item)){
            for(var i = 0; i < item[1]; i++){
                $B.unicode_tables[key][item[0] + i] = true
            }
        }else{
            $B.unicode_tables[key][item] = true
        }
    }
}
```
and added functions `$B.is_XID_Start(codepoint)` and `$B.is_XID_Continue(codepoint)` in python_tokenizer.js that check if a codepoint is in one of these categories by searching directly (by bisection) in the table `$B.unicode_identifiers`.
As far as I could test, this reduces startup time by around 200ms and has no impact on execution speed.
I am pretty sure we can do the same for the lines you mention.
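A minimal sketch of the bisection idea, under assumptions of my own: I assume the table is a sorted, non-overlapping list of single codepoints and `[start, count]` ranges, which may differ from Brython's actual layout.

```javascript
// Hypothetical membership-by-bisection over a sorted table of single
// codepoints and [start, count] ranges (NOT Brython's actual code).
function inTable(table, cp){
    let lo = 0, hi = table.length - 1
    while(lo <= hi){
        const mid = (lo + hi) >> 1
        const item = table[mid]
        const start = Array.isArray(item) ? item[0] : item
        const end = Array.isArray(item) ? item[0] + item[1] - 1 : item
        if(cp < start){ hi = mid - 1 }        // look in the lower half
        else if(cp > end){ lo = mid + 1 }     // look in the upper half
        else{ return true }                   // cp falls inside this entry
    }
    return false
}

// Tiny made-up table: A-Z, '_', a-z
const table = [[65, 26], 95, [97, 26]]
console.log(inTable(table, 66))  // true  ('B')
console.log(inTable(table, 33))  // false ('!')
```

The lookup is O(log n) per codepoint, with no startup cost, which matches the reported trade-off: slower per-query in theory, but no table to expand at load time.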
> JS Set() for py_set is possible and/or could boost the performances (as well as reducing the code size).
I wish it were possible, but I am not sure. Javascript `Set()` covers most of the features of Python `set`, but in some cases I'm afraid it will be hard to get the same result, for instance with this test (in test_set.py):
```python
class A:
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return self.x
    def __hash__(self):
        return hash('a')

set1 = {'a', 'b', 'c', 0, 1, 2, 1.5, 2.5, 3.5}
assert A(True) in set1
```
The assertion is true because `A(True)` has the same hash as, and compares equal to, `'a'`...
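To illustrate why a plain JS `Set` can't reproduce this (a sketch of mine, not Brython code): `Set` membership uses SameValueZero identity for objects, so user-defined equality is never consulted:

```javascript
// A JS Set compares objects by identity, so two objects that a Python
// program would consider equal are still distinct entries.
const a1 = {x: true}
const a2 = {x: true}
const s = new Set([a1, 'a'])

console.log(s.has(a1))  // true  (same object reference)
console.log(s.has(a2))  // false (equal-looking, but different identity)
```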
> Wow, this is amazing ! How did you find it ?
In the browser's development tools, you have a "Performance" tab. I am not sure how to use it in Firefox, but on Chromium:

If you want to test small parts of code, websites like JSPerf are quite useful. Careful, as performance can be very different between Chromium and Firefox. But some things are quite generic, e.g.:

`push` will pre-allocate e.g. 4 slots, then once you insert the 5th element, it'll allocate double. So for big arrays you'll get reallocations (and possibly copies of the memory), which is very costly (this is a system call). Performance gains can be really enormous for big arrays (e.g. x50/x100?).
> As far as I could test, this reduces startup time by around 200ms and has no impact on execution speed.
> I am pretty sure we can do the same for the lines you mention.
Maybe, yes. Though, I am not quite sure I get what you do exactly with Unicode (as JS has support for encoding/decoding).
> The assertion is true because `A(True)` has the same hash as and compares equal to `'a'`...
Hmmm....

In this case, why not use a JS `Map`, putting the hash as the key and the stored objects (an array) as the value? Then, to test if an object is in the Set, find the elements having the same hash, O(1), then compare them with `__eq__`, O(n) (but you shouldn't have too many objects with the same hash)?

EDIT: For set operations, maybe building a Set of hashes could speed up or simplify some operation implementations?
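A minimal sketch of that hash-bucket idea; `pyHash` and `pyEq` are hypothetical stand-ins for Brython's `__hash__`/`__eq__` protocols, not real Brython APIs:

```javascript
// Set keyed by a Python-style hash, with a per-bucket equality scan.
class HashedSet {
    constructor(pyHash, pyEq){
        this.pyHash = pyHash      // stand-in for __hash__
        this.pyEq = pyEq          // stand-in for __eq__
        this.buckets = new Map()  // hash -> array of stored objects
    }
    add(obj){
        const h = this.pyHash(obj)
        let bucket = this.buckets.get(h)
        if(bucket === undefined){ this.buckets.set(h, bucket = []) }
        if(!bucket.some(o => this.pyEq(o, obj))){ bucket.push(obj) }
    }
    has(obj){
        const bucket = this.buckets.get(this.pyHash(obj))            // O(1)
        return bucket !== undefined && bucket.some(o => this.pyEq(o, obj)) // O(n) in bucket
    }
}

// Toy protocols: hash and compare on a case-normalized value.
const norm = o => String(o.v).toLowerCase()
const s = new HashedSet(norm, (a, b) => norm(a) === norm(b))
s.add({v: "A"})
console.log(s.has({v: "a"}))  // true: same bucket, compares equal
console.log(s.has({v: "b"}))  // false
```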
> As far as I could test, this reduces startup time by around 200ms and has no impact on execution speed.
One thing that is interesting to do in optimization commits is putting the relative and absolute time gain, e.g. "-22% startup time (300ms -> 200ms)", so that you can :
> Wow, this is amazing ! How did you find it ?
> In the browser's development tools, you have a "Performance" tab. I am not sure how to use it in Firefox, but on Chromium :
In Chromium you also have a "Lighthouse" tab, which has "Analyze page load" at the top right. Note that it doesn't work with "file://" (so you need a local server), and you need to have some content in the page body.

The goal is to reach "100%" in the Performance score. With your recent changes, it went from "54%" to "58%?" when you added support for defer (from memory), and now it is at 88% with your recent commit (83% without defer). That is a huge increase ^^.

You still have 450ms of blocking time, so I assume there is still some margin for progress. Though I use "browser-sync" for my local server, so it hurts performance a little.

You also have other tabs in Chromium :
> Though, I am not quite sure to get what you do exactly with unicode (as JS has support for encoding/decoding).
It is used internally in various places, for instance to check valid identifiers as described here, to support code like

```python
# non-ASCII variable names
donnée = 10
машина = 9
ήλιος = 4
assert donnée + машина + ήλιος == 23

# Korean
def 안녕하세요():
    return "hello"

assert 안녕하세요() == "hello"
```
> Though, I am not quite sure to get what you do exactly with unicode (as JS has support for encoding/decoding).
> It is used internally in various places, for instance to check valid identifiers as described here, to support code like
Ouch... That seems a very inefficient way of doing it.

By default, JS strings are UTF-16, so you should be able to do that very efficiently by verifying whether the char code is between given ranges. If you have a lot of ranges, you may even use some kind of binary search over the ranges (maybe in the form of a binary decision tree, which you could even flatten to an array just by playing with the indexes?). If you have ranges with "holes" you can add some "exception ranges". And you can also detect some patterns with `%`, `Math.floor(x/Z)`, etc.

I think you have ways to really speed up this process.
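One way to sketch the "flattened to an array" idea (an illustration of mine, not Brython's code): store all range boundaries in a single sorted array, where a codepoint is inside some range iff its binary-search insertion point is odd:

```javascript
// Boundaries come in pairs: [start, endExclusive, start, endExclusive, ...].
// A codepoint is inside some range iff the count of boundaries <= cp is odd.
function inRanges(bounds, cp){
    let lo = 0, hi = bounds.length
    while(lo < hi){
        const mid = (lo + hi) >> 1
        if(bounds[mid] <= cp){ lo = mid + 1 } else { hi = mid }
    }
    return (lo & 1) === 1
}

// Ranges A-Z and a-z, flattened to [65, 91, 97, 123]:
const bounds = [65, 91, 97, 123]
console.log(inRanges(bounds, 0x42))  // true  ('B')
console.log(inRanges(bounds, 0x5B))  // false ('[')
```

This keeps the whole table in one flat typed-array-friendly structure, with one O(log n) search per codepoint and no object allocation.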
Moreover, I think I saw the Unicode code takes ~100kB, so you may even reduce brython.js size by 10% (as well as download and JS interpretation time)?
If you have a different encoding, you can first convert:

```javascript
var decoder = new TextDecoder('utf-8'),       // utf8 bytes -> JS (utf16) string
    decodedMessage = decoder.decode(texte);

var encoder = new TextEncoder(),              // JS (utf16) string -> utf8 bytes
    encodedMessage = encoder.encode(texte);   // (TextEncoder is always utf-8)
```

From that you can also convert to `bytes`, `Uint8Array`, etc.
Performance wise:

```javascript
// code stolen from https://gist.github.com/fabiospampinato/014d15872e2129774ae23783bd377ad2
const content = 'e'.padEnd(2000000, 'z')

const charCodesAt = new Array(content.length);
console.time('charCodeAt');
for(let i = 0, l = content.length; i < l; i++){
    charCodesAt[i] = content.charCodeAt(i);
}
console.timeEnd('charCodeAt');

const encoder = new TextEncoder();
console.time('TextEncoder');
encoder.encode(content)
console.timeEnd('TextEncoder');

console.log(content.length);
```
It seems you should use Regex: JS Regex can be Unicode-aware. Cf. https://stackoverflow.com/questions/9862761/how-to-check-if-character-is-a-letter-in-javascript

`RegExp(/^\p{L}/,'u').test(str)`

See how to use Unicode classes: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape

Okay, so performance wise, it seems that :

You can type a Unicode char in JS, "\u006F", or copy the Unicode character "o" directly, as it is supported.
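The property escapes most relevant to the identifier case above would be the XID classes; a small sketch (assuming an engine supporting the `XID_Start`/`XID_Continue` binary properties, which are part of ECMAScript's Unicode property escapes):

```javascript
// Checking identifier characters with Unicode property escapes.
// Python identifiers are based on XID_Start / XID_Continue; '_' is
// additionally allowed as a starter, so it is handled explicitly.
const isIdStart = c => /^[\p{XID_Start}_]$/u.test(c)
const isIdContinue = c => /^[\p{XID_Continue}]$/u.test(c)

console.log(isIdStart('д'))    // true  (Cyrillic letter)
console.log(isIdStart('9'))    // false (digits can't start an identifier)
console.log(isIdContinue('9')) // true  (digits may continue one)
```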
More and more amazing! I didn't know about this RegExp feature, `/\p{Letter}/u`. Yes, at first sight it could replace the use of Unicode tables and save quite a few kB in brython.js. Thanks!

In the commit above I have adapted the scripts that used the Unicode tables wherever I could. unicode_data.js is reduced to 3kB, for the cases where I couldn't find JavaScript solutions (getting the integer value of digits in non-Latin alphabets, for instance).

Thanks again!
Wow, you jumped from a performance score of 88% last time to 97%!!!! That is starting to be good ^^. You went from 163ms load time to 22ms!!! That's... enormous ^^.

I think a big part of the load performance is now due to brython.js size. I think it can still be reduced by factorizing checks (how much, I don't know), which should also help compression. Maybe `$B.$make_ext` and `run_script` could still be optimized (for at most a 10% execution speed gain), but I don't think that'll be very interesting. I think the next step would be to accelerate Brython execution time; for that there are generally 3 ways :

I found lots of things to optimize for code size and performance; that'd be quite promising in the end ^^. We'll have to compare the benchmarks at the end xD.
Still thinking about JS <=> Brython conversions...

Brython => JS : easy

- `if( '$jsobj' in elem) return elem.$jsobj`: O(1), really big speed increase.
- `elem.$jsobj`: should not occur (?), would be a very small additional cost that'd pay off quite quickly.

JS => Brython : needs more thinking

- `WeakMap`: costly in memory and execution speed. BUT potentially a HUGE memory gain when objects are converted/accessed several times.
- `WeakMap`: so ofc costs more, except if the same object is present several times in an array's/object's values.
- Lazy JS => Brython: convert only upon first access (mainly solves Brython <=> JS sync issues); an `at()` for arrays, a `getAttr()` for objects, which can reduce access speed.

Benchmarks :

- ... (what `Array.at()` is doing), you are already nearly 30% slower (so with both, you'll be at most 100% slower ?).
- a `$buildTempShallowClone()` and a `$getTempDeepClone()` ???
- `obj.at = DIRECT_ACCESS` at init, and `obj.at = LAZY_ACCESS` when given to JS, then use `obj.at(idx)` to access indexes.
- `pyObj2jsObj()` changes the `obj.at` (i.e. when giving it to JS).
- skip the `pyObj2jsObj()` modification of `obj.at` when calling JS functions we know won't store or modify the array ?
- `$buildTempShallowClone()` and a `$getTempDeepClone()` internally when we think access time > conversion time ? [but I'd assume conversion time >> access time in most cases ?] So the 100% access overcost might be nothing compared to conversion time; might be a very niche usage when doing intensive accesses ?

```javascript
let array2 = array.map( e => { return {obj: e} });
at = (array, idx) => {
    let val = array2[idx];
    if( val !== undefined && val.obj === array[idx])
        return val
    // we should make the lazy conversion here
    // but that wouldn't be fair.
    throw 'not implemented'
};
```
Imma check for objects too.

Okay, good news for objects: the overcost seems quite limited (2% to 10% slower). The difference seems mainly due to browser runtime optimizations, I assume. So a lazy strategy might not be an issue for objects. I think you already use something like that for objects, so it shouldn't cost more.
Summary
=======

I can't do:

Quite quick:

- [ ] [2275] Improve performances upon fct call
- [ ] [2274] Optimise conditions transcription in AST.
- [ ] [2260] -48% speed performances on Brython function calls (more info)
- [ ] [2260] non-preallocated array, -18% speed performances (+ reduce memory usage) (more info)
- [x] [2260] check if float, -6% speed performances (+ alternative more readable) (more info)

Search in code:

- `BigInt64Array` instead of `BigInt` (more info)
- `checks-mode=unsafe` option to speed up runtime by removing all checks. Checks are 28% slower. (more info)
- `toString().indexOf('.')` (replace by `% 1`).
- `Set` in py_set.js ?

DONE

---
Hi,

I noticed some parts of Brython code were like that :

However, it seems 42% slower compared to simply doing :

I'd advise to search for all `\n` in Brython code, remove the "+", and replace them by a new line.

Also, in some parts there is :

That seems 95% slower. I'd advise to search for all `+= "` in Brython code, remove the "+=", and replace them by a string template.

Cordially,
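The original code snippets did not survive extraction; as an illustration of the pattern being described (my own sketch, not Brython's actual code):

```javascript
// Pattern being discouraged: building text with "\n" string concatenation.
let concatenated = 'line1' + '\n' +
                   'line2' + '\n' +
                   'line3'

// Pattern being suggested: one template literal with real newlines.
let templated = `line1
line2
line3`

console.log(concatenated === templated)  // true: same string, built differently
```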