brython-dev / brython

Brython (Browser Python) is an implementation of Python 3 running in the browser
BSD 3-Clause "New" or "Revised" License
6.31k stars 513 forks source link

[List] Several HUGE performances optimisations. #2260

Open denis-migdal opened 9 months ago

denis-migdal commented 9 months ago

Summary ========

I can't do

Quite quick

Search in code

DONE

===================================================================================

Hi,

I noticed some parts of Brython code were like that :

x = `First line\n` +
`Second line\n` +
`Third line`

However, it seems 42% slower compared to simply doing :

x = `First line
Second line
Third line`

I'd advise to search for all \n` in Brython code, remove the "+", and replace them by a new line.

Also, in some parts there is :

x+= "eee\n"
x+= "eee\n"
x+= "eee\n"

That seems 95% slower.

I'd advise to search for all += " in Brython code, remove the "+=", and replace them by a string template.

Cordially,

denis-migdal commented 9 months ago

[DONE]

Found a part of the code responsible for more than 34% of CPU time (66ms/163ms), even when the Brython code is empty :

$B.unicode[gc].forEach(function(item){if(Array.isArray(item)){var step=item[2]||1
for(var i=0,nb=item[1];i < nb;i+=1){$B.unicode_tables[gc][item[0]+i*step]=true}}else{$B.unicode_tables[gc][item]=true}}

Some things that might increase performances :

So something like (?) :

var ugc = $B.unicode_tables[gc];
for(let x = 0; x < ugc.length; ++x) {
     var item = ugc[x];
     var max = item[1];
     if( max === undefined) { // not an array ?
          ugc[item] = true
     } else {
          var i = item[0]
          var step = item[2]||1;
          max = max * step + i
          for( ;i < max;i+=step)
                ugc[i] = true
     }
}

EDIT: I don't really get what this function is doing, but I guess precomputing everything and storing the result in brython.js isn't possible (brython.js would then be too big) ?

JavaScript has some functions to handle unicode : fromCharCode and charCodeAt, but I assume it isn't what you are doing ?

denis-migdal commented 9 months ago

Brython functions call are converted to (according to documentation : https://github.com/brython-dev/brython/wiki/How-Brython-works ):

f.apply(null, [1].concat(list(t)).concat([{$nat: "kw", kw:[{x: 2}, d]}))

However, it is 48% slower compared to :

f(1, ...list(t), {$nat: "kw", kw:[{x: 2}, d]});

EDIT: is there also a reason to not do : {$nat: "kw", kw:{x: 2, ...d}} or even directly {$nat: {x: 2, ...d}} ?

denis-migdal commented 9 months ago

Also, for potential execution speed gain, see also :

denis-migdal commented 9 months ago

Tested the performance of WeakMap :

For a jsobj2pyobj() doing almost nothing (so not quite a fair comparison) :

So quite a small overhead considering that my jsobj2pyobj() is doing almost nothing

Also for :

Imma test other things I saw too.

denis-migdal commented 9 months ago

jsobj.toString().indexOf('.') == -1 is 6% slower compared to jsobj % 1 === 0 . Note that the latter is quicker AND more readable.

Also, you can pre-allocate arrays (-18% perfs) both in jsobj2pyobj() and pyobj2jsobj() (and you seem to do that a lot):

 var args = []
            for(var i = 0, len = arguments.length; i < len; i++){
                args.push(pyobj2jsobj(arguments[i]))
            }
 var args = new Array(arguments.length)
            for(var i = 0, len = arguments.length; i < len; i++){
                args[i] = pyobj2jsobj(arguments[i])
            }

EDIT : if you don't know the final length, you can also do :

 var args = new Array(arguments.length)
var size = 0
            for(var i = 0, len = arguments.length; i < len; i++){
                if(???) args[size++] = pyobj2jsobj(arguments[i])
            }
args.length = size
denis-migdal commented 9 months ago

Now that ES6 is out since 2015, I think all browser now implements it. So would relying more on ES6 "new" classes also improve performances ?

PierreQuentel commented 8 months ago

Found a part of the code responsible for more than 34% of CPU time (66ms/163ms), even when the Brython code is empty :

$B.unicode[gc].forEach(function(item){if(Array.isArray(item)){var step=item[2]||1
for(var i=0,nb=item[1];i < nb;i+=1){$B.unicode_tables[gc][item[0]+i*step]=true}}else{$B.unicode_tables[gc][item]=true}}

Wow, this is amazing ! How did you find it ?

In the commit referenced above I have modified a similar code for Unicode tables XID_Start and XID_Continue: I have completely removed the lines in unicode_data.js

for(const key in $B.unicode_identifiers){
    $B.unicode_tables[key] = {}
    for(const item of $B.unicode_identifiers[key]){
        if(Array.isArray(item)){
            for(var i = 0; i < item[1]; i++){
                $B.unicode_tables[key][item[0] + i] = true
            }
        }else{
            $B.unicode_tables[key][item] = true
        }
    }
}

and added functions $B.is_XID_Start(codepoint) and $B.is_XID_Continue(codepoint) in python_tokenizer.js that check if a codepoint is in one of these categories by searching directly (by bisection) in the table $B.unicode_identifiers.

As far as I could test, this reduces startup time by around 200ms and has no impact on execution speed.

I am pretty sure we can do the same for the lines you mention.

PierreQuentel commented 8 months ago

JS Set() for py_set is possible and/or could boost the performances (as well as reducing the code size).

I wish it were possible, but I am not sure. Javascript Set() covers most of the features of Python set but in some cases I'm afraid it will be hard to get the same result, for instance with this test (in test_set.py):

class A:
  def __init__(self, x):
    self.x = x
  def __eq__(self, other):
    return self.x
  def __hash__(self):
    return hash('a')

set1 = {'a', 'b', 'c', 0, 1, 2, 1.5, 2.5, 3.5}

assert A(True) in set1

The assertion is true because A(True) has the same hash as and compares equal to 'a'...

denis-migdal commented 8 months ago

Wow, this is amazing ! How did you find it ?

In browser's development tools, you have a "Performance tab". I am not sure how to use it in Firefox, but on Chromium :

  1. Ctrl+Maj+I
  2. Performance tab
  3. Either record or reload (depending whether you want to record the page load or the execution after e.g. clicking a button)
  4. Once done, click on "stop"
  5. Click on the "Bottom-up" tab (Call tree works too)
  6. Sort by "Total time"
  7. Then in "Activity", click on the triangles to find the functions that has the higher %.

If you want to test small parts of code, website like JSPerf are quite useful. Careful, as performances can be very different between Chromium and Firefox.

But some stuff are quite generic, e.g. :

Push will pre-allocate e.g. 4 spaces, then once you insert the 5th element, it'll allocate the double. So for big arrays you'll get reallocations (and possibly copy of the memory), which is very costly (this is a system call). Performances gains can be really enormous for big arrays, (e.g. x50/x100 perfs ?).

As far as I could test, this reduces startup time by around 200ms and has no impact on execution speed.

I am pretty sure we can do the same for the lines you mention.

Maybe, yes. Though, I am not quite sure to get what you do exactly with unicode (as JS has support for encoding/decoding).

denis-migdal commented 8 months ago

The assertion is true because A(True) has the same hash as and compares equal to 'a'...

Hmmm....

In this case, why not using a JS Map, putting the hash as the key, and stored objects (an array) as a value ?

Then, to test if an object is in the Set, find the elements having the same hash 0(1), then compare them with __eq__ 0(n) (but you shouldn't have too much objects with the same hash) ?

EDIT: For set operations, maybe building a Set of hashes could speed up or simplify some operations implementations ?

denis-migdal commented 8 months ago

As far as I could test, this reduces startup time by around 200ms and has no impact on execution speed.

One think that is interesting to do, is, in optimization commits, putting the relative and absolute time gain : e.g. "-22% startup time (300ms -> 200ms)", so that you can :

denis-migdal commented 8 months ago

Wow, this is amazing ! How did you find it ?

In browser's development tools, you have a "Performance tab". I am not sure how to use it in Firefox, but on Chromium :

In Chromium you also have a tab "Lighthouse", that have at the top right "Analyze page load". Note that it doesn't work with "file://" (so you need a local server), and you need to have some content in the page body.

The goal is to have "100%" in the Performance score. With your recent change, it went from "54%" to "58%?" when you added support for defer (from memory), and now it is at 88% with your recent commit (83% without defer). That is a huge increase ^^.

You still have 450ms of blocking time so still have a margin of progress I assume.

Though I use "browser-sync" to have my local server, so it hurts a little the performances.

denis-migdal commented 8 months ago

You also have in Chromium other tabs :

PierreQuentel commented 8 months ago

Though, I am not quite sure to get what you do exactly with unicode (as JS has support for encoding/decoding).

It is used internally in various places, for instance to check valid identifiers as described here, to support code like

# non-ASCII variable names
donnée = 10
машина = 9
ήλιος = 4

assert donnée + машина + ήλιος == 23

# Korean
def 안녕하세요():
    return "hello"

assert 안녕하세요() == "hello"
denis-migdal commented 8 months ago

Though, I am not quite sure to get what you do exactly with unicode (as JS has support for encoding/decoding).

It is used internally in various places, for instance to check valid identifiers as described here, to support code like

Ouch... It seems to be a very inefficient way of doing it.

By default, JS strings are in UTF16, so you should be able to do that very efficiently by verifying if the char code is between given ranges. If you have lot of ranges you may even have some kind of binary search of ranges (maybe in a form of a binary decision tree - that you could even somehow flatten to an array just by playing with the indexes ?). If you have ranges with "holes" you can have some "exception ranges". And you can also detect some patterns with "%", Math.floor(x/Z), etc.

I think you have ways to really speed up this process.

Moreover i think I saw the code for unicode takes ~100kB, so you may even reduce brython.js size by 10% (as well as downloading and JS interpretation time) ?

If you have a different encoding, you can first convert them :

var decoder = new TextDecoder('utf-8'), // utf8 to utf16
decodedMessage = decoder.decode(texte);
var encoder = new TextEncoder('utf-8'), // utf16 to utf8
encodedMessage = encoder.encode(texte);

From that you can also convert it to bytes, Uint8Array etc.

Performance wise :

// code stolen from https://gist.github.com/fabiospampinato/014d15872e2129774ae23783bd377ad2
const content = 'e'.padEnd(2000000, 'z')

const charCodesAt = new Array ( content.length );
console.time('charCodeAt');
for ( let i = 0, l = content.length; i < l; i++ ) {
  charCodesAt[i] = content.charCodeAt ( i );
}
console.timeEnd('charCodeAt');

const encoder = new TextEncoder ();
console.time('TextEncoder');
encoder.encode ( content )
console.timeEnd('TextEncoder');

console.log ( content.length );
denis-migdal commented 8 months ago

It seems you should use Regex :

JS Regex can be unicode aware.

cf https://stackoverflow.com/questions/9862761/how-to-check-if-character-is-a-letter-in-javascript

RegExp(/^\p{L}/,'u').test(str)

See how to use unicode classes : https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape

denis-migdal commented 8 months ago

Okay, so performance wise, it seems that :

You can type unicode char in JS : "\u006F" or copy directly the unicode character "o" as it is supported.

PierreQuentel commented 8 months ago

More and more amazing ! I didn't know about this RegExp feature, /\p{Letter}/u. Yes, at first sight it could replace the use of Unicode tables and save quite a few kB in brython.js. Thanks !

PierreQuentel commented 8 months ago

In the commit above I have adapted the scripts that used the Unicode tables wherever I could. unicode_data.js is reduced to 3kB for cases where I couldn't find Javascript solutions (getting the integer value of digits in non-latin alphabets for instance).

Thanks again !

denis-migdal commented 8 months ago

Wow, you jumped from a performance score of 88% last time to 97% !!!! that is starting to be good ^^. You went from 163ms load time to 22ms !!!.

That's... enormous ^^.

I think now a big part of the load performances is due to brython.js size, but I think it can still be reduced by factorizing checks (how much I don't know though), but should also helps compression.

Maybe $B.$make_ext and run_script could still be optimized (for at most a 10% execution speed gain). But I don't think it'll be quite interesting. I think next step would be to accelerate Brython execution time, for that there are generally 3 ways :

I found lot of things to optimize for code size and performances, that'd be quite promising at the end ^^. We'll have to compare the benchmarks at the end xD.

denis-migdal commented 8 months ago

Still thinking about JS <=> Brython conversions...

Benchmarks :

let array2 = array.map( e => { return {obj: e} });

at = (array, idx) => {
    let val = array2[idx];
    if( val !== undefined && val.obj === array[idx])
        return val
    // we should make the lazy conversion here
    // but that wouldn't be fair.
    throw 'not implemented'
};

Imma checks for objects too.

denis-migdal commented 8 months ago

Okay, good news for objects, the overcost seems quite limited (2% slower to 10% slower). The difference seems mainly due to browser runtime optimizations I assume.

So a lazy strategy might not be an issue for objects. I think you already use something for objects, so it shouldn't cost more for objects.