brython-dev / brython

Brython (Browser Python) is an implementation of Python 3 running in the browser
BSD 3-Clause "New" or "Revised" License

[OPTIMISATION] Optimisations for DEBUG information (easily done, big performance gains) #2270

Open denis-migdal opened 9 months ago

denis-migdal commented 9 months ago

Hi,

I suggest several small optimizations for the debug information that Brython inserts into the generated JS code:

Interesting optimisations

Please also look at

A. Use constants as indexes into an array, instead of inline arrays [NO]

Please also look at :

Currently, an array describing the position of the "executed" Python code is given as the last argument of the underlying JS functions, e.g.:

$B.$call($B.$getattr_pep657(locals___main__.a, 'replace', [0, 0, 9]), [2, 2, 14])('e')

This has the disadvantage of potentially building an array each time the line is executed, and of taking more space in the generated JS code.

Therefore, I suggest using globally defined constants, defined at transpilation time. This would increase execution performance while reducing the size of the generated code:

DEBUG_SYMBOLS = { // see suggestion B. for a better efficient way.
    1: [0,0,9],
    2: [2,2,14]
}

The code would then be written :

$B.$call($B.$getattr_pep657(locals___main__.a, 'replace', 1), 2)('e')

This would add an insignificant cost when printing error messages, i.e. the DEBUG_SYMBOLS[the_constant] lookup. But it doesn't matter as :

B. Build the constant as a mathematical expression from the 3 numbers [a,b,c].

In the previous suggestion, we used an array to convert a constant into the array containing the information required for debugging. But a JS number has 53 usable bits.

These can be split into 3x16 bits, i.e. 3 numbers in the range [0, 65535]. I'd argue that no Python line should ever be that long (PEP 8 recommends lines of at most 79 characters). At transpilation time, we would simply do :

NB_BITS = 16; // globally defined
// (toString(16) already produces the hex digits, so no DIGIT table is needed)

// given [a, b, c] ; note : << only handles 32 bits, so use multiplication for the high part
CSNT = "0x" + (a * 2 ** (NB_BITS * 2) + b * 2 ** NB_BITS + c).toString(16);

Giving the following code :

$B.$call($B.$getattr_pep657(locals___main__.a, 'replace', 0x9), 0x2020E)('e')

Then, when handling the error :

a = Math.floor(the_constant / 2 ** (NB_BITS * 2))
b = Math.floor(the_constant / 2 ** NB_BITS) & ((1 << NB_BITS) - 1)
c = the_constant & ((1 << NB_BITS) - 1)

This would be quite quick when handling errors, while using less RAM (no need for a DEBUG_SYMBOLS structure). The current version takes at least 7 characters for the [a,b,c] array; this version takes at most 8 characters, so one more (but would generally be at most 7 characters).

Of course, we can reduce the value of NB_BITS if needed, and even add the line number if we have enough bits. This also keeps the code readable, as we understand that 0x9 is a special value.
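A minimal round-trip sketch of this encoding, assuming the 16-bit split above (encode_pos / decode_pos are illustrative names, not existing Brython functions):

const NB_BITS = 16;

function encode_pos(a, b, c){
    // multiplication instead of << because the result exceeds 32 bits
    return a * 2 ** (NB_BITS * 2) + b * 2 ** NB_BITS + c;
}

function decode_pos(n){
    const mask = (1 << NB_BITS) - 1;
    return [ Math.floor(n / 2 ** (NB_BITS * 2)),
             Math.floor(n / 2 ** NB_BITS) & mask,
             n & mask ];
}

console.log( decode_pos( encode_pos(2, 2, 14) ) ); // [2, 2, 14]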

Further optimisation :

With more knowledge about the 3 values [a,b,c], we can reduce the number of bits. E.g. if a <= b <= c, instead of storing [a,b,c] in 53 bits, we can store [a, b-a, c-a] in those 53 bits, and then allocate fewer bits to b-a and c-a.

This could free bits to store the line number as well, if useful.

C. Replace all the $B.set_lineno(frame, 7) calls by a map ? (NO)

Instead of keeping track of the line number, which likely hurts execution performance, we could create a structure enabling conversion of the JS stacktrace into the Python stacktrace (this would also enable better error printing, I think).

The structure would be globally defined as :

STACKTRACE_MAP = new Map(); // can be a {} if key is a string.
STACKTRACE_MAP.set(FILE_A, [ 1, 10, 23, 34, 106, .... ]);
STACKTRACE_MAP.set(FILE_B, [ 1, 10, 23, 34, 106, .... ]);

I am not sure about the value we need to put in FILE_A/FILE_B; we'll have to look at the JS stacktrace in more depth when exceptions are raised inside Brython code. Maybe there are also ways to cheat a little.

Getting the line numbers would then be done as :

let JS_LINENO = .... // extracted from the stacktrace.
let filemap = STACKTRACE_MAP.get(????);

let i = -1;
while( filemap[++i] < JS_LINENO );

let BRYTHON_LINENO = i; // the index in filemap gives the Python line

When transpiling the Python code, the filemap would be built as :

let offset = 0;
let filemap = new Array( nbLines(py_file) ) // preallocate : can't have more entries than there are Python lines

let nb_lines = 0;
// for each Python line :
        js_file += "the content\nor whatever\ntoto"
        nb_lines += 2 // for the sake of the example.
        filemap[offset++] = nb_lines
filemap.length = offset; // truncate to the entries actually written
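A minimal end-to-end sketch of the lookup, with a hand-built filemap (the values are made up, and whether Python lines are 0- or 1-based is an implementation detail):

// filemap[k] = number of JS lines emitted after transpiling Python line k+1
const filemap = [3, 7, 12, 20];

function js_to_py_lineno(js_lineno){
    let i = 0;
    while( i < filemap.length && filemap[i] < js_lineno ){ i++ }
    return i + 1; // Python lines are 1-based here
}

console.log( js_to_py_lineno(8) ); // 3 : JS line 8 was generated for Python line 3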

This may make transpilation slightly slower, but I'd argue that :

This will also make error printing slower (but we don't care).

Also, with that, would we still need the frame system ? Are there other line numbers hidden in the generated JS code ? (e.g. for Brython class definitions ?)

EDIT: If the generated JS file is minified, the line numbers will not match anymore. I think the minifier can also build a sourcemap that will give the correct line numbers in the stacktrace. Otherwise, a library might be able to do the trick. Failing that, I'd argue that if you minify, you want performance anyway.

D. Option execution-mode="production" to remove debug information and make the output cleaner/shorter. (NO)

Currently, because Brython doesn't implement sourcemaps, the generated JS code is bloated with position arrays and $B.set_lineno(frame, 7) calls.

This makes the generated code harder for a human to read, increases the generated code size, and has a performance cost. This information is really useful when developing, to better understand and fix errors. However, in production, we might want to go faster and might not be interested in debugging information.

For this usage, sourcemaps are generally used (cf. more info about the sourcemap structure), but I can understand that they might not be easy to produce.

Hence, I suggest implementing an execution-mode option that could take 3 values :

Cordially,

PierreQuentel commented 9 months ago

Thanks for the suggestions.

I don't get the idea for A : the value of DEBUG[1] and DEBUG[2] should change before every call to $getattr_pep657, what would be the benefit ?

I am more convinced by B. We should test how much space and execution time this would save.

For C and D :

denis-migdal commented 9 months ago

Okay... for optimisation B, the performance gain will be HUGE.

Optimisation B seems 3.8x faster than the current Brython implementation.

(screenshot of the benchmark results, 2023-10-09)

... BUT there is an even better way... that is 36x faster. The rationale is to transpile the code like this :

$B.LID = 0x000000000007;
$B.IID = 0x000000000003; let R = $B.$getattr_pep657(locals___main__.a, 'replace')
$B.IID = 0x00020002000E; $B.$call(R)('e');
// instead of :
// $B.set_lineno(frame, 7);
// $B.$call($B.$getattr_pep657(locals___main__.a, 'replace', [0,0,3]), [2,2,14])('e')

Note it is possible to offer 3 output modes :

We use an intermediate variable (R), but the browser is clever and optimises it away (so it has no cost). The interesting things are these $B.LID (Line ID) and $B.IID (Instruction ID) global variables.

$B.IID (Instruction ID) is global, and can be accessed from anywhere... This has several advantages :

$B.LID (Line ID) is global, and can be accessed from anywhere... This has several advantages :

And for a production output... just don't include the $B.LID and $B.IID assignments in the generated JS code.
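A hedged sketch of how the error path could read these globals; $B.LID / $B.IID are the variables proposed above (not an existing Brython API), and the 3x16-bit layout matches the 0x00020002000E example:

function current_position(){
    const iid = $B.IID;
    // the three 16-bit fields of the instruction ID (multiplication because it exceeds 32 bits)
    const a = Math.floor(iid / 2 ** 32);
    const b = Math.floor(iid / 2 ** 16) & 0xFFFF;
    const c = iid & 0xFFFF;
    return { lineno: $B.LID, position: [a, b, c] }; // e.g. 0x00020002000E -> [2, 2, 14]
}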

denis-migdal commented 9 months ago

I don't get the idea for A : the value of DEBUG[1] and DEBUG[2] should change before every call to $getattr_pep657, what would be the benefit ?

The idea was to increment this number each time the transpiler needs to write one. But this was not the best solution; better to try the new one I found (or B if the new one isn't possible).

For C and D :

* the line number is not relevant only for stack trace in case of errors, it is also used in trace functions such as set by `sys.settrace()` (cf. examples in the test of module **`sys`** in the Brython test suite), I don't think we can rely on Javascript internals for that

Ouch... Yeah it hurts...

Possible solutions :

  1. Asking the user to explicitly enable it before transpilation if they want to use it? As it seems very costly, it may be something we don't want to enable by default (then settrace would raise an exception "settrace is not enabled, add the option XXXX to enable settrace").
  2. A global function :

    function LID_base(frame, lid) {
        $B.LID = lid // if using the last solution ; for solution C, just do nothing.
    }
    function LID_trace(frame, lid) {
        // do your stuff (call the trace function, etc.)
    }
    $B.L = LID_base
    // .....
    $B.L(frame, 7)
    // do stuff
    $B.L(frame, 8)

Then, when settrace is called, do $B.L = LID_trace (as sketched below)? I think this would help the browser inline the function call until settrace is enabled (so as to reduce the cost when settrace isn't used).
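A minimal sketch of that swap (set_trace_hook and $B.trace_function are hypothetical names, not the actual settrace plumbing):

function set_trace_hook(tracefn){
    $B.trace_function = tracefn
    // pay for the expensive hook only while tracing is active
    $B.L = tracefn ? LID_trace : LID_base
}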

denis-migdal commented 9 months ago

Okay, last night I took a better look at the code transpiled by the online editor. I'll do the PR test stuff in the evening, I guess.

1. Use finally (NO)

A small remark : maybe it would be worth using the finally clause, i.e. try {} catch(e) {} finally {}.

Indeed, it is often written :

try {
    // do stuff
    $B.leave_frame()
    return result
}catch(err){
      // do stuff
      $B.leave_frame();throw err
}

This can be replaced by :

try {
    // do stuff
    return result
}catch(err){
      // do stuff
      throw err
} finally {
    $B.leave_frame()
}

2. Benchmarks for frame cost [NO]

It seems all the frame stuff is what takes the most space (like > 50% of the produced JS ?). I think we really need a version/mode without the frame info, settrace() support, the debug information/pretty error printing, etc., in order to :

The benchmarks would give us an idea of the cost of settrace() support. If it comes at the cost of being, e.g., 40x slower, maybe we have a big margin for progress and it would be worth optimizing. If it is, e.g., 1.01x slower, maybe further optimization isn't worth it.

3. Function declaration

The other thing that takes a lot of space in the produced code seems to be function declarations. But nearly half of that is due to frames.

The other thing that takes space is the $infos property. Could a helper help reduce the generated JS code size (while making the output more readable) ? E.g. :

foo1382.$infos = $B.build_fct_info("foo", ....) // I think some info is identical for all function definitions ?

I think this may come at the price of slower execution (though the browser might be able to inline the function) ? But slower function declaration doesn't really matter : it is the function execution time that matters (we define a function once, but execute it multiple times).
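A minimal sketch of such a helper; build_fct_info is the name suggested above and the fields are illustrative, not Brython's actual $infos layout:

$B.build_fct_info = function(name, module, arg_names){
    return {
        __name__: name,
        __qualname__: name,
        __module__: module,
        arg_names: arg_names
    }
}

// the generated code would then shrink to something like :
// foo1382.$infos = $B.build_fct_info("foo", "__main__", ["x", "y"])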

4. Moving some operations outside of the function body ?

In each function execution the following is done :

Maybe it is possible to push it all inside a new function $B.execute() in order to :

This costs an extra function call when calling Brython functions. I don't know whether the browser could inline functions better thanks to that, or whether this extra cost is really significant. Also, by reducing the generated JS code size, we mechanically reduce transpilation time.

Possibility 1:

locals_exec.foo = $B.build_executor( foo1382 ) // would build/return a $B.execute function whose 2nd argument is bound to foo1382

Possibility 2 : replace $B.$call by :

$B.$execute([0, 0, 7], locals_exec.foo, 32)

I don't know which method would be better if 4. is a good idea, but that is something that could be tested.

$B.execute would then be something like :

$B.execute = function(DEBUG_INFO, fct, ...args){

    // handle arguments and other stuff
    var locals_exec_foo,
        locals
    locals_exec_foo = locals = $B.args0(fct, args)
    var frame = [fct.__name__, locals, "exec", locals_exec, fct]
    if(locals.$has_generators){
      frame.$has_generators = true
    }
    frame.__file__ = '<string>'
    frame.$lineno = 1

    // enter the frame
    frame.$f_trace = $B.enter_frame(frame)
    try {
        $B.js_this = this
        let ret = fct(frame, args) // or fct(frame, ...args) ???
        // handle return type
        return ret
    } catch(err) {
        // handle exception
        throw err
    } finally {
        // leave frame
        $B.leave_frame()
    }
}
denis-migdal commented 9 months ago

@PierreQuentel At the top of the issue, I will put the optimisations you find interesting, so it'll be easier for you to browse this issue.

I found an easy way to get a VERY HUGE performance gain in the current system: for the frame/stack, use a uni-directional tree instead of a stack.

Indeed, using an array is costly :

Using a uni-directional tree (i.e. a linked list) instead would be way quicker :

let ptr = { previous: null, /* all your frame attribute here */ }

function enter_frame(ptr) {
      // do your stuff
      return {previous: ptr,  /* all your frame attribute here */ }
}
function leave_frame(ptr) {
      // do your stuff
      return ptr.previous
}

// usage :
ptr = enter_frame(ptr);
// frame stuff
ptr = leave_frame(ptr);

// printing the stack :
let cur = ptr
while( cur !== null ) {
     // do your stuff
     cur = cur.previous
}

Printing the stack would be slower, but we don't care.

PierreQuentel commented 9 months ago

Hi Denis,

With the commit above, encoding and decoding the format information is delegated to functions in ast_to_js.js, so we can test several formats. Note that the number of items can be 4, in the case of binary operations, to correctly trace errors such as

Traceback (most recent call last):
  File "http://localhost:8000/tests/test.html#__main__", line 1, in <module>
    1 / 0
    ~~^~~
ZeroDivisionError: division by zero
denis-migdal commented 9 months ago

EDIT: better idea after : https://github.com/brython-dev/brython/issues/2270#issuecomment-1754841797

Mmm... testing something like that would be interesting (in size of the JS output, transpilation time, and execution time).


// Total: 32 bits, assuming A < 256, B-A < 256, C-B < 256, and D < 256.
// At most, a position is 2+8 characters : "0x" + 2 hex chars for each number.
// Each bit added to or removed from each value is +/- 1 character overall (4 values x 1 bit = 4 bits = 1 hex character).
// We can increase the bits up to 13 (so values up to 8191).
// I think it'll be safe to go down to 7 bits (values up to 127). One bit less would be a risk (63).
// [a,b,c,d] is at least 9 characters. 7 bits is at most 9 characters (but usually d = 0/undefined, so in reality it would very often be 7 characters, or even 5-6 when b = a, which is also very often the case).
const NB_BITS_A = 8;
const NB_BITS_B = 8;
const NB_BITS_C = 8;
const NB_BITS_D = 8;

const MASK_A = (1 << NB_BITS_A ) - 1
const MASK_B = (1 << NB_BITS_B ) - 1
const MASK_C = (1 << NB_BITS_C ) - 1
const MASK_D = (1 << NB_BITS_D ) - 1

// we assume d is the most likely to be 0, then b, then a ; c is the least likely to be 0.
const OFFSET_C = 0;
const OFFSET_A = NB_BITS_C;
const OFFSET_B = OFFSET_A + NB_BITS_A;
const OFFSET_D = OFFSET_B + NB_BITS_B;

function encode_position(a, b, c, d = 0){
    // assuming a <= b <= c (if we had more info about "d" we could do something with it) :
    c -= b;
    b -= a;

    return "0x" + ( (d << OFFSET_D) + (b << OFFSET_B) + (a << OFFSET_A) + (c << OFFSET_C) ).toString(16); // condensed format
}

$B.decode_position = function(pos){
    let d = (pos >> OFFSET_D) & MASK_D
    let b = (pos >> OFFSET_B) & MASK_B
    let a = (pos >> OFFSET_A) & MASK_A
    let c = (pos >> OFFSET_C) & MASK_C

    // assuming a <= b <= c (we do the reverse of what we did in encode).
    b += a
    c += b

    return [a,b,c,d]
    // or, if a 4th item equal to 0 isn't supported :
    // if( d === 0 )
    //     return [a,b,c]
    // else
    //     return [a,b,c,d]
}
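For instance, a quick round-trip check (illustrative values; Number() parses the "0x..." string back into an integer):

console.log( $B.decode_position( Number( encode_position(0, 0, 9) ) ) )  // [0, 0, 9, 0]
console.log( $B.decode_position( Number( encode_position(2, 2, 14) ) ) ) // [2, 2, 14, 0]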

127 chars (the 7-bit maximum) would correspond to :

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Which seems like a lot for an indentation (the start of the "error" ?) as well as for a function name.

denis-migdal commented 9 months ago

EDIT: better idea after : https://github.com/brython-dev/brython/issues/2270#issuecomment-1754841797

The issue with my previous proposition is that :

Another possible representation is to use the last digit (4 bits) to encode the number of bits taken by each value (0->F => 0 to 15 bits). (We could also use 3 bits instead of 4, then 0->7 => 3 to 10 bits per value.) This would solve the 2 issues of the previous proposition. I think this may be one of the most optimal strategies memory-wise, but it would require some computation to find the number of bits to use for each value. This process is a little hard to explain.

But I think I have a simpler strategy :

Let (a1, a2, a3), (b1, b2, b3), (c1,c2,c3), (d1,d2,d3) be the bits of a, b, c, d; we store pos as the interleaved bits (a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3).

The order of the bits doesn't matter: (a1,b1,c1,d1) is 4 bits, so 1 character. As long as one of them is non-zero, one char will be taken. This means that the size of pos is 2 (the "0x") plus the bit length of max(a,b,c,d) characters.

i.e. most of the time not more than 2+6 characters (= 8 characters), for values up to 63. It can easily go up to the max (13 bits), i.e. values up to 8191.

function encode_position(a, b, c, d = 0){

    // assuming a <= b <= c (if we had more info about "d" we could do something with it) :
    c -= b;
    b -= a;

    let pos = 0;
    let offset = -1    

    while( a+b+c+d !== 0) {
         pos |= (a & 1) << ++offset;
         a >>= 1

         pos |= (b & 1) << ++offset;
         b >>= 1

         pos |= (c & 1) << ++offset;
         c >>= 1

         pos |= (d & 1) << ++offset;
         d >>= 1
    }

    return "0x" + pos.toString(16); // condensed format
}

$B.decode_position = function(pos){

    let a = 0, b = 0, c = 0, d = 0;
    let offset = 0

    while( pos !== 0) {

         a |= (pos & 1) << offset;
         pos >>= 1

         b |= (pos & 1) << offset;
         pos >>= 1

         c |= (pos & 1) << offset;
         pos >>= 1

         d |= (pos & 1) << offset;
         pos >>= 1

         ++offset
    }

    // assuming a <= b <= c (we do the reverse of what we did in encode).
    b += a
    c += b

    return [a,b,c,d]
    // or, if a 4th item equal to 0 isn't supported :
    // if( d === 0 )
    //     return [a,b,c]
    // else
    //     return [a,b,c,d]
}
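Quick round-trip check of the interleaved layout (illustrative values):

console.log( $B.decode_position( Number( encode_position(0, 0, 9) ) ) )  // [0, 0, 9, 0]
console.log( $B.decode_position( Number( encode_position(2, 2, 14) ) ) ) // [2, 2, 14, 0]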
denis-migdal commented 9 months ago

Okay, let's do better :

Here is what the code should look like (not tested) :

function encode_position(a, b, c, d = c){

    // assuming a <= b <= c <= d (if we had more info about "d" we could do something with it) :
    d -= c;
    c -= b;
    b -= a;

    // number of bits of the largest stored value (at least 1)
    let nbbits = Math.floor( Math.log2( (a | b | c | d) || 1 ) ) + 1;

    let pos = 1 << nbbits; // our flag.

    let has_d = +(d !== 0)
    let has_b = +(b !== 0)

    let indic = has_b + (has_d << 1)

    // multiplication by has_* prevents a condition - should be faster ?
    pos |= d
    pos <<= nbbits * has_d

    pos |= b
    pos <<= nbbits * has_b

    pos |= c
    pos <<= nbbits

    pos |= a
    pos <<= 2

    pos |= indic

    return "0x" + pos.toString(16); // condensed format
}

$B.decode_position = function(pos){

    let has_b = pos & 0x1
    let has_d = (pos & 0x2) >> 1
    let nb_values = 2 + has_b + has_d
    pos >>= 2

    // the flag bit sits just above nb_values * nbbits bits
    let nbbits = Math.floor( Math.log2( pos ) ) / nb_values
    let mask = (1 << nbbits) - 1

    let a = pos & mask
    pos >>= nbbits
    let c = pos & mask
    pos >>= nbbits

    // multiplication by has_* prevents a condition - should be faster ?
    let b = (pos & mask) * has_b
    pos >>= nbbits * has_b

    let d = (pos & mask) * has_d
    pos >>= nbbits * has_d

    // undo the deltas computed in encode_position
    b += a
    c += b
    d += c

    if( ! has_d )
         return [a,b,c]
    return [a,b,c,d];
}
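Quick round-trip check of this variant (illustrative values; Number() parses the "0x..." string back into an integer):

console.log( $B.decode_position( Number( encode_position(1, 3, 9) ) ) )     // [1, 3, 9]
console.log( $B.decode_position( Number( encode_position(0, 0, 9, 12) ) ) ) // [0, 0, 9, 12]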
PierreQuentel commented 9 months ago

About finally in try/catch blocks: it was used in early versions of Brython, but I removed it because of performance issues. They are less important now, but still around 10% on Chrome between these versions (g is faster than f):

function f(){
    try{
        x
    }catch(err){
        err.message
    }finally{
        3 + 4
    }
}

function g(){
    try{
        x
        3 + 4
    }catch(err){
        err.message
        3 + 4
    }
}
g()

The link you put on "it doesn't add any overcost." seems broken.

PierreQuentel commented 9 months ago

About "2. Benchmarks for frame cost": we really can't do without frames management. It is not used only to handle exceptions: if you remove it, built-in functions like exec(), eval(), globals() or locals() won't work, neither will super() or del <name>, etc.

PierreQuentel commented 9 months ago

More generally: one of the design principles in Brython is that it should be as close to Python as possible; between performance and compliance with the language definition, I always favour compliance.

This is why I am opposed to options at the user's choice to determine which parts of Brython should or should not work as expected by a Python developer. It's too difficult to explain (for us) and to remember (for users) which options should unplug which feature, and it would make debugging difficult if unplugging a feature has side-effects we had not thought of.

PierreQuentel commented 9 months ago

About the replacement of position information: are you sure there will be such an increase in performance ?

I tried 2 different versions, one with an array and one with a hex number; there is no difference on Firefox or Chrome:

N = 10000000
function ignore(){}

var t0 = Date.now()
for(var i=0; i<N; i++){
    ignore([1,2,3])
}
console.log(Date.now() - t0)

var t0 = Date.now()
for(var i=0; i<N; i++){
    ignore(0x999)
}
console.log(Date.now() - t0)

Also tested on jsPerf, with even a 0.3% improvement with arrays...

denis-migdal commented 9 months ago

About finally in try/catch blocks: it was used in early versions of Brython, but I removed it because of performance issues. They are less important now, but still around 10% on Chrome between these versions (g is faster than f):

Hmm, strange, I do see a small difference in perf, but not to that extent.

The link you put on "it doesn't add any overcost." seems broken.

Mmm... that's strange. Here is a new link : https://jsperf.app/wuxivi

Well, it seems finally isn't worth it then.

denis-migdal commented 9 months ago

Thanks for your answer.

About the replacement of position information: are you sure there will be such an increase in performance ?

Yes : https://jsperf.app/mitogo — in this jsperf, arrays are 6x slower on FF, and 6.4x slower on Chromium.

Of course, the increase only applies to this part of the code, so you won't get an overall 6x speed increase.

I tried 2 different versions, one with an array and one with an hex, there is no difference on Firefox or Chrome:

Your function ignore() is empty and the browser notices it. So the function is likely inlined, i.e. the loop is likely doing nothing ? However, when the browser sees that the value is likely to be used, and has no assurance that it'll never be modified (cf. my jsperf above), it is forced to create the array at each iteration (though it might still have some optimizations under the hood).

Also tested on jsPerf, with even 0.3% improvement with arrays...

0.3% is within the margin of error, so we could say it performs the same. Which makes me think the ignore() function was indeed inlined.
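A sketch of a micro-benchmark that actually consumes the value, so the allocation can't be optimized away (illustrative, not the exact jsPerf code):

const N = 10000000

function use_array(pos){ return pos[0] + pos[1] + pos[2] }
function use_hex(pos){ return (pos >> 16) + ((pos >> 8) & 0xFF) + (pos & 0xFF) }

let acc = 0
let t0 = Date.now()
for(let i = 0; i < N; i++){ acc += use_array([1, 2, 3]) }
console.log('array', Date.now() - t0, acc)

acc = 0
t0 = Date.now()
for(let i = 0; i < N; i++){ acc += use_hex(0x010203) }
console.log('hex  ', Date.now() - t0, acc)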

Else, on "stylish"/"readability" considerations, if it doesn't cause performances reductions (ofc), wouldn't it be better to put the instruction ID as first parameter (but I guess putting it last is better as you can do --arguments.length) ?

$B.$call([2,2,14], $B.$getattr_pep657([0,0,3], locals___main__.a, 'replace') )('e')
// or
$B.$call(0x22E, $B.$getattr_pep657(0x3, locals___main__.a, 'replace') )('e')

I'm grasping at straws here, but it is easier to visually associate the position with the called function (and, when there are many lines, to align the code better). In the current implementation the position for $B.$call is quite far away, at the opposite end of the line : $B.$call($B.$getattr_pep657(locals_main.a, 'replace', [0, 0, 9]), [2, 2, 14])('e')

But yeah, I'm really grasping at straws here xD.

I guess the only optimizations we could do on the transpiler output are :

(*) x.foo() is converted into :

$B.$call($B.$getattr_pep657(locals_exec.x, 'foo', [0, 0, 5]), [2, 2, 7])()

Which seems strange to me as I'd expect to see either :

$B.$call(locals_exec.x, $B.$getattr_pep657(locals_exec.x, 'foo', [0, 0, 5]), [2, 2, 7])()
// or
$B.$call($B.$getattr_pep657(locals_exec.x, 'foo', [0, 0, 5]), [2, 2, 7])(locals_exec.x)

This makes me think that $B.$getattr_pep657 builds a new anonymous function when accessing the property ?

// speculative sketch of what $getattr_pep657 might do :
function getattr_pep657(obj, attr /*, position */) {
     let fct = obj[attr]
     // some checks
     return function(...args) {  return fct.call(obj, ...args)  }
}

Then there could be ways to optimize it a little I guess.

I'll try to hide messages that are not relevant anymore to keep this thread clean.

PierreQuentel commented 9 months ago

Your test https://jsperf.app/mitogo modifies a global variable, which is never the case in Brython code (or unwillingly...).

If you set a local variable, there is no difference, though it's likely that the data is initialized : https://jsperf.app/xofeta.

denis-migdal commented 9 months ago

Your test https://jsperf.app/mitogo modifies a global variable, which is never the case in Brython code (or unwillingly...).

This isn't a global variable (it isn't in window) : it is a local variable.

If you set a local variable, there is no difference, though it's likely that the data is initialized : https://jsperf.app/xofeta.

The browser sees that the variable is assigned but never used.

The browser can be very tricky on small pieces of code. I'll check again in a few days.

denis-migdal commented 9 months ago

I made a test to prevent any aggressive browser optimisation; the conclusions are :

The explanation is quite simple: the array needs to be rebuilt at each iteration, which is costly and requires dynamic memory allocation, whereas the number is constant and can be stored on the stack. Using arrays also increases memory usage if lots of functions are called without interruption, as the garbage collector might not have time to run and free the memory.

- Test 1 : https://jsperf.app/xofeta/2

Firefox: (screenshot : "Array vs hex vs number (v2)" results)

Chromium: (WTF????) (screenshot of the results)

- Test 2 : doing nothing but a check ( https://jsperf.app/xofeta/3 )

(screenshots of the jsperf.app results on Firefox and Chromium)

PierreQuentel commented 9 months ago

Hi Denis,

I found an easy way to get a VERY HUGE performance gain in the current system : for the frame/stack : use an uni-directionnal tree instead of a stack.

Thanks for the suggestion. I have implemented it in commit 44ba0d16823857b10ad17e6d4d89bc3712dfc07c (many things to change in many places !). The performance gain is not huge; test results vary from one run to another, but there is nothing really noticeable. If you have tools to compare this version to the previous one it will help.

PierreQuentel commented 9 months ago

Could you also test various ways of encoding position info (compared to the current implementation with an array) and see if it actually improves the overall performance ? Not only unit tests as you did, but tests with complete Brython code, as in the integrated speed measurements made by _speed/makereport.html.

denis-migdal commented 9 months ago

Hi,

I'll have to take some time to write my answer, as there is a lot to say and some tricky things to explain.

Also, during this week I worked on the proposal for asynchronous imports, and on writing it up. It is almost done, and I think I'll post it this weekend.

denis-migdal commented 9 months ago

Optimisation in Brython is quite tricky.

Not only do we have the constraint of mimicking CPython, thus requiring some "costly" ways of doing things... but we are also in JS, in a browser, which is quite tricky... and I discover new nuances every day.

denis-migdal commented 9 months ago

Maybe you can also point me to some Brython scripts that you think are quite slow to execute (e.g. compared to CPython), and I'll investigate where the cost is coming from ?

denis-migdal commented 9 months ago

as in the integrated speed measurements made by _speed/makereport.html.

It seems the directory www/tests/ace/ is missing.

PierreQuentel commented 9 months ago

Correct, I have removed the Ace scripts from make_report.html.

denis-migdal commented 9 months ago

I'll have to take a deeper look at the documentation.

I downloaded the Brython archive from the git repo and set up a web server on www, but I get some JS errors on www/speed and www/speed/make_report.html.

PierreQuentel commented 9 months ago

Things are improving, thanks to your suggestions over the past days (thanks again!) and an optimization in the creation of class instances (commit cbaccc078cf1a0e8714c5986aabab8839b1b1601). For the first time, I think, none of the tests in make_report.html is more than 10 times slower than CPython !

I have to confess, I didn't expect Brython to do so many costly things, often due to the fact that we need to mimic CPython's behavior.

Welcome to my world ;-)

Maybe you can also point me to some Brython scripts that you think are quite slow to execute (e.g. compared to CPython), and I'll investigate where the cost is coming from

As you noticed, parsing function arguments is complex because of the flexibility offered by Python. I see you had ideas to improve the algorithm; if you can focus on this topic and provide speed improvements, even limited ones, it would be great.

The regular expression module (in libs/python_re.js) is also much slower than the re module in CPython, but the code is probably very hard to follow...

PierreQuentel commented 9 months ago

I'm downloading the Brython archive from the git, I set a webserver on www, but I get some JS errors on www/speed and www/speed/make_report.html.

You must run the script server.py at the root of the Brython repository, it is designed to handle the Ajax requests sent by make_report.html

denis-migdal commented 9 months ago

You must run the script server.py at the root of the Brython repository, it is designed to handle the Ajax requests sent by make_report.html

Hum, the following error is printed when executing server.py :

Traceback (most recent call last):
  File "/tmp/brython-master/server.py", line 243, in <module>
    exec(make_doc)
  File "<string>", line 38, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/brython-master/www/static_doc/3.12'

Things are improving, thanks to your suggestions in the past days (encore merci !) and an optimization in the creation of class instances (commit https://github.com/brython-dev/brython/commit/cbaccc078cf1a0e8714c5986aabab8839b1b1601). For the first time I think, none of the tests in make_report.html is more than 10 times slower than CPython !

I'm happy to hear that ^^ The last optimisations I suggested seemed unfortunately quite disappointing.

As you noticed, parsing function arguments is complex because of the flexibility offered by Python. I see you had ideas to improve the algorithm, if you can focus on this topic and provide speed improvements, even limited, it would be great.

It seems that even if Python has some flexibility, there are still some rules we can exploit to speed up the process.

I think the process would be to rewrite it from scratch, starting with the easiest case (no arguments), and to benchmark it. Then add a new case (e.g. "normal arguments") and re-benchmark both cases, to see whether a significant cost is added to the first case, and so on. That would show when the addition of a new kind of parameter costs a lot, and whether we need to split things up.

In order to test that, I'll have to find a way to extract the function so I can execute it "alone", without the whole function-call machinery. That'd improve reproducibility and would make JS tests easier. Maybe I should create a small git repo to do the benchmarks and adaptations little by little.

The regular expression module (in libs/python_re.js) is also much slower than the re module in CPython, but the code is probably very hard to follow...

Aren't CPython regular expressions the "same" as JS regular expressions ?

PierreQuentel commented 9 months ago

The bug when starting server.py should be fixed by the commit above.

Aren't CPython regular expression the "same" as JS regular expressions ?

How I wish that were the case ! But no, the Python engine has a slightly different syntax, and features that the JS engine doesn't have... See this page for the ever-evolving list of differences. [EDIT] this page is 7 years old, the differences have changed a lot since then

As always, the JS engine is available in Brython as window.RegExp, there is an example in the demo page.

denis-migdal commented 9 months ago

Humm.....

I'd like to test argument parsing like that :

  1. I declare my function in Brython, so that I can modify it without having to copy/paste things from Brython's editor :

    from browser import window

    def run():
        print('ok')

    window.run = run
  2. I'm now able to construct mock calls to test different algorithms :

    setTimeout( () => {

        function test() {
            let foo = __BRYTHON__.jsobj2pyobj(window.run);
            console.log(foo);
            console.log( __BRYTHON__.args0( foo, arguments ) );
        }

        test('ok')

    }, 1000)

The issue is that jsobj2pyobj() doesn't give me back the original run() function and the constructed object seems to lack some data in $infos.

denis-migdal commented 9 months ago

The bug when starting server.py should be fixed by the commit above.

Thanks

How I wish it would be the case ! But no, the Python engine has a slightly different syntax, and features that the JS engine doesn't have... See this page for the ever-evolving list of differences. [EDIT] this page is 7 years old, the differences have changed a lot since then

Yeah so it's 99% the same, but as always it's the 1% we almost never use that fucks up everything xD.

So the solution is either to have some "conversion" from re regexes to JS regexes, which may be a little hard to manage (and introduces an additional cost), or to implement the whole re regex engine in JS...

Yeah that's fucked up whatever you choose. xD

denis-migdal commented 9 months ago

Yeah, 4k lines for re...

I could take a look at what takes the most time... but I'm afraid it wouldn't work miracles...

I assume python "re" is implemented in C++ (or something like that) in some python implementation. If you can somehow compile it to WASM, maybe you could get the full implementation while having some perf increases ???

PierreQuentel commented 9 months ago

I assume python "re" is implemented in C++ (or something like that) in some python implementation. If you can somehow compile it to WASM, maybe you could get the full implementation while having some perf increases ???

It is written in C, the files are here. I have no idea how to convert it to WASM or Javascript.

denis-migdal commented 9 months ago

It is written in C, the files are here. I have no idea how to convert it to WASM or Javascript.

For the current implementation, did you use this file and rewrite it by hand, or did you implement your own regex engine from scratch ?

As it is in C, there is almost no way a JS version would be faster than CPython.

To convert it to WASM, I think there are tools like Emscripten. But to be fair, I am not sure how good the results would be.

PierreQuentel commented 9 months ago

Yes, I wrote everything by hand. Adapting the C code (as I did for other scripts, libs/maths.js for instance) was too difficult.

denis-migdal commented 9 months ago

I see, you are quite courageous xD.

Then it may be that there is some algorithmic issue, or something of the kind. Testing Emscripten to build the WASM may be interesting. I'll see if I can build a student project around that xD.

Otherwise, would it be possible to have jsobj2pyobj(window.run) return the original Brython function ? That'd help me quite a bit to test argument parsing.

PierreQuentel commented 9 months ago

Else, could it be possible to get jsobj2pyobj(window.run) returning the original brython function ? That'd help me quite a bit to test arguments parsing.

If it's only for your tests, the best is to hack the function to make it return what you need

denis-migdal commented 9 months ago

If it's only for your tests, the best is to hack the function to make it return what you need

I think it'd be best practice that :

let Y = jsobj2pyobj( pyobj2jsobj( X ) )
X === Y // they are the same object.

I think such symmetry might prevent issues.

Else, I found a little workaround :

  from browser import window

  def run(i):
      print('ok')

  def getRun():
      window.run = run

  window.getRun = getRun

setTimeout( () => {

  let orig = __BRYTHON__.pyobj2jsobj;
  __BRYTHON__.pyobj2jsobj = (args) => {
      let ret = orig(args);

      ret.$orig = args;

      __BRYTHON__.pyobj2jsobj = orig;

      return ret;
  }

  window.getRun();
  let foo = window.run.$orig;
  console.log('run is', foo);

  console.log( __BRYTHON__.args0( foo, ["ok"] ) );

}, 1000)

So now I can test it a little better (maybe more tomorrow, or during the weekend). I think I'll open a git repo specifically for argument parsing, in order to be able to revert and try other things.

denis-migdal commented 9 months ago

I still have some JS errors related to "ace" on the speed pages.

I think I'll start doing some micro-optimizations on the files (array preallocation, using const/let instead of some var, changing some algorithms, etc.). I won't test the speed, as the differences are likely insignificant most of the time (with the occasional nice surprise in perf :) ). I will favor code readability, i.e. I won't touch suboptimal loops like for X in Y and for X of Y (using for(let i = 0; i < Y.length; ++i) is in fact faster), unless you want me to replace them as well (but I am not sure the perf gain is worth it). I think I'll replace the .forEach() calls though.
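For example, the kind of mechanical replacement meant here (illustrative):

const items = [1, 2, 3]
let total = 0

// before
items.forEach(x => { total += x })

// after : an indexed loop, usually at least as fast
for(let i = 0; i < items.length; ++i){ total += items[i] }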

The goal would be to clean up the code a little, reduce memory operations a little (e.g. by preallocating arrays) and help browser optimizations (e.g. by using const where it applies).

There are ~50k lines to look at, so I'll do it little by little.

denis-migdal commented 9 months ago

It is written in C, the files are here. I have no idea how to convert it to WASM or Javascript.

Found an interesting link for it : https://openhome.cc/eGossip/WebAssembly/C2Wasm.html

Gotta check on it one day.