denis-migdal opened this issue 1 year ago
Thanks for the suggestions.
I don't get the idea for A : the value of DEBUG[1] and DEBUG[2] should change before every call to `$getattr_pep657`, what would be the benefit ?
I am more convinced by B. We should test how much space and execution time this would bring.
For C and D :
* the line number is not relevant only for stack trace in case of errors, it is also used in trace functions such as set by `sys.settrace()` (cf. examples in the test of module `sys` in the Brython test suite), I don't think we can rely on Javascript internals for that
* `globals()` and `locals()`, etc.

Okay... for the optimization B, the gain in performance will be HUGE.
Optimisation B seems 3.8x faster than the current Brython implementation.

... BUT there is an even better way... that is x36 faster. The rationale is to transpile the code like that :
```javascript
$B.LID = 0x000000000007;
$B.IID = 0x000000000003; let R = $B.$getattr_pep657(locals___main__.a, 'replace')
$B.IID = 0x00020002000E; call(R)('e');
// instead of :
// $B.set_lineno(frame, 7);
// $B.$call($B.$getattr_pep657(locals___main__.a, 'replace', [0,0,3]), [2,2,14])('e')
```
Note it is possible to offer 3 output modes.

We use an intermediate variable (`R`), but the browser is clever and optimizes it away (so it has no cost).

The interesting thing is these `$B.LID` (Line ID) and `$B.IID` (Instruction ID) global variables.
`$B.IID` (Instruction ID) is global, and can be accessed from anywhere... This has several advantages.

`$B.LID` (Line ID) is global, and can be accessed from anywhere... This has several advantages.

And for a production output... just don't include the `$B.LID` and `$B.IID` assignments in the generated JS code.
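As a sanity check of the idea, here is a minimal, self-contained sketch (not Brython's real code; `$getattr` and the error message are simplified, hypothetical stand-ins) showing how a getattr helper could read the position globals only on the error path:

```javascript
// Hypothetical, simplified sketch of the $B.LID / $B.IID idea: the transpiled
// code sets two globals before each call; position info is only read back
// when an error actually occurs.
const $B = { LID: 0, IID: 0 };

function $getattr(obj, attr){
    if(obj[attr] === undefined){
        // The error path is the only place that consults $B.LID / $B.IID.
        throw new Error(`AttributeError at line ${$B.LID}, instruction 0x${$B.IID.toString(16)}`);
    }
    return obj[attr];
}

// transpiled form of:  a.replace('e', 'a')   on line 7
const locals = { a: "hello" };
$B.LID = 7;
$B.IID = 0x3;
let R = $getattr(locals.a, "replace");
let result = R.call(locals.a, "e", "a");  // "hallo"
```

The happy path never touches the position arrays at all, which is where the claimed speed-up would come from.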
> I don't get the idea for A : the value of DEBUG[1] and DEBUG[2] should change before every call to `$getattr_pep657`, what would be the benefit ?

The idea was to increment this number each time the transpiler needs to write one. But this was not the best solution; better to try the new one I found (or B if the new one isn't possible).
> For C and D :
> * the line number is not relevant only for stack trace in case of errors, it is also used in trace functions such as set by `sys.settrace()` (cf. examples in the test of module **`sys`** in the Brython test suite), I don't think we can rely on Javascript internals for that
Ouch... Yeah, it hurts...

Possible solutions :
* `settrace` raising an exception when it is not enabled ("add the option XXXX to enable settrace").

```javascript
function LID_base(frame, lid) {
    $B.LID = lid // if using the last solution. Else, for solution C, just do nothing.
}

function LID_trace(frame, lid) {
    // do your stuff
}

$B.L = LID_base
// .....
$B.L(frame, 7)
// do stuff
$B.L(frame, 8)
```
Then, when `settrace` is called, do `$B.L = LID_trace` ?

I think this would help the browser to inline the function call until `settrace` is enabled (so trying to reduce the cost when `settrace` isn't used).
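A runnable sketch of that swap (names taken from the snippet above; the trace body is a stand-in that just records line ids):

```javascript
// $B.L starts as a cheap function; when settrace is enabled, it is swapped
// for a version that actually does the tracing work.
const $B = { LID: 0 };
const traced = [];

function LID_base(frame, lid){ $B.LID = lid; }
function LID_trace(frame, lid){ $B.LID = lid; traced.push(lid); }

$B.L = LID_base;
$B.L(null, 7);        // cheap path: only the global is updated
$B.L = LID_trace;     // this is what settrace() would do
$B.L(null, 8);        // traced path
console.log($B.LID, traced);  // 8 [ 8 ]
```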
Okay, last night I took a better look at the transpiled code from the online editor. I'll do the PR test stuff in the evening, I guess.
A small remark : maybe it'll be worth it to use the `finally` clause (`try{} catch(e) {} finally {}`) : `finally` will always be executed (even when throwing exceptions inside `catch`, returning values, etc.). Indeed, it is often written :
```javascript
try {
    // do stuff
    $B.leave_frame()
    return result
}catch(err){
    // do stuff
    $B.leave_frame(); throw err
}
```
This can be replaced by :
```javascript
try {
    // do stuff
    return result
}catch(err){
    // do stuff
    throw err
} finally {
    $B.leave_frame()
}
```
It seems all the `frame` stuff is taking the most space (like > 50% of the produced JS ?). I think we really need a version/mode without the frame info, `settrace()` support, the DEBUG information/pretty error printing, etc., in order to :

The benchmarks would give us an idea of the cost of `settrace()` support. If it comes at the cost of being, e.g., 40x slower, maybe we have a big margin for progress and it'll be worth optimizing. If it is, e.g., 1.01x slower, maybe further optimization isn't worth it.
The other thing that takes a lot of space in the produced code seems to be the function declarations. But nearly half of it is due to `frame`.

The other thing that takes space is the `$infos` property. Could a helper help reduce the generated JS code size (while making the output more readable) ? E.g.

```javascript
foo1382.$infos = $B.build_fct_info("foo", ....) // I think some info are identical for all function definitions ?
```

I think this may come at the price of slower execution time (though the browser might be able to inline the function) ? Though slower function declaration isn't really a problem : it is more the function execution time that matters (we define it once, but execute it multiple times).
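To make the suggestion concrete, here is a hypothetical sketch of such a helper (`build_fct_info` and the field names are assumptions, not Brython's real API): fields identical for every function live in one shared template, and the generated code only passes what differs.

```javascript
// Hypothetical $infos helper: shared fields are defined once instead of
// being repeated in the generated code for every function definition.
const COMMON_INFOS = { __module__: "__main__", __doc__: null };  // assumed shared fields

function build_fct_info(name, extra){
    return Object.assign({ __name__: name, __qualname__: name }, COMMON_INFOS, extra);
}

const foo1382 = function(){};
foo1382.$infos = build_fct_info("foo", { __defaults__: [] });
console.log(foo1382.$infos.__name__);    // "foo"
console.log(foo1382.$infos.__module__);  // "__main__"
```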
In each function execution, the following is done :

Maybe it is possible to push it all inside a new function `$B.execute()` in order to :
* support `settrace` (just by changing the value of `$B.execute`) ?

This costs an extra function call when calling Brython functions. Dunno if browsers could better inline functions thanks to that, or if this extra cost is really significant. Also, by reducing the generated JS code size, we mechanically reduce transpilation time.
Possibility 1 :

```javascript
locals_exec.foo = $B.build_executor( foo1382 ) // would build/return a $execute function whose 2nd argument is bound to foo1382
```

Possibility 2, replace `$B.call` by :

```javascript
$B.$execute([0, 0, 7], locals_exec.foo, 32)
```

I don't know which method would be better if 4. is a good idea, but that is something that could be tested.

`$B.execute` would then be something like :
```javascript
$B.execute = function(DEBUG_INFO, fct, ...args){
    // handle arguments and other stuff.
    var locals_exec_foo,
        locals
    locals_exec_foo = locals = $B.args0(fct, args)
    var frame = [fct.__name__, locals, "exec", locals_exec, fct]
    if(locals.$has_generators){
        frame.$has_generators = true
    }
    frame.__file__ = '<string>'
    frame.$lineno = 1
    $B.js_this = this
    // enters frame
    frame.$f_trace = $B.enter_frame(frame)
    try {
        let ret = fct(frame, args) // or fct(frame, ...args) ???
        // handle return type
        return ret
    } catch (err) {
        // handle exception
        throw err
    } finally {
        // leave frame
        $B.leave_frame()
    }
}
```
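To check that the control flow of such a wrapper is sound, here is a self-contained simulation (all names are stand-ins, not Brython's real API): the `finally` guarantees the frame is popped on both the return path and the exception path.

```javascript
// Self-contained simulation of the $B.execute idea: one shared wrapper
// does the frame bookkeeping around every call.
const $B = { frames: [] };
$B.enter_frame = frame => { $B.frames.push(frame); };
$B.leave_frame = () => { $B.frames.pop(); };

$B.execute = function(debug_info, fct, ...args){
    const frame = { name: fct.name, debug_info };
    $B.enter_frame(frame);
    try {
        return fct(frame, args);
    } finally {
        $B.leave_frame();  // runs on normal return AND on exceptions
    }
};

function foo(frame, [x]){ return x * 2; }
const result = $B.execute([0, 0, 7], foo, 21);
console.log(result, $B.frames.length);  // 42 0
```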
@PierreQuentel I will put the optimizations you find interesting at the top of the issue, so it'll be easier for you to browse this issue.
I found an easy way to get a VERY HUGE performance gain in the current system : for the frame/stack, use a uni-directional tree instead of a stack.

Indeed, using an array is costly (we call `$B.frames_stack.slice()` quite often). Using a uni-directional tree would instead be waaay quicker :
```javascript
let ptr = { previous: null, /* all your frame attributes here */ }

function enter_frame(ptr) {
    // do your stuff
    return {previous: ptr, /* all your frame attributes here */ }
}

function leave_frame(ptr) {
    // do your stuff
    return ptr.previous
}

// usage :
ptr = enter_frame(ptr);
// frame stuff
ptr = leave_frame(ptr);

// printing the stack :
let cur = ptr
while( cur !== null ) {
    // do your stuff
    cur = cur.previous
}
```
Printing the stack would be slower, but we don't care.
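A runnable version of the sketch above, showing the two properties that matter: entering/leaving a frame is O(1), and snapshotting the whole stack (what `slice()` does today) is just copying one pointer.

```javascript
// Linked "frame stack": each frame points to the previous one.
function enter_frame(ptr, name){
    return { previous: ptr, name };   // new top of stack
}
function leave_frame(ptr){
    return ptr.previous;              // pop
}
function stack_names(ptr){
    const names = [];
    for(let cur = ptr; cur !== null; cur = cur.previous){
        names.push(cur.name);
    }
    return names;
}

let ptr = null;
ptr = enter_frame(ptr, "main");
ptr = enter_frame(ptr, "f");
const snapshot = ptr;                 // O(1) "slice" of the stack
ptr = enter_frame(ptr, "g");
console.log(stack_names(ptr));        // [ 'g', 'f', 'main' ]
ptr = leave_frame(ptr);
console.log(stack_names(snapshot));   // [ 'f', 'main' ] — unaffected by later pushes
```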
Salut Denis,

With the commit above, encoding and decoding the format information is delegated to functions in `ast_to_js.js`, so we can test several formats. Note that the number of items can be 4, in the case of binary operations, to correctly trace errors such as :
```
Traceback (most recent call last):
  File "http://localhost:8000/tests/test.html#__main__", line 1, in <module>
    1 / 0
    ~~^~~
ZeroDivisionError: division by zero
```
EDIT: better idea after : https://github.com/brython-dev/brython/issues/2270#issuecomment-1754841797
Mmm... testing something like that would be interesting (in size of the JS output, transpilation time, and execution time).
```javascript
// Total: 32 bits, assuming A < 256, B-A < 256, C-B < 256, and D < 256.
// At most, the position is 2+8 characters : "0x" + 2 hex chars for each number.
// Each bit added or removed is +/- 1 character (in hex, 1 character = 4 bits).
// We can increase the bits up to 13 (so values up to 8191).
// I think it'll be safe to go down to 7 bits (127). One less bit would be a risk (63).
// [a,b,c,d] is at least 9 characters. 7 bits is at most 9 characters (but usually d = 0/undefined,
// so in reality it would very often be 7 characters, or even 5-6 when b = a, which is also very often the case).
const NB_BITS_A = 8;
const NB_BITS_B = 8;
const NB_BITS_C = 8;
const NB_BITS_D = 8;

const MASK_A = (1 << NB_BITS_A) - 1
const MASK_B = (1 << NB_BITS_B) - 1
const MASK_C = (1 << NB_BITS_C) - 1
const MASK_D = (1 << NB_BITS_D) - 1

// we assume d is the most likely to be 0, then b, then a; c is the least likely to be 0.
// (declared in increasing order so each const only refers to already-initialized ones)
const OFFSET_C = 0;
const OFFSET_A = NB_BITS_C;
const OFFSET_B = OFFSET_A + NB_BITS_A;
const OFFSET_D = OFFSET_B + NB_BITS_B;

function encode_position(a, b, c, d = 0){
    // assuming a <= b <= c (if we had more info about "d" we could do something with it) :
    c -= b;
    b -= a;
    return "0x" + ( (d << OFFSET_D) + (b << OFFSET_B) + (a << OFFSET_A) + (c << OFFSET_C) ).toString(16); // condensed format
}

$B.decode_position = function(pos){
    let d = (pos >> OFFSET_D) & MASK_D
    let b = (pos >> OFFSET_B) & MASK_B
    let a = (pos >> OFFSET_A) & MASK_A
    let c = (pos >> OFFSET_C) & MASK_C
    // assuming a <= b <= c (we do the reverse of what we did in encode) :
    b += a
    c += b
    return [a,b,c,d]
    // or, if d = 0 isn't supported :
    // if( d === 0)
    //     return [a,b,c]
    // return [a,b,c,d]
}
```
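A quick round-trip check of this fixed-width layout (self-contained, returning the number directly instead of the "0x" string so it can be asserted):

```javascript
// Fixed-width layout: 8 bits per field, b and c stored as deltas.
const NB_BITS = 8, MASK = (1 << NB_BITS) - 1;
const OFFSET_C = 0,
      OFFSET_A = NB_BITS,
      OFFSET_B = OFFSET_A + NB_BITS,
      OFFSET_D = OFFSET_B + NB_BITS;

function encode_position(a, b, c, d = 0){
    c -= b; b -= a;  // store deltas, as in the proposal above
    return (d << OFFSET_D) | (b << OFFSET_B) | (a << OFFSET_A) | (c << OFFSET_C);
}
function decode_position(pos){
    let d = (pos >> OFFSET_D) & MASK,
        b = (pos >> OFFSET_B) & MASK,
        a = (pos >> OFFSET_A) & MASK,
        c = (pos >> OFFSET_C) & MASK;
    b += a; c += b;  // undo the deltas
    return [a, b, c, d];
}

console.log(decode_position(encode_position(2, 2, 14)));  // [ 2, 2, 14, 0 ]
console.log(decode_position(encode_position(0, 0, 3)));   // [ 0, 0, 3, 0 ]
```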
127 chars (the 7-bit maximum) would correspond to :

```
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
```

Which seems a lot for an indentation (start of the "error" ?), as well as for a function name.
EDIT: better idea after : https://github.com/brython-dev/brython/issues/2270#issuecomment-1754841797
The issue with my previous proposition is that :

Another possible representation is using the last digit (4 bits) to code the number of bits taken by each value (0->F => 0 to 15 bits). (We could also use 3 bits instead of 4, then 0->7 => 3 to 10 bits per value.) This would solve the 2 issues of the previous proposition. I think this may be one of the most optimal strategies memory-wise, but it would require some computation to find the number of bits to use for each value. This process is a little hard to explain.

Let (a1, a2, a3), (b1, b2, b3), (c1, c2, c3), (d1, d2, d3) be the bits of a, b, c, d; we store pos as the bits (a1, b1, c1, d1, a2, b2, c2, d2, a3, b3, c3, d3).

The order of bits doesn't matter : (a1, b1, c1, d1) is 4 bits, so 1 character. As long as one of them is non-null, one char will be taken. This means that the size of the pos is 2 (the "0x") + log2( max(a,b,c,d) ) characters, i.e. most of the time not more than 2+6 characters (= 8 characters, values up to 64). It can easily go to the max (13 bits), i.e. values up to 8192.
```javascript
function encode_position(a, b, c, d = 0){
    // assuming a <= b <= c (if we had more info about "d" we could do something with it) :
    c -= b;
    b -= a;
    let pos = 0;
    let offset = -1
    while( a+b+c+d !== 0) {
        pos |= (a & 1) << ++offset;
        a >>= 1
        pos |= (b & 1) << ++offset;
        b >>= 1
        pos |= (c & 1) << ++offset;
        c >>= 1
        pos |= (d & 1) << ++offset;
        d >>= 1
    }
    return "0x" + pos.toString(16); // condensed format
}
```
```javascript
$B.decode_position = function(pos){
    let a = 0, b = 0, c = 0, d = 0;
    let offset = 0
    while( pos !== 0) {
        a |= (pos & 1) << offset;
        pos >>= 1
        b |= (pos & 1) << offset;
        pos >>= 1
        c |= (pos & 1) << offset;
        pos >>= 1
        d |= (pos & 1) << offset;
        pos >>= 1
        ++offset
    }
    // assuming a <= b <= c (we do the reverse of what we did in encode) :
    b += a
    c += b
    return [a,b,c,d]
    // or, if d = 0 isn't supported :
    // if( d === 0)
    //     return [a,b,c]
    // return [a,b,c,d]
}
```
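A self-contained round-trip check of this interleaved layout (numeric return value instead of the "0x" string so it can be asserted directly):

```javascript
// Bit-interleaving layout: the bits of the (a, b, c, d) deltas are woven
// together LSB-first, so trailing zero fields cost nothing.
function encode(a, b, c, d = 0){
    c -= b; b -= a;                      // store deltas, as in the proposal
    let pos = 0, offset = -1;
    while(a + b + c + d !== 0){
        pos |= (a & 1) << ++offset; a >>= 1;
        pos |= (b & 1) << ++offset; b >>= 1;
        pos |= (c & 1) << ++offset; c >>= 1;
        pos |= (d & 1) << ++offset; d >>= 1;
    }
    return pos;
}
function decode(pos){
    let a = 0, b = 0, c = 0, d = 0, offset = 0;
    while(pos !== 0){
        a |= (pos & 1) << offset; pos >>= 1;
        b |= (pos & 1) << offset; pos >>= 1;
        c |= (pos & 1) << offset; pos >>= 1;
        d |= (pos & 1) << offset; pos >>= 1;
        ++offset;
    }
    b += a; c += b;                      // undo the deltas
    return [a, b, c, d];
}

console.log(decode(encode(2, 2, 14)));   // [ 2, 2, 14, 0 ]
console.log(decode(encode(0, 0, 3, 5))); // [ 0, 0, 3, 5 ]
```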
Okay, let's do better :

The total number of bits used is `nb_bits = log2( a | b | c | d ) * 4`; to indicate it, we'll just set the `nb+1+2`-th bit (starting at 0) to "1" (i.e. we are able to store the number of bits we use in only 1 bit of information !), and exploit the fact that very often `a == b`.

Here is what the code should look like (not tested) :
```javascript
function encode_position(a, b, c, d = c){
    // assuming a <= b <= c <= d (if we had more info about "d" we could do something with it) :
    d -= c;
    c -= b;
    b -= a;
    let nbbits = Math.floor(Math.log2((a | b | c | d) || 1)) + 1;
    let pos = 1; // our flag (ends up just above the packed values).
    let has_d = +(d !== 0)
    let has_b = +(b !== 0)
    let indic = has_b + (has_d << 1)
    // multiplication prevents a condition - should be faster ?
    pos <<= nbbits * has_d
    pos |= d
    pos <<= nbbits * has_b
    pos |= b
    pos <<= nbbits
    pos |= c
    pos <<= nbbits
    pos |= a
    pos <<= 2
    pos |= indic
    return "0x" + pos.toString(16); // condensed format
}

$B.decode_position = function(pos){
    let has_b = pos & 0x1
    let has_d = (pos & 0x2) >> 1
    let nb_values = 2 + has_b + has_d
    pos >>= 2
    let nbbits = Math.floor(Math.log2(pos)) / nb_values // the flag is the highest set bit
    let mask = (1 << nbbits) - 1
    let a = pos & mask
    pos >>= nbbits
    let c = pos & mask
    pos >>= nbbits
    // multiplication prevents a condition - should be faster ?
    let b = (pos & mask) * has_b
    pos >>= nbbits * has_b
    let d = (pos & mask) * has_d
    pos >>= nbbits * has_d
    // undo the deltas computed in encode :
    b += a
    c += b
    d += c
    if( !has_d )
        return [a,b,c]
    return [a,b,c,d];
}
```
About `finally` in try/catch blocks : it was used in early versions of Brython, but I removed it because of performance issues. They are less important now, but still around 10% on Chrome between these versions (`g` is faster than `f`) :
```javascript
function f(){
    try{
        x
    }catch(err){
        err.message
    }finally{
        3 + 4
    }
}

function g(){
    try{
        x
        3 + 4
    }catch(err){
        err.message
        3 + 4
    }
}

g()
```
The link you put on "it doesn't add any overcost." seems broken.
About "2. Benchmarks for frame cost": we really can't do without frames management. It is not used only to handle exceptions: if you remove it, built-in functions like `exec()`, `eval()`, `globals()` or `locals()` won't work, neither will `super()` or `del <name>`, etc.
More generally: one of the design principles in Brython is that it should be as close to Python as possible; between performance and compliance with the language definition, I always favour compliance.
This is why I am opposed to options at the user's choice to determine which parts of Brython should or should not work as expected by a Python developer. It's too difficult to explain (for us) and to remember (for users) which options should unplug which feature, and it would make debugging difficult if unplugging a feature has side-effects we had not thought of.
About the replacement of position information: are you sure there will be such an increase in performance ?
I tried 2 different versions, one with an array and one with an hex, there is no difference on Firefox or Chrome:
```javascript
N = 10000000

function ignore(){}

var t0 = Date.now()
for(var i=0; i<N; i++){
    ignore([1,2,3])
}
console.log(Date.now() - t0)

var t0 = Date.now()
for(var i=0; i<N; i++){
    ignore(0x999)
}
console.log(Date.now() - t0)
```
Also tested on jsPerf, with even 0.3% improvement with arrays...
> About `finally` in try/catch blocks: it was used in early versions of Brython, but I removed it because of performance issues. They are less important now, but still around 10% on Chrome between these versions (`g` is faster than `f`):

Hmm, strange, I do have a small difference in perfs, but not to that extent.

> The link you put on "it doesn't add any overcost." seems broken.

Mmm... it's strange. Here is a new link : https://jsperf.app/wuxivi

Well, it seems `finally` isn't worth it then.

Thanks for your answer.
> About the replacement of position information: are you sure there will be such an increase in performance ?

Yes : https://jsperf.app/mitogo In this jsperf, arrays are x6 slower on FF, and x6.4 slower on Chromium. Ofc, the increase is only for this part of the code, so you won't have an overall x6 speed increase.
> I tried 2 different versions, one with an array and one with an hex, there is no difference on Firefox or Chrome:

Your function `ignore()` is empty and the browser notices it. So the function is likely inlined, i.e. the loop is likely doing nothing ? However, when the browser sees that the value is likely to be used, and has no guarantee that it'll never be modified (cf. my jsperf above), it is forced to create the array at each iteration (though it might still have some optimizations under the hood).

> Also tested on jsPerf, with even 0.3% improvement with arrays...

0.3% is within the margin of error, so we could say it performs the same. Which makes me think the `ignore()` function was indeed inlined.
Else, on "stylish"/"readability" considerations, if it doesn't cause performance reductions (ofc), wouldn't it be better to put the instruction ID as the first parameter (but I guess putting it last is better as you can do `--arguments.length`) ?

```javascript
$B.$call([2,2,14], $B.$getattr_pep657([0,0,3], locals___main__.a, 'replace') )('e')
// or
$B.$call(0x22E, $B.$getattr_pep657(0x3, locals___main__.a, 'replace') )('e')
```

I'm grasping at straws here, but it is easier to visually associate the position with the called function (and, with many lines, align the code better). In the current implementation, the code for `$B.$call` is quite far away, at the total opposite end of the line :

```javascript
$B.$call($B.$getattr_pep657(locals_main.a, 'replace', [0, 0, 9]), [2, 2, 14])('e')
```

But yeah, I'm really grasping at straws here xD.
I guess the only optimizations we could do on the transpiler production are :
* the `[a,b,c,d]` representation
* the keyword-arguments representation (`{$kw=kw, val:[{x: 3}, argvv]}` - why not doing `{$kw: {x: 3, ...argvv}}` ?), but I won't look too deep into that
* moving the `frame_enter()`/`frame_leave()`/arguments/locals preparation inside `$B.$call` ? Trying to reduce the generated JS code size ?
* a `$B.$call_loop` when calling a function in a loop ? (I really don't know).

(*) `x.foo()` is converted into :

```javascript
$B.$call($B.$getattr_pep657(locals_exec.x, 'foo', [0, 0, 5]), [2, 2, 7])()
```

Which seems strange to me, as I'd expect to see either :

```javascript
$B.$call(locals_exec.x, $B.$getattr_pep657(locals_exec.x, 'foo', [0, 0, 5]), [2, 2, 7])()
// or
$B.$call($B.$getattr_pep657(locals_exec.x, 'foo', [0, 0, 5]), [2, 2, 7])(locals_exec.x)
```
Making me think that `$B.$getattr_pep657` is building a new anonymous function when trying to access the property ?

```javascript
return function(obj, attr) {
    let fct = obj[attr]
    // some checks
    return function(...args) { return fct.call(obj, ...args) }
}
```

Then there could be ways to optimize it a little, I guess.
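A minimal, runnable illustration of the guess above (plain `getattr`, not Brython's real `$B.$getattr_pep657`): the helper returns a wrapper bound to the receiver, so the call site never has to pass the object again.

```javascript
// Hypothetical getattr returning a bound wrapper for method access.
function getattr(obj, attr){
    const fct = obj[attr];
    // some checks would go here
    return (...args) => fct.apply(obj, args);
}

const s = "hello";
const replace = getattr(s, "replace");
console.log(replace("e", "a"));   // "hallo"
```

This is exactly the pattern that would explain why neither transpiled form passes `locals_exec.x` explicitly: the receiver is already captured in the closure.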
I'll try to hide messages that are not relevant anymore to keep this thread clean.
Your test https://jsperf.app/mitogo modifies a global variable, which is never the case in Brython code (or unwillingly...).
If you set a local variable, there is no difference, though it's likely that the data is initialized : https://jsperf.app/xofeta.
> Your test https://jsperf.app/mitogo modifies a global variable, which is never the case in Brython code (or unwillingly...).

This isn't a global variable (it isn't in `window`) : it is a local variable.

> If you set a local variable, there is no difference, though it's likely that the data is initialized : https://jsperf.app/xofeta.

The browser sees that the variable is assigned but never used. The browser can be very tricky on small pieces of code. I'll check again in a few days.
I made a test to prevent any aggressive browser optimisation; the conclusions are :

The explanation is quite simple : the array needs to be rebuilt at each iteration, which is costly and requires dynamic memory allocation, whereas the number is constant and can be stored on the stack. Using arrays also increases memory usage if lots of functions are called without interruption, as the garbage collector might not have time to run and free the memory.
- Test 1 : https://jsperf.app/xofeta/2
Firefox:
Chromium: (WTF????)
- Test 2 : doing nothing but a check ( https://jsperf.app/xofeta/3 )
Salut Denis,
> I found an easy way to get a VERY HUGE performance gain in the current system : for the frame/stack : use a uni-directional tree instead of a stack.

Thanks for the suggestion. I have implemented it in commit 44ba0d16823857b10ad17e6d4d89bc3712dfc07c (many things to change in many places !). The performance gain is not huge; test results vary from one run to another, but there is nothing really noticeable. If you have tools to compare this version to the previous one, it will help.
Could you also test various ways of encoding position info (compared to the current implementation with an array) and see if it actually improves the overall performance ? Not only unit tests as you did, but tests with complete Brython code, as in the integrated speed measurements made by _speed/makereport.html.
Hi,

I'll have to take some time to write my answer as there is a lot to say and some tricky things to explain.

Else, during this week I worked on the proposal for asynchronous imports, and on writing it up. It is almost done, and I think I'll post it during this WE.
Optimisation in Brython is quite tricky. Not only do we have the constraint of mimicking CPython, thus requiring some "costly" ways of doing things... but we are also in JS, in a browser, which is quite tricky... and I discover new nuances every day.

Is a unidirectional tree faster than a dynamically-allocated array for insertions and "branches" ? YES, no doubt about that.

Is JS using a "dynamically-allocated array" ? Errr... not sure... for dense arrays, yes... for sparse arrays it uses a different structure that should be nearly equivalent (?) to trees for insertions. Unfortunately, we have no guarantee about which internal structure the browser will choose. Normally it should be a dense array in our case... We can force a sparse array by starting to fill the stack from the end (index 999) and decreasing the index at each new element (the next element would then be at index 998).

Still, is a unidirectional tree faster than JS arrays for "branches" ? YES, no doubt about that. It'd also save on memory usage.

In real life, do we really get a big stack ? It depends. For recursive functions, for example, you are likely to get a big stack. In other cases, it depends on your code.

When exactly does Brython perform these "branches" (the `.slice()`) ? I don't know, so I can't really test it.

Does this kind of thing kill performance in the general case ? YES, no doubt about that. Even more when it is a piece of code called this often.

Is Brython the general case ? Errr... no... it performs lots of other costly operations (e.g. processing of parameters in an empty function is 38% of total execution time).

Can we still expect increases in performance with that ? YES, up to 10% of total execution time (due to `enter_frame()`).

Then can we observe a gain in performance with the new version ? NO. The stack size is limited to 1000, which is roughly 5ms of execution time for a recursive function call. This is not enough to be able to notice a <10% difference. We'd need a way bigger stack size to be able to test it (and I couldn't test the "branches" as I do not know exactly when they occur).

Would a pool help improve performance ? Maybe. It'll depend on whether fetching/returning the object from/to the pool is less costly than simply building a new one.

Why do you announce e.g. "x40 faster" when in real life I see no significant gains ? We have to distinguish the increase of speed of a very limited piece of code from the global execution time. If your piece of code is responsible for 10% of total execution time, making it 40x faster would not give you more than a -10% global execution time. And I have to confess, I didn't expect Brython to do so many costly things, often due to the fact that we need to mimic CPython behavior.

Can we still significantly increase Brython performance ? YES. For example, optimising the parsing of arguments (38% of global execution time) is likely to produce noticeable effects. The issue is mainly mathematical :

Imagine a code taking 1000ms to execute, and an optimisation that saves you 25ms; it's not even worth it. But if you have another optimisation that makes your code 20x faster, your code now runs in 50ms. Then a 25ms reduction of execution time is a x2 speed increase, so the previous optimisation starts to become interesting.

Imagine 9 optimisations, each saving you 10% of execution time (additive). They are all the same, but the first one alone only saves you 10% of execution time, while the last, after the 8 others have been made, gives you a x2 speed increase, and the 9 together are a x10 speed increase.

Finally, do we really care about these optimisations, are they really worth the trouble ? Errr... difficult to say. Brython still seems to execute relatively quickly, and DOM operations would likely still be quite expensive anyway. Some code cleaning (which can also lead to some speed increase), documentation, ... might currently be more important than optimisation ?

Maybe you can also point me to some Brython scripts that you think are quite slow to execute (e.g. compared to CPython), and I'll investigate where the cost is coming from ?
> as in the integrated speed measurements made by _speed/makereport.html.

It seems the directory `www/tests/ace/` is missing.
Correct, I have removed the Ace scripts from `make_report.html`.
I'll have to take a deeper look at the documentation.
I'm downloading the Brython archive from the git, I set up a webserver on `www`, but I get some JS errors on `www/speed` and `www/speed/make_report.html`.
Things are improving, thanks to your suggestions in the past days (encore merci !) and an optimization in the creation of class instances (commit cbaccc078cf1a0e8714c5986aabab8839b1b1601). For the first time, I think, none of the tests in `make_report.html` is more than 10 times slower than CPython !
> I have to confess, I didn't expect Brython to do so many costly things, often due to the fact we need to mimic CPython behavior.

Welcome to my world ;-)
> Maybe you can also point me to some Brython scripts that you think are quite slow to execute (e.g. compared to CPython), and I'll investigate where the cost is coming from

As you noticed, parsing function arguments is complex because of the flexibility offered by Python. I see you had ideas to improve the algorithm; if you can focus on this topic and provide speed improvements, even limited ones, it would be great.
The regular expression module (in `libs/python_re.js`) is also much slower than the `re` module in CPython, but the code is probably very hard to follow...
> I'm downloading the Brython archive from the git, I set up a webserver on www, but I get some JS errors on www/speed and www/speed/make_report.html.

You must run the script `server.py` at the root of the Brython repository; it is designed to handle the Ajax requests sent by `make_report.html`.
> You must run the script `server.py` at the root of the Brython repository, it is designed to handle the Ajax requests sent by `make_report.html`

Hum, the following error is printed when executing `server.py` :
```
Traceback (most recent call last):
  File "/tmp/brython-master/server.py", line 243, in <module>
    exec(make_doc)
  File "<string>", line 38, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/brython-master/www/static_doc/3.12'
```
> Things are improving, thanks to your suggestions in the past days (encore merci !) and an optimization in the creation of class instances (commit https://github.com/brython-dev/brython/commit/cbaccc078cf1a0e8714c5986aabab8839b1b1601). For the first time I think, none of the tests in make_report.html is more than 10 times slower than CPython !

I'm happy to hear that ^^ The last optimisations I suggested seemed, unfortunately, quite disappointing.
> As you noticed, parsing function arguments is complex because of the flexibility offered by Python. I see you had ideas to improve the algorithm, if you can focus on this topic and provide speed improvements, even limited, it would be great.

It seems that even if Python has some flexibility, there are still some rules we can exploit to speed up the process.

I think the process would be to rewrite it from scratch, starting with the easiest case (no arguments), and to benchmark it. Then add a new case (e.g. "normal arguments"), and re-benchmark it on the 2 cases to see if a significant cost is added to the first case, and so on. That'd enable us to see when the addition of a new parameter "costs" a lot, and whether we need to split stuff.

In order to test that, I'll have to find a way to extract the function so I can execute it "alone" without the whole function call. That'd improve reproducibility and make JS tests easier. Maybe I should create a small git repo in order to do the benchmarks and adaptations little by little.
> The regular expression module (in libs/python_re.js) is also much slower than the re module in CPython, but the code is probably very hard to follow...

Aren't CPython regular expressions the "same" as JS regular expressions ?
The bug when starting server.py should be fixed by the commit above.
> Aren't CPython regular expressions the "same" as JS regular expressions ?

How I wish that were the case ! But no, the Python engine has a slightly different syntax, and features that the JS engine doesn't have... See this page for the ever-evolving list of differences. [EDIT] this page is 7 years old, the differences have changed a lot since then
As always, the JS engine is available in Brython as `window.RegExp`; there is an example in the demo page.
Humm.....

I'd like to test argument parsing like that :

```python
from browser import window

def run():
    print('ok')

window.run = run
```

```javascript
setTimeout( () => {
    function test() {
        let foo = __BRYTHON__.jsobj2pyobj(window.run);
        console.log(foo);
        console.log( __BRYTHON__.args0( foo, arguments ) );
    }
    test('ok')
}, 1000)
```

The issue is that `jsobj2pyobj()` doesn't give me back the original `run()` function, and the constructed object seems to lack some data in `$infos`.
> The bug when starting server.py should be fixed by the commit above.

Thanks
> How I wish that were the case ! But no, the Python engine has a slightly different syntax, and features that the JS engine doesn't have... See this page for the ever-evolving list of differences. [EDIT] this page is 7 years old, the differences have changed a lot since then

Yeah, so it's 99% the same, but as always it's the 1% we almost never use that fucks up everything xD. So the solution is either to have some "conversion" from re regexes to JS regexes, which may be a little hard to manage (and would introduce an additional cost), or to implement the whole re regex engine in JS... Yeah, that's fucked up whatever you choose. xD

Yeah, 4k lines for re... I could take a look at what takes the most time... but I'm afraid that wouldn't do miracles...

I assume Python's "re" is implemented in C++ (or something like that) in some Python implementation. If you can somehow compile it to WASM, maybe you could get the full implementation while having some perf increases ???
> I assume Python's "re" is implemented in C++ (or something like that) in some Python implementation. If you can somehow compile it to WASM, maybe you could get the full implementation while having some perf increases ???

It is written in C, the files are here. I have no idea how to convert it to WASM or Javascript.
> It is written in C, the files are here. I have no idea how to convert it to WASM or Javascript.

For the current implementation, did you use this file and rewrite it by hand, or did you implement your own regex engine from scratch ? As it is in C, there is almost no way a JS version would be faster than CPython. To convert it to WASM, I think there are tools like Emscripten. But tbf I am not sure what the results would be.
Yes, I wrote everything by hand. Adapting the C code (as I did for other scripts, `libs/maths.js` for instance) was too difficult.
I see, you are quite courageous xD. Then it may be that there is some algorithmic issue, or something of the kind. Testing Emscripten to build the WASM may be interesting. I'll see if I can build a student project on that xD.

Else, could it be possible to get `jsobj2pyobj(window.run)` returning the original Brython function ? That'd help me quite a bit to test argument parsing.
> Else, could it be possible to get jsobj2pyobj(window.run) returning the original brython function ? That'd help me quite a bit to test arguments parsing.

If it's only for your tests, the best is to hack the function to make it return what you need
If it's only for your tests, the best is to hack the function to make it return what you need
I think it would be best practice that :

```js
let Y = jsobj2pyobj( pyobj2jsobj( X ) );
X === Y // they are the same object
```

I think such symmetry might prevent issues.
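One way to get this symmetry would be to cache the wrappers, e.g. with WeakMaps. This is only a sketch with stand-in wrapper objects and simplified converter signatures, not the real Brython converters:

```javascript
// Sketch: cache wrappers in WeakMaps so that converting back returns
// the original object. Wrapper objects here are placeholders.
const js2pyCache = new WeakMap();
const py2jsCache = new WeakMap();

function pyobj2jsobj(pyObj) {
    if (py2jsCache.has(pyObj)) return py2jsCache.get(pyObj);
    const jsObj = { wrapped: pyObj };   // stand-in for the real wrapper
    py2jsCache.set(pyObj, jsObj);
    js2pyCache.set(jsObj, pyObj);       // remember the original
    return jsObj;
}

function jsobj2pyobj(jsObj) {
    if (js2pyCache.has(jsObj)) return js2pyCache.get(jsObj); // round-trip: same object
    const pyObj = { wrapped: jsObj };
    js2pyCache.set(jsObj, pyObj);
    py2jsCache.set(pyObj, jsObj);
    return pyObj;
}

const X = { name: "run" };
const Y = jsobj2pyobj(pyobj2jsobj(X));
console.log(X === Y); // true
```

The WeakMaps let the wrappers be garbage-collected together with the wrapped objects, so the cache adds no leak.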
Else, I found a little workaround :
```python
from browser import window

def run(i):
    print('ok')

def getRun():
    window.run = run

window.getRun = getRun
```

```js
setTimeout(() => {
    let orig = __BRYTHON__.pyobj2jsobj;
    __BRYTHON__.pyobj2jsobj = (args) => {
        let ret = orig(args);
        ret.$orig = args;               // keep a reference to the original Brython object
        __BRYTHON__.pyobj2jsobj = orig; // restore the converter after first use
        return ret;
    }
    window.getRun();
    let foo = window.run.$orig;
    console.log('run is', foo);
    console.log( __BRYTHON__.args0( foo, ["ok"] ) );
}, 1000)
```
So now I can test it a little better (maybe more tomorrow, or during the WE). I think I'll open a separate git for arguments parsing, in order to be able to revert and do other things.
I still have some JS errors related to "ace" on the speed pages.
I think I'll start to do some micro optimizations on the files (array preallocation, using `const`/`let` instead of some `var`, changing some algorithms, etc). I won't test for speed as it'd likely be insignificant most of the time (with sometimes a nice surprise in perfs :) ). I will favor code readability, i.e. I won't touch suboptimal loops like `for X in Y` and `for X of Y` (using `for(let i = 0; i < Y.length; ++i)` is in fact faster), unless you want me to replace them also (but I am not sure the gain in perf is worth it). I think I'll replace the `.forEach()` though.
The goal would be to clean the code a little, reduce memory operations a little (e.g. when preallocating arrays) and help browser optimizations (e.g. using `const` when it applies).
There are ~50k lines to look at, so I'll do it little by little.
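As an illustration of the kind of changes meant here (this is not actual Brython code, just a toy function):

```javascript
// Preallocate the result array instead of growing it with push(),
// use const/let, and a classic indexed for loop instead of
// for..of / .forEach().
function doubled(values) {
    const out = new Array(values.length); // preallocation
    for (let i = 0; i < values.length; ++i) {
        out[i] = values[i] * 2;
    }
    return out;
}

console.log(doubled([1, 2, 3])); // [ 2, 4, 6 ]
```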
It is written in C, the files are here. I have no idea how to convert it to WASM or Javascript.
Found an interesting link for it : https://openhome.cc/eGossip/WebAssembly/C2Wasm.html
Gotta check on it one day.
Hi,
I suggest several small optimizations for the debug information that Brython inserts into the JS generated code :
Interesting optimisations.
Please also look at :

A. Use constants as index to an array, instead of arrays (NO)
Currently, an array describing the position of the "executed" python code is given as the last argument of underlying JS functions, e.g. :
This has the disadvantage of potentially building an array each time the line is executed, and takes more space in the generated JS code.
Therefore, I suggest using globally defined constants, defined at transpilation time. This would improve execution performance while reducing the size of the generated code.
The code would then be written :
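To make the idea concrete, here is a runnable sketch; the constant and table names are mine, not actual Brython identifiers, and the position arrays come from the example quoted at the top of the thread:

```javascript
// Sketch of suggestion A: the transpiler emits integer constants that
// index into a globally defined symbol table built at transpilation
// time, instead of rebuilding array literals on every call.
const DEBUG_SYMBOLS = [
    [0, 0, 3],   // was: $B.$getattr_pep657(locals___main__.a, 'replace', [0,0,3])
    [2, 2, 14],  // was: the [2,2,14] array passed to $B.$call(...)
];
const POS_0 = 0; // constants written once by the transpiler
const POS_1 = 1;

// Only when an error message must be printed do we pay the (cheap)
// lookup DEBUG_SYMBOLS[the_constant]:
function positionOf(the_constant) {
    return DEBUG_SYMBOLS[the_constant];
}

console.log(positionOf(POS_1)); // [ 2, 2, 14 ]
```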
This would add an insignificant cost when printing error messages, i.e. the operation `DEBUG_SYMBOLS[the_constant]`. But it doesn't matter, as this only happens when an error is printed.

B. Build the constant as a mathematical expression from the 3 numbers [a,b,c].
In the previous suggestion, we used an array to convert a constant into an array containing the information required for debug. But a JS number has 53 exploitable bits.
This can be split into 3×16 bits, i.e. 3 numbers in the range [0 - 65535]. I'd argue that no python line should ever be that long (PEP 8 indicates lines should be <79 characters). At transpilation time, we would simply do :
Giving the following code :
Then, when handling the error :
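A runnable sketch of the packing (function names are mine); the layout matches the `$B.IID = 0x00020002000E` example at the top of the thread, which encodes `[2,2,14]` in 3×16 bits:

```javascript
// Pack three 16-bit numbers [a, b, c] into a single JS number.
// 48 bits used, well within the 53 safe integer bits. Bitwise
// operators truncate to 32 bits in JS, so multiply/divide instead
// of shifting.
const NB_BITS = 16;

function encodePos(a, b, c) {
    return a * 2 ** (2 * NB_BITS) + b * 2 ** NB_BITS + c;
}

function decodePos(packed) {
    const c = packed % 2 ** NB_BITS;
    const b = Math.floor(packed / 2 ** NB_BITS) % 2 ** NB_BITS;
    const a = Math.floor(packed / 2 ** (2 * NB_BITS));
    return [a, b, c];
}

console.log(encodePos(2, 2, 14) === 0x00020002000E); // true
console.log(decodePos(0x00020002000E)); // [ 2, 2, 14 ]
```

The transpiler would call `encodePos` once and write the resulting hex literal; `decodePos` runs only when an error is printed.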
This would be quite quick when handling errors, while using less RAM (no need for a `DEBUG_SYMBOLS` structure). The current version takes 7 characters minimum for the `[a,b,c]` array; this version takes at most 8 characters, so one more (but generally at most 7 characters). Of course, we can reduce the value of `NB_BITS` if needed, and even add the line number if we have enough bits. This makes the code more readable, as we understand that `0x09` is a special value.

More opti :
With more knowledge about the 3 [a,b,c] values, we can reduce the number of bits. E.g. if a <= b <= c, instead of storing [a,b,c] into 53 bits, we can store [a, b-a, c-a]. Then, we can attribute fewer bits to e.g. b-a and c-a, which could free bits to store the line number if interesting.
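A sketch of this delta variant; the bit widths and names are purely illustrative (8 bits per delta here, assuming a <= b <= c and small deltas):

```javascript
// Delta variant: store [a, b-a, c-a]. The deltas are usually small,
// so they can use fewer bits (8 here instead of 16), leaving spare
// bits for e.g. the line number.
const DELTA_BITS = 8;

function encodeDelta(a, b, c) {
    return (a * 2 ** DELTA_BITS + (b - a)) * 2 ** DELTA_BITS + (c - a);
}

function decodeDelta(packed) {
    const dc = packed % 2 ** DELTA_BITS;
    const db = Math.floor(packed / 2 ** DELTA_BITS) % 2 ** DELTA_BITS;
    const a  = Math.floor(packed / 2 ** (2 * DELTA_BITS));
    return [a, a + db, a + dc];
}

console.log(decodeDelta(encodeDelta(2, 2, 14))); // [ 2, 2, 14 ]
```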
C. Replace all the `$B.set_lineno(frame, 7);` by a map ? (NO)

Instead of keeping track of the line number, which likely kills execution performance, create a structure enabling the conversion of the JS stacktrace into the Python stacktrace (this would also enable better error printing, I think).
The structure would be globally defined as :
I am not sure about the value we need to put in FILE_A/FILE_B; we'll have to look at JS stacktraces more in depth when raising exceptions inside Brython code. Maybe there are also ways to cheat a little.
Getting the line numbers would then be done as :
When transpiling the Python code, the filemap would be built as :
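A self-contained sketch of the whole mechanism (structure, build step, and lookup). The names and entry shapes are hypothetical, and a real implementation would still have to solve the FILE_A/FILE_B question above:

```javascript
// Sketch of suggestion C: map ranges of generated-JS lines back to
// Python (file, line) pairs. Each entry is [firstJsLine, pyFile,
// pyLine], kept sorted by JS line.
const FILEMAP = [];

// Called by the transpiler each time it starts emitting JS code for
// a new Python line.
function mapLine(jsLine, pyFile, pyLine) {
    FILEMAP.push([jsLine, pyFile, pyLine]);
}

// Binary search: find the last entry whose JS line is <= the line
// taken from the JS stacktrace.
function pythonPosition(jsLine) {
    let lo = 0, hi = FILEMAP.length - 1, best = null;
    while (lo <= hi) {
        const mid = (lo + hi) >> 1;
        if (FILEMAP[mid][0] <= jsLine) { best = FILEMAP[mid]; lo = mid + 1; }
        else hi = mid - 1;
    }
    return best && { file: best[1], line: best[2] };
}

// Built during transpilation:
mapLine(1, "main.py", 1);   // JS lines 1-4 come from main.py line 1
mapLine(5, "main.py", 2);   // JS lines 5+  come from main.py line 2
console.log(pythonPosition(6)); // { file: 'main.py', line: 2 }
```

The lookup is O(log n) and runs only when an error is handled, so the hot path carries no line tracking at all.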
This may make transpilation slightly slower, but I'd argue that :
This will also make error printing slower (but we don't care).
Also, with that, would we still need the frame system ? Are there other line numbers hidden in the generated JS code ? (e.g. for brython class definitions ?)
EDIT: If the generated JS file is minified, the line numbers will not match anymore. I think the minifier can also build a sourcemap, which would give the correct line numbers in the stacktrace. Else, a library might be able to do the trick. Else, I'd argue that if you minify, you want performance.
D. Option execution-mode="production" to remove debug information, and make output cleaner/shorter (NO)

Currently, because Brython doesn't implement sourcemaps, the generated JS code is bloated with arrays and `$B.set_lineno(frame, 7);` calls. This makes the generated code harder to read for a human, increases the generated code size, and has a cost on performance. This is really useful when developing, to better understand and fix errors. However, in production, we might want to go faster and might not be interested in debugging information.
For this usage, sourcemaps are generally used (cf more info about sourcemaps structures), but I can understand it might not be easy to make.
Hence, I suggest to implement an option `execution-mode` that could take 3 values. In production mode, Brython would simply not include `$B.set_lineno` in the generated code (as well as other functions with the same goal).

Cordially,
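For illustration, the option might look like this on the script tag (the attribute name is the one proposed above, purely hypothetical, not an existing Brython option):

```html
<script type="text/python" execution-mode="production">
a = "hello"
a.replace('e', 'a')
</script>
```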