denis-migdal opened 1 year ago
Denis, I just wanted to say that I'm really enjoying reading your investigations into Brython performance. Coming up with these accurate benchmarks is difficult and also very helpful. I wish I had more time right now to help. Pierre, I agree with you that keeping Brython equivalent to CPython is more important than improving performance. But if we can have it all that's great! Thank you both.
In the commit above I have added a flag "trace" in ast_to_js.js; if set to 0, the code for traces is not generated.
I ran the built-in speed test (_/speed/makereport.html) which generates _/speedresults.html on Firefox. The result for function calls is disappointing: there is almost no difference with or without trace.
def f(x):
    return x

for i in range(1000000):
    f(i)

def f(x, y=0, *args, **kw):
    return x

for i in range(100000):
    f(i, 5, 6, a=8)
This is very far from the x10 improvement you mention for Firefox.
Hummm...
This tends to indicate that function calls are doing something way more expensive compared to the little tests I made, making the gain in speed almost insignificant (I don't think this is due to browser optimisation this time). If I had to make an educated guess, I'd say that maybe the stack handling is killing performance (creating, then adding, an object to an array (the stack) at each function call).
But I'm an idiot... I could have simply used the editor to generate the JS and executed it with the browser tools to see the performance and what costs the most time... I should have done that in the first place to get "realistic" tests instead of relying on little mocks made in JSPerf...
I'm sorry, I really didn't think about it.
For me it was clear that we'd have a speed gain, but I didn't think it'd be this insignificant in reality, due to the things I didn't include in my tests.
Maybe it'd be possible to use brython.js in JSPerf, copy the JS generated by the editor into the "Current version" test, then copy-paste and modify it to create other tests? I really didn't think about it, and I don't know why.
When copying "brython.js" into "Setup JS", JSPerf throws errors (I assume the file is too big for that). When using the "HTML Setup", it works a little better, but some errors are thrown:
Uncaught DOMException: The operation is insecure.
<anonymous> https://raw.githack.com/brython-dev/brython/master/www/src/brython.js:22
<anonymous> https://raw.githack.com/brython-dev/brython/master/www/src/brython.js:129
TypeError: $B.imported is undefined
uid1697192098634createFunction https://jsperf.app/sandbox/6529186ec7d9b980f2758267 line 1 > injectedScript:6
NextJS 6
ec
ec
ed
run
d
tJ
[id]-31be30a7d642ff84.js:1:22868
Funny enough, the randomly generated name of my test is https://jsperf.app/pabeku (reading "Pas beaucoup", French for "not much", with an accent), which matches the unsatisfactory results we got xD.
I will now try to look at the performance through an execution in the Editor. Sorry, I should have done that from the start...
But now that we have this, could it be possible to build brython_stdlib.js and compare its weight before and after? I'd assume the difference is quite small, but is it like 5% or 0.0005%?
It seems that it is `$B.indexedDB = _window.indexedDB;` that causes this issue. Could it be possible to have an option to disable it so that we could use tools like JSPerf?
Here is the test I made. It takes ~1 min to execute; please find the stack trace below.
Do you know what `_b_.eval` is? It feels really strange that such a function would be the thing that takes the most time. Compared to this, the function call is 2 sec, so ~3.7% of the total execution time.
Here is a clean trace I made locally on a clean HTML page (you can import it in the "Performance" tab of Chrome dev tools).
Trace-20231013T131633.json.zip
There is an anonymous function call that takes most of the time. `loop7` seems to be my `loop` function, and `f6` my `f` function.
I think I first need to convert the Brython file into JS in order to be able to profile it better, but copying the JS code from the editor doesn't seem enough to be able to execute it (BRYTHON not found).
Here is the Brython code:
<!DOCTYPE html>
<html>
<head>
<!-- Required meta tags-->
<meta charset="utf-8">
<title>X</title>
<!-- Brython -->
<script src="https://raw.githack.com/brython-dev/brython/master/www/src/brython.js"></script>
<!--<script src="https://raw.githack.com/brython-dev/brython/master/www/src/brython_stdlib.js"></script>-->
<script type="text/python">
from browser import document
import time
start = time.time()
def f(i):
    return None

def loop():
    for i in range(100000000):
        f(i)

loop()
end = time.time()
document <= "Done in " + str(end - start)
</script>
</head>
<body>
</body>
</html>
This function seems to take the most time (brython.js:5532):
return function(){try{return callable.apply(null,arguments)}catch(exc){$B.set_exception_offsets(exc,position)
throw exc}}}
And this is not code called from it, as its self-time is almost equal to its total time. Are you catching exceptions continuously, only to rethrow them and catch and ignore them elsewhere???
When executing the loop alone it takes 3 sec (but the function `loop7` only takes 1 sec to execute).
With `None`: no difference.
The other hypothesis is that the browser always does the same thing, so it caches the results; hence `f6` only takes 15 ms of execution time, and most of line 5532's execution time is due to the browser looking into its cache? Though, this function should produce side effects, so it shouldn't be optimized at this point???
Then, maybe the optimisation I suggested made no difference in this example due to this optimisation of calling a function in a loop, but would manifest in real-life situations???
It's so strange.
$B.$call=function(callable,position){callable=$B.$call1(callable)
if(position){position=$B.decode_position(position)
return function(){try{return callable.apply(null,arguments)}catch(exc){$B.set_exception_offsets(exc,position)
throw exc}}}
return callable}
- Move `decode_position` into the catch; there is no need to pre-compute it. Performance when we get errors is not a big issue (you shouldn't have billions of errors per second), and this function should be called only once, so precomputation is useless here.
- Is `arguments` slower than `function(...args){ return callable.apply(null, args) }`?
- Is `callable.apply(null, args)` slower than calling the function directly, `callable(...args)`?
- If you change `$B.$call` so that it takes the function parameters as arguments, you'd prevent one function creation (and one more if you handle the `$B.$call($B.getattr)` case as a `$B.$callM`).
- But yeah, function creation is expected to be slow, and maybe it prevents the browser from performing some kind of opti???
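To make the closure-creation point concrete, here is a minimal sketch of what a `$callM`-style helper could look like: it applies the callable directly instead of returning a freshly built wrapper function. All names here (`callM`, `setExceptionOffsets`) are invented for illustration, not Brython's actual API:

```javascript
// Hypothetical stand-in for $B.set_exception_offsets
function setExceptionOffsets(exc, position){ exc.position = position; }

// One call, no intermediate closure allocated per call site.
function callM(callable, position, ...args){
    try{
        return callable(...args);
    }catch(exc){
        // only pay for position handling on the error path
        if(position !== null){ setExceptionOffsets(exc, position); }
        throw exc;
    }
}

const double = x => 2 * x;
console.log(callM(double, null, 21)); // 42
```

The point of the sketch is that the wrapper body moves into `callM` itself, so the engine sees one monomorphic function instead of a new closure per call.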
I'm trying with Firefox. The previous results seem to be due, again, to Chrome's crazy optimisations...
Firefox 2023-10-13 14.44 profile.json.gz
If I try to interpret this graph:
- `$B.args0` takes 38% of the time, mainly due to `$B.parse_args` (26% of the time).
- `$B.enter_frame` and `$B.leave_frame` take 18% of the time.
- `next` takes 12% of the time.
- `$B.$call` takes 8% of the time.

If I try to conclude from this graph:
- `$B.$call` could be reduced from 8% to ~4% (so -4% total execution time) if it didn't create a new anonymous function at each call. Even more if we could take an in-depth look at `$B.$call1` (line 5535).
- `$B.set_lineno` takes 4% of execution time.
- The `for` loop is taking 17% of the time. I think we could win a few % if we optimized `for i in range(a,b,step=c)` as `for(let i = a; i < b; i += c)` or as `for(let i = a; i < b; ++i)`, but that'd require deciding that all `integer`s are `BigInt`. Indeed, `next` is due to using an iterator.

`$B.$call1` (4% of execution time):
$B.$call1=function(callable){if(callable.__class__===$B.method){return callable}else if(callable.$factory){return callable.$factory}else if(callable.$is_class){
return callable.$factory=$B.$instance_creator(callable)}else if(callable.$is_js_class){
return callable.$factory=function(){return new callable(...arguments)}}else if(callable.$in_js_module){
return function(){var res=callable(...arguments)
return res===undefined ? _b_.None :res}}else if(callable.$is_func ||typeof callable=="function"){if(callable.$infos && callable.$infos.__code__ &&
(callable.$infos.__code__.co_flags & 32)){$B.last($B.frames_stack).$has_generators=true}
return callable}
try{return $B.$getattr(callable,"__call__")}catch(err){throw _b_.TypeError.$factory("'"+$B.class_name(callable)+
"' object is not callable")}}
It tests whether the object has a specific tag `$xxxxx` to decide what to do.
What if, when adding these tags to these objects, you also added a `$get_callable` function (non-enumerable, non-configurable, non-writable?)? It might help avoid all these tests, as well as enable giving a prebuilt or lazily-built callable. Then, would you still need a `$B.$call1` function separate from `$B.$call`?
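A minimal sketch of the `$get_callable` idea, assuming we are free to attach a non-enumerable property at tagging time. Every name here (`tagAsClass`, `makeInstanceCreator`, `call1`) is invented for illustration; Brython's real tagging code differs:

```javascript
// Hypothetical stand-in for $B.$instance_creator
function makeInstanceCreator(cls){
    return (...args) => ({cls: cls.name, args});
}

function tagAsClass(obj){
    obj.$is_class = true;
    Object.defineProperty(obj, "$get_callable", {
        enumerable: false, configurable: false, writable: false,
        value(){
            // build the factory lazily on first use, then cache it
            if(!this.$factory){ this.$factory = makeInstanceCreator(this); }
            return this.$factory;
        }
    });
    return obj;
}

// $B.$call1 collapses to a single property lookup plus a fallback
function call1(callable){
    return callable.$get_callable ? callable.$get_callable() : callable;
}

const cls = tagAsClass({name: "Point"});
console.log(call1(cls)(1, 2).args); // [1, 2]
console.log(call1(cls) === call1(cls)); // true: the factory is cached
```

The dispatch chain of `$is_class` / `$is_js_class` / `$factory` tests becomes one property access, and the factory can still be built lazily.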
`$B.enter_frame` (10% of the time):
$B.enter_frame=function(frame){
if($B.frames_stack.length > 1000){var exc=_b_.RecursionError.$factory("maximum recursion depth exceeded")
$B.set_exc(exc,frame)
throw exc}
frame.__class__=$B.frame
$B.frames_stack.push(frame)
if($B.tracefunc && $B.tracefunc !==_b_.None){if(frame[4]===$B.tracefunc ||
($B.tracefunc.$infos && frame[4]&&
frame[4]===$B.tracefunc.$infos.__func__)){
$B.tracefunc.$frame_id=frame[0]
return _b_.None}else{
for(var i=$B.frames_stack.length-1;i >=0;i--){if($B.frames_stack[i][0]==$B.tracefunc.$frame_id){return _b_.None}}
try{var res=$B.tracefunc(frame,'call',_b_.None)
for(var i=$B.frames_stack.length-1;i >=0;i--){if($B.frames_stack[i][4]==res){return _b_.None}}
return res}catch(err){$B.set_exc(err,frame)
$B.frames_stack.pop()
err.$in_trace_func=true
throw err}}}else{$B.tracefunc=_b_.None}
return _b_.None}
The first test raises the maximum recursion depth error. The reallocation of the stack may be what costs most of the execution time (I can't believe that <7 conditions could explain this 10%).

`$B.leave_frame` (8.4% of the time):
(8.4% of the time)
$B.leave_frame=function(arg){
if($B.frames_stack.length==0){
return}
if(arg && arg.value !==undefined && $B.tracefunc){if($B.last($B.frames_stack).$f_trace===undefined){$B.last($B.frames_stack).$f_trace=$B.tracefunc}
if($B.last($B.frames_stack).$f_trace !==_b_.None){$B.trace_return(arg.value)}}
var frame=$B.frames_stack.pop()
if(frame.$has_generators){for(var key in frame[1]){if(frame[1][key]&& frame[1][key].__class__===$B.generator){var gen=frame[1][key]
if(gen.$frame===undefined){continue}
var ctx_managers=gen.$frame[1].$context_managers
if(ctx_managers){for(var cm of ctx_managers){$B.$call($B.$getattr(cm,'__exit__'))(
_b_.None,_b_.None,_b_.None)}}}}}
delete frame[1].$current_exception
return _b_.None}
This really shouldn't take so much time. Could `$B.tracefunc` be equal to `None`? Then it'd evaluate to `true`, because currently `None` is an `Object`?
I frankly don't know what is happening here.
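Whatever is happening there, the stack handling itself could be made allocation-free. A rough sketch of the linked-frame idea (invented names, plain JS objects, not Brython's actual frame layout):

```javascript
// Frames link to their parent via `previous`, so enter/leave are pointer
// swaps and the engine never reallocates a growing array.
let topFrame = null;
let depth = 0;

function enterFrame(frame){
    if(depth >= 1000){ throw new Error("maximum recursion depth exceeded"); }
    frame.previous = topFrame;  // each frame remembers its parent
    topFrame = frame;
    depth++;
}

function leaveFrame(){
    const frame = topFrame;
    topFrame = frame.previous;  // O(1), no array traffic
    depth--;
    return frame;
}

enterFrame({name: "module"});
enterFrame({name: "f"});
console.log(leaveFrame().name); // "f"
console.log(topFrame.name);     // "module"
```

The recursion check becomes an integer comparison, and the GC only sees the frame objects themselves, not a repeatedly resized backing array.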
`$B.args0` (38% of the time: 26% parse_args, 6% because of using an iterator):
$B.args0=function(f,args){
var arg_names=f.$infos.arg_names,code=f.$infos.__code__,slots={}
for(var arg_name of arg_names){slots[arg_name]=empty}
return $B.parse_args(
args,f.$infos.__name__,code.co_argcount,slots,arg_names,f.$infos.__defaults__,f.$infos.__kwdefaults__,f.$infos.vararg,f.$infos.kwarg,code.co_posonlyargcount,code.co_kwonlyargcount)}
- Caching `slots` for each function could prevent rebuilding one at each call???
- `arg_names` seems to be an array? Then use `for(let i = 0; i < x.length; ++i)` to get a little speed increase (6%?).
- `$B.parse_args` seems to require a lot of property accesses. Setting `infos = f.$infos` may help a very little??? (not sure about this one)
- `$B.parse_args` takes 26% of the time, even though the benchmarked function has only one parameter.

I think there is stuff to do here also.
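As a sketch of the first point, a per-function `slots` template could be built once and cloned on each call instead of rebuilt in a loop. `$slots_template`, `EMPTY`, and the `f.argNames` layout are assumptions for illustration, not Brython's real structures:

```javascript
const EMPTY = Symbol("empty"); // stand-in for Brython's `empty` sentinel

function makeSlotsTemplate(argNames){
    const template = {};
    for(const name of argNames){ template[name] = EMPTY; }
    return template;
}

function args0(f, args){
    // build the template once, on first call, then reuse it forever
    if(!f.$slots_template){
        f.$slots_template = makeSlotsTemplate(f.argNames);
    }
    const slots = Object.assign({}, f.$slots_template);
    for(let i = 0; i < args.length; i++){ slots[f.argNames[i]] = args[i]; }
    return slots;
}

const f = { argNames: ["x", "y"] };
const slots = args0(f, [1]); // y stays EMPTY until defaults are applied
console.log(slots.x); // 1
```

Whether `Object.assign` on a template actually beats the original `for...of` loop would need measuring, but it at least moves the per-call work to a shape the JIT can specialize.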
$B.parse_args=function(args,fname,argcount,slots,arg_names,defaults,kwdefaults,vararg,kwarg,nb_posonly,nb_kwonly){
var nb_passed=args.length,nb_passed_pos=nb_passed,nb_expected=arg_names.length,nb_pos_or_kw=nb_expected-nb_kwonly,posonly_set={},nb_def=defaults.length,varargs=[],extra_kw={},kw
for(var i=0;i < nb_passed;i++){var arg=args[i]
if(arg && arg.__class__===$B.generator){slots.$has_generators=true}
if(arg && arg.$kw){
nb_passed_pos--
kw=$B.parse_kwargs(arg.$kw,fname)}else{var arg_name=arg_names[i]
if(arg_name !==undefined){if(i >=nb_pos_or_kw){if(vararg){varargs.push(arg)}else{throw too_many_pos_args(
fname,kwarg,arg_names,nb_kwonly,defaults,args,slots)}}else{if(i < nb_posonly){posonly_set[arg_name]=true}
slots[arg_name]=arg}}else if(vararg){varargs.push(arg)}else{throw too_many_pos_args(
fname,kwarg,arg_names,nb_kwonly,defaults,args,slots)}}}
for(var j=nb_passed_pos;j < nb_pos_or_kw;j++){var arg_name=arg_names[j]
if(kw && kw.hasOwnProperty(arg_name)){
if(j < nb_posonly){
if(! kwarg){throw pos_only_passed_as_keyword(fname,arg_name)}}else{slots[arg_name]=kw[arg_name]
kw[arg_name]=empty}}
if(slots[arg_name]===empty){
def_value=defaults[j-(nb_pos_or_kw-nb_def)]
if(def_value !==undefined){slots[arg_name]=def_value
if(j < nb_posonly){
if(kw && kw.hasOwnProperty(arg_name)&& kwarg){extra_kw[arg_name]=kw[arg_name]
kw[arg_name]=empty}}}else{var missing_pos=arg_names.slice(j,nb_expected-nb_kwonly)
throw missing_required_pos(fname,missing_pos)}}}
var missing_kwonly=[]
for(var i=nb_pos_or_kw;i < nb_expected;i++){var arg_name=arg_names[i]
if(kw && kw.hasOwnProperty(arg_name)){slots[arg_name]=kw[arg_name]
kw[arg_name]=empty}else{var kw_def=_b_.dict.$get_string(kwdefaults,arg_name)
if(kw_def !==_b_.dict.$missing){slots[arg_name]=kw_def}else{missing_kwonly.push(arg_name)}}}
if(missing_kwonly.length > 0){throw missing_required_kwonly(fname,missing_kwonly)}
if(! kwarg){for(var k in kw){if(! slots.hasOwnProperty(k)){throw unexpected_keyword(fname,k)}}}
for(var k in kw){if(kw[k]===empty){continue}
if(! slots.hasOwnProperty(k)){if(kwarg){extra_kw[k]=kw[k]}}else if(slots[k]!==empty){if(posonly_set[k]&& kwarg){
extra_kw[k]=kw[k]}else{throw multiple_values(fname,k)}}else{slots[k]=kw[k]}}
if(kwarg){slots[kwarg]=$B.obj_dict(extra_kw)}
if(vararg){slots[vararg]=$B.fast_tuple(varargs)}
return slots}
For the function itself, the cost is 16%. You really might get some speed increase and memory usage reduction here. If I take a look at the GC, it allocated 10 MB of RAM. Nearly every 10 ms, the GC is called and lasts ~0.1 ms (so 1% of execution time). This may be an indication that we are making many, many allocations?
@PierreQuentel Could you guide me on the steps to go from the JS code generated by the Editor to a JS file I can execute in my browser? That way, I'd be able to modify it, and modify `$B`, to test my different hypotheses.
EDIT: I succeeded. I put the code inside a `setTimeout()` (not quite ideal), and set `$B.imported["exec"] = {}`.
I'll try some tests on `$B` when I have the time.
A strange thing is that the converted JS code (from the editor) seems to execute x2 faster than the Py code (run by the last dev version).
Well, optimizing the `for i in range(a,b)` into a `for(i = a; i < b; ++i)` (and removing one useless intermediate) is, overall, 6.7% faster. The loop was 17% of total execution time, with the iterator being 6% of total execution time. This means a -61% execution time for the loop.
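For comparison, a minimal sketch of the two loop shapes with plain JS numbers (a generator standing in for Brython's `range` iterator; real Brython ints would complicate this, as noted above):

```javascript
// Iterator shape: one next() call and one result object per pass
function* pyRange(a, b, step = 1){
    for(let i = a; i < b; i += step){ yield i; }
}

let accIter = 0;
for(const i of pyRange(0, 1000)){ accIter += i; }

// Native shape: plain counter, no allocation, no next()
let accNative = 0;
for(let i = 0; i < 1000; ++i){ accNative += i; }

console.log(accIter === accNative); // true
console.log(accNative);             // 499500
```

The transformation is only valid when the compiler can prove `range` is the builtin and the bounds are plain integers, which is why it ties into the `BigInt` question.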
Pretty sure we can easily achieve a -50% execution time with all the things I found... and even more.
And indeed, removing the 2 debug trace conditions is a perf gain of only -0.84%.
However, the more optimizations are made, the more this share will grow. Also, it seems there are other operations linked to the debug traces in `leave_frame` and `enter_frame` that might also continue to increase this number.
If I had to guess, I'd believe we could achieve a > -4% speed gain (~x2 if we achieve the -50% execution time with other optimisations, more if we do more; and ~x2 again with other trace-related code from other functions). But yeah, better to keep it on the side and come back to it when stronger optimizations have been done.
Disclaimer: I'm talking about optimization because it is fun (and it helps me explore some concepts), but maybe this shouldn't be the priority.
TL;DR: We can really increase parameter resolution speed by implementing 8 different functions/ways to resolve them, depending on how the function is defined and how it is called.
Instead of calling `$B.$call(fct)(args)`, we would do something like:

$B.$callType1( fct, args) {
    return fct.callType1(fct, args);
}
$B.$callType2( fct, args, kwargs) {
    return fct.callType2(fct, args, kwargs);
}
// etc.
`$B.parse_args()` is a generic function. In Python, there are 4 ways to pass arguments to a function, and 3 ways to declare an argument, plus default values:

1,2,3
a=1,b=2,c=3

Which makes 2^7 = 128 possible combinations of function calls plus the default values...
A. But maybe there are ways to write some special `$B.$call(fct, args)` functions for some pretty common calls, enabling us to speed them up? E.g. `$B.$callS(fct, args)` for `(1,2,3,4)`, i.e. without "=", "*" or "**" inside the call. Knowledge of the arguments could enable us to use some heuristics for some types of calls and functions. Some of the ordering rules can also help, e.g. a positional argument can't come after a named one.
The list of calls I think can be interesting:
Maybe there are ways to merge some without any cost, but that may make argument handling easier.
Of the 7, only 3 cases are really different, with one only requiring a check of the name positions at the end, so only 2 really different cases for calls.
The list of function types that I find interesting:
Which makes 8 combinations that could be handled to speed up argument resolution, and be really significant for some simple but common function calls.
At translation time, the function declares the types of argument resolution it supports, e.g. `foo.$resolve_args_CallType1 = resolve_args[DeclarationType1][CallType1]`, and when called, we let the AST decide which function to call.
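A hedged sketch of what such a resolver table could look like; `resolve_args`, the type names, and `compileFunction` are all invented here, not Brython code:

```javascript
// 2-D table: [declarationType][callType] -> specialized resolver
const resolve_args = {
    positionalOnlyDecl: {
        positionalOnlyCall(argNames, args){
            // fast path: zip names with values, no keyword handling at all
            const slots = {};
            for(let i = 0; i < argNames.length; i++){ slots[argNames[i]] = args[i]; }
            return slots;
        },
        // ...other call shapes would go here
    },
    // ...other declaration shapes
};

function compileFunction(fn, declType){
    // at "translation time", attach the resolvers this function supports
    fn.$resolvePositional = resolve_args[declType].positionalOnlyCall;
    return fn;
}

const f = compileFunction(function(){}, "positionalOnlyDecl");
console.log(f.$resolvePositional(["a", "b"], [1, 2])); // {a: 1, b: 2}
```

Since the call site's shape is known statically, the generated code can jump straight to the right resolver instead of re-discovering the shape inside `parse_args` on every call.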
B. Maybe we can, when calling, sort named parameters during AST transpilation?

const f = ... // we have to keep the order of operations
const Z = ... // was originally the first given parameter
const X = ... // was originally the second given parameter
$B.call( f )({a: X, b: Z})

This would facilitate some operations/algorithms while costing nothing, as it would be done at translation time.
C. JavaScript doesn't like it when we don't specify the function parameters; maybe we should declare them, even if we are not using them. It helps the browser know how many arguments the function is likely to take. Could be `foo(...args)` instead of using `arguments`?
Note: In modern code, rest parameters should be preferred. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/arguments
@PierreQuentel If you want to try out these optis, maybe we should discuss first, because they are a little complex and we would need to do things slowly, step by step.
Okay, `**args` must keep argument order... so B. isn't possible.
Then, writing `foo.$callX(2,3,4, {a:1, b:3, ...args})` / `foo.$callX(2,3,4, args)` / `foo.$callX(2,3,4, null)` (assuming the last argument is always the named arguments) would help merge, in parameter parsing, the combinations `named`, `**args`, and `named+**args`. This would also prevent one useless array and one object creation.
Aaaaand it's not possible, as Python needs to raise an exception when e.g. `args` contains `a` while there would then be 2 variables named `a`... No wonder Python is so slow... Lots of optimisations can't be performed due to the ways function calls are authorized and forbidden...
The solution is maybe something like :
slots = {} // `arguments` is a reserved name inside JS functions, so call it slots
// easily parse positional arguments. EZ.
if( ! hasNamedArguments ) {
    if( offset < isRequiredIdx )
        ; // throw an exception: missing required arguments
    for( ; offset < nbArgs; ++offset )
        slots[ the_function_args_name[offset] ] = the_function_default_values[offset]
    return //...
}

let keys = the_function_args_name.slice( offset ); // we need a copy + ignore the parameters already positioned.

// do it twice, for named arguments and **args.
for( let name in varnames ) {
    let i = keys.indexOf( name ); // a Set is more efficient for big sets, but I don't think it'll be faster for us.
    if( i === -1 ) {
        // handle the error here (we can do less optimized operations here, it's not an issue).
        // if **args is in the function declaration, give them to parameters[the_**args] = varnames[name].
        // maybe this needs another function: if a **args argument is not found in keys, it'll be put into the
        // **args parameter, while a named argument could have been the one removing it from keys.
        // or add a check: if( name in named_arguments ) => throw an error, else insert into **args.
    }
    slots[name] = varnames[name]
    keys[i] = null // so that it won't be found again and we can raise an error on duplicates.
}

// check whether some required parameters are still missing.
for( let k = 0; k < keys.length; ++k ) {
    if( keys[k] !== null ) { // still unfilled
        if( offset + k < isRequiredIdx )
            ; // raise an exception: missing required argument
        else
            slots[ keys[k] ] = the_function_default_values[offset + k]
    }
}
Now we would need to somehow merge `named arguments` and `**args`.
EDIT: why not doing `let entries = [...Object.entries(named_arguments), ...Object.entries(**args_argument)]`?
It could be made in the function call:

call( .... , null ) // no named
call( .... , Object.entries(**args_argument) )
call( .... , [ ["a", value], ["b", value] ] ) // not quite efficient, as it will create an array for each named argument...
call( .... , [ ["a", value], ["b", value], ...Object.entries(**args_argument) ] )

OR

call( .... , null, null ) // no named
call( .... , Object.keys(**args_argument), Object.values(**args_argument) ) // needs care.
call( .... , [ "a", "b" ], [ value, value ] ) // maybe a little more efficient?
call( .... , [ "a", "b", ...Object.keys(**args_argument) ], [ value, value, ...Object.values(**args_argument) ] ) // needs care.

OR

Doing `let args = [...Object.entries(named_arguments), ...Object.entries(**args)]` inside the `hasNamedArguments` branch?

OR

Doing, inside the `hasNamedArguments` branch:

let args_keys = [...Object.keys(named_arguments), ...Object.keys(**args)]
let args_values = [...Object.values(named_arguments), ...Object.values(**args)]

OR a call like:

call( .... , null, null ) // no named
call( .... , {a:2, b:4}, null )
call( .... , null, args_argument )
call( .... , {a:2, b:4}, args_argument )

Then:
- when both are null, we can skip the `for( let name in varnames )` loop entirely;
- when only one of them is given, in the `for( let name in varnames )` loop there is no need to check whether `name` is in `named_arguments`;
- when both are given, in the `for( let name in varnames )` loop we only need to check whether `name` is in `named_arguments` (`if( name in named_arguments )`?).

The last solution might be the best?
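Whichever calling shape is chosen, the duplicate check that Python requires can stay cheap if named arguments and `**args` are merged in one pass. A sketch of that merge (the `mergeKeywords` helper is invented here, not Brython code):

```javascript
// Merge explicit keyword arguments with a **kwargs object, raising on
// duplicates exactly as Python does for f(a=1, **{"a": 3}).
function mergeKeywords(named, starstar){
    const out = named ? {...named} : {};
    if(starstar){
        for(const key of Object.keys(starstar)){
            if(key in out){
                throw new TypeError(`got multiple values for argument '${key}'`);
            }
            out[key] = starstar[key];
        }
    }
    return out;
}

console.log(mergeKeywords({a: 1}, {b: 2})); // merged without conflict
// mergeKeywords({a: 1}, {a: 3}) would throw: multiple values for 'a'
```

Passing `named` and `starstar` as separate slots (the last calling shape above) keeps both `null` checks free and leaves this merge as the only non-trivial work.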
Question:
I see that you are inserting `try{} catch(e){}` everywhere; is there a reason for that, instead of doing something like:

throw new PythonError("message", js_error, frame)

then catching it either during a Python `try: except:`, or when giving a function to JS, or at "top-level" places (I guess with `async` functions / at the file level / etc.)?
try {} catch(e) {
    if( e instanceof PythonError ){
        // do all the leave_frame, set_exc, and trace_exception here by unstacking the frame stack?
        // this could be a function like $B.$process_py_exception(e)
        let frame_cursor = e.frame;
        while( frame_cursor !== frame ) {
            $B.set_exc(e.err, frame_cursor);
            if( (! e.err.$in_trace_func) && frame_cursor.$f_trace !== _b_.None ){
                frame_cursor.$f_trace = $B.trace_exception()
            }
            $B.leave_frame();
            frame_cursor = frame_cursor.previous
        }
        // do other stuff here.
    }
}
For this to work, you'd likely have to catch JS exceptions when calling JS functions. But otherwise, JS exceptions shouldn't occur during Brython function calls... and if one does, it is a bug that should be fixed?
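A minimal sketch of that single-throw idea: the error carries the frame where it was raised, and one handler walks the chain. Everything here (`PythonError`'s shape, `unwindTo`, the frame objects) is hypothetical:

```javascript
class PythonError extends Error {
    constructor(message, frame){
        super(message);
        this.frame = frame; // frame where the exception was raised
    }
}

// Walk from the raising frame up to (not including) stopFrame,
// running the leave_frame / set_exc / trace hooks along the way.
function unwindTo(err, stopFrame, onLeave){
    let cursor = err.frame;
    while(cursor && cursor !== stopFrame){
        onLeave(cursor);
        cursor = cursor.previous; // frames form a linked chain
    }
}

// Usage: three nested frames, handler installed at the outermost one.
const top = {name: "module", previous: null};
const mid = {name: "g", previous: top};
const inner = {name: "f", previous: mid};
const left = [];
try{
    throw new PythonError("boom", inner);
}catch(e){
    if(e instanceof PythonError){ unwindTo(e, top, fr => left.push(fr.name)); }
}
console.log(left); // ["f", "g"]
```

The per-call `try/catch` disappears; the cost of frame bookkeeping is only paid on the (rare) error path.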
I put a summary in the first message of this issue so that it'll be easier to look things up.
[Sorry, posted it in the wrong issue]
Conclusions:
- `args0_new` is sooo much faster than the previous parsing function.
- `$B.$call` now costs HALF of the function call time. This needs to be fixed.
- `$B.augm_assign` is 23% of total exec time; I think this can be improved a lot. 23% for `+=` while it is 2.5% for `+`, it's strange.

@PierreQuentel Do you have some code (only using Brython core) you'd want me to benchmark?
New benchmark with the new `args0` parsing method:
Raw logs:
Firefox 2023-11-16 12.31 profile.json.gz
File:
<!DOCTYPE html>
<html>
<head>
<!-- Required meta tags-->
<meta charset="utf-8">
<title>X</title>
<!-- Brython -->
<script src="https://raw.githack.com/brython-dev/brython/master/www/src/brython.js"></script>
<!--<script src="https://raw.githack.com/brython-dev/brython/master/www/src/brython_stdlib.js"></script>-->
<script type="text/python">
from browser import document
def f(i):
    return i+i

def loop():
    acc = 0
    for i in range(100000000):
        acc += f(i)
    return acc
import time
start = time.time()
acc = loop()
end = time.time()
document <= "Done in " + str(end - start)
print(acc)
</script>
</head>
<body>
</body>
</html>
Another possibility that might be interesting:
The advantage is that we'd automatically have an internal function we can call from JS. The internal function would take either an object (for Python-implemented functions) or an array of arguments (for JS-implemented ones), depending on the parser used.
This should also simplify the code of `$B.$call`, which is currently very slow.
We can modify the prototype of `Object` and/or `Function` to add a default `$call` method if necessary.
For Python functions, `$call` could be used to factorize some code, like the `try/catch` and other boilerplate. This would reduce code size, and therefore the parsing time (which is very slow).
Summary

Code cleaning:
- `try/catch` everywhere (more info here)
- `build_fct_info()` to reduce generated code size??? (more info)

On AST code generation:
- `$B.call( $B.getattr(obj, f) )(args)` to `$B.callM( obj, f )(args)` (more info)
- `$B.call( f )(args)` to `$B.callM( f, args )` (more info / 4% total exec time)
- `function foo(...args)` instead of `arguments`??? (more info)

Potential optimisations:
- `.$getCallable()` on callable objects/functions (more info / 4% total exec time)
- `for in range` (more info / -6.7% of total exec time) - requires `integer` to be implemented as `BigInt`

Done:
- `enter_frame` / `leave_frame` / `frame = []`: instead of a stack, use a unidirectional tree with a pool (more info / 18.4% total exec time)
- `decode_position` in the catch (more info)

Other:
====================================================================
Hi,
The support of `settrace()` adds 2 `if`s in each function. I tested several ways to implement it to see if we can improve its performance (https://jsperf.app/fisuce). As always, Chromium optimisation produces strange results on short examples.

A. Use functions to precompute the condition:

One way that seems to allow optimisation is to replace the condition by a function call, i.e. instead of doing something like:

doing:

Of course this is a mock example, but it has the advantage of reducing the generated code size, improving code readability, and it can lead to execution speed increases (it seems at least as fast as the current method, and sometimes faster on Chromium).

The condition for that is to precompute the condition when calling `sys.settrace()`:

And once `sys.settrace()` is called, we change the value of `$B.enter_frame`. Then it may be less efficient, but debugging isn't meant to be efficient. It'll likely need some tweaks/tests to see if small changes can produce better results.
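A toy sketch of the swap, with invented names (`enterFrameFast` / `enterFrameTraced`, a simplified `$B`) standing in for the real `$B.enter_frame` variants:

```javascript
const $B = { frames: [], tracefunc: null };

function enterFrameFast(frame){        // no trace test at all on the hot path
    $B.frames.push(frame);
}

function enterFrameTraced(frame){      // slower variant, installed on demand
    $B.frames.push(frame);
    $B.tracefunc(frame, "call");
}

$B.enter_frame = enterFrameFast;

function settrace(fn){
    $B.tracefunc = fn;
    // the condition is "precomputed": swap once instead of testing per call
    $B.enter_frame = fn ? enterFrameTraced : enterFrameFast;
}

const calls = [];
$B.enter_frame({});                    // fast path, no check
settrace((frame, event) => calls.push(event));
$B.enter_frame({});                    // traced path
console.log(calls); // ["call"]
```

The generated code always calls `$B.enter_frame`, so untraced execution never pays for the `tracefunc` tests.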
B. An option "opti.settrace-support" (defaulting to `true`):

According to the Python documentation:

We could safely assume that a Brython user may want to use `settrace()` when developing, but likely won't need it when deploying their website. Hence, I suggest adding an "opti.settrace-support" option defaulting to `true` (so with support of `settrace()`). Then, when users want to deploy their website and get better performance, they could disable this option.

Once disabled, this option won't include the `settrace` lines in the produced JS code, so it won't print:

and

(and maybe other lines).

With this option disabled, function calls could be x10 faster on FF, and x1.8 faster on Chromium.
Cordially,