brython-dev / brython

Brython (Browser Python) is an implementation of Python 3 running in the browser
BSD 3-Clause "New" or "Revised" License

Large loops freeze browser #1020

Closed desean1625 closed 5 years ago

desean1625 commented 5 years ago

Looping over large datasets freezes the browser. The Python script below freezes the browser until execution is complete ("<completed in 16094.00 ms>").

import random
letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
table = {"headers":[],"rows":[]}

table['headers'].append({"name":"row num","type":"string"})

for i in range(100000):
    newRow = []
    newRow.append("row" + str(i))
    table["rows"].append(newRow)

table['headers'].append({"name":"Random Letter","type":"string"})
for row in table["rows"]:
    row.append(random.choice(letters))

Would it be possible to process all while and for loops in chunks to prevent the browser from hanging?

Javascript

 var $res4404 = $f4404();
  function $f4404() {
    for (var $i2275 = 0; $i2275 < 100000; $i2275++) {
      $locals___main__["i"] = $i2275;
      $locals.$line_info = "9,__main__";
      $locals___main__["newRow"] = $B.$list([]);;
      $locals.$line_info = "10,__main__";
      $B.$call($B.$getattr($locals___main__["newRow"], "append"))($B.$getattr("row", "__add__")($B.$call($B.builtins.str)($locals___main__["i"])));;
      $locals.$line_info = "11,__main__";
      $B.$call($B.$getattr($B.$getitem($locals___main__["table"], "rows"), "append"))($locals___main__["newRow"]);
      $locals.$line_info = "8,__main__";
      None;
    }
  };

would become

 var $res4404 = await $f4404();

 function $f4404() {
   var CHUNK_SIZE = 100;
   return new Promise(async (resolve, reject) => {
     var breakTime = CHUNK_SIZE;
     var startTime = performance.now();
     for (var $i2275 = 0; $i2275 < 100000; $i2275++) {
       if ($i2275 % breakTime === 0) {
         if (performance.now() - startTime > 20) {
           await new Promise((resolve) => setTimeout(resolve, 0));
           startTime = performance.now();
         } else {
           breakTime++;
         }
       }
       $locals___main__["i"] = $i2275;
       $locals.$line_info = "9,__main__";
       $locals___main__["newRow"] = $B.$list([]);;
       $locals.$line_info = "10,__main__";
       $B.$call($B.$getattr($locals___main__["newRow"], "append"))($B.$getattr("row", "__add__")($B.$call($B.builtins.str)($locals___main__["i"])));;
       $locals.$line_info = "11,__main__";
       $B.$call($B.$getattr($B.$getitem($locals___main__["table"], "rows"), "append"))($locals___main__["newRow"]);
       $locals.$line_info = "8,__main__";
       None;
     }
     resolve()
   })
 };
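
For readers more at home in Python, the time-sliced loop proposed above can be sketched with the standard asyncio module. This is only an analogy run under CPython, not what Brython generates: the names build_rows, TIME_BUDGET_S and check_every are made up for illustration, the 20 ms budget mirrors the check in the JavaScript, and the adaptive breakTime increment is left out for brevity.

```python
import asyncio
import time

TIME_BUDGET_S = 0.020  # assumed 20 ms budget, mirroring the JavaScript proposal


async def build_rows(n, check_every=100):
    """Build n rows, yielding to the event loop once the time budget is spent."""
    rows = []
    yields = 0
    slice_start = time.perf_counter()
    for i in range(n):
        # Look at the clock only every `check_every` iterations to limit overhead.
        if i % check_every == 0 and time.perf_counter() - slice_start > TIME_BUDGET_S:
            await asyncio.sleep(0)  # hand control back to the event loop
            yields += 1
            slice_start = time.perf_counter()
        rows.append(["row" + str(i)])
    return rows, yields


async def main():
    rows, yields = await build_rows(100000)
    print(len(rows), "rows built;", yields, "yields to the event loop")


if __name__ == "__main__":
    asyncio.run(main())
```

The number of yields depends on the speed of the machine, so it is printed rather than assumed.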

PierreQuentel commented 5 years ago

@desean1625 This is an interesting idea! I must confess that I am not familiar with JavaScript's async and await... I tried to reproduce your example; I had to adapt it slightly because the await must appear inside an async function. I also added a temporary variable to count the number of iterations, just to check that the whole loop is run.

I put 3 test files online:

The time taken by the first two versions is unsurprisingly almost the same (around 4.5s); the async version takes longer, around 6s.

If the async version were implemented, it would have to apply to every for X in range(N) loop. Even if it avoided freezing the window, I don't think the performance penalty is worth it. But once again, maybe it's my translation that is not optimized.

Another thing that worries me is how this would fit into an arbitrary program. For instance, if the loop is inside a function, would that function have to be declared async, and would every call to it have to be awaited?
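
The propagation concern can be illustrated in plain Python with asyncio, where the same rule applies. This is a sketch under CPython, not Brython output, and fill_rows, build_table and build_table_sync are hypothetical names: once the loop awaits, its enclosing function becomes a coroutine, and every caller up the chain has to be async and await it.

```python
import asyncio


async def fill_rows(n):
    """Hypothetical loop that periodically yields to the event loop."""
    rows = []
    for i in range(n):
        if i % 1000 == 0:
            await asyncio.sleep(0)
        rows.append(["row" + str(i)])
    return rows


async def build_table(n):
    # fill_rows() is a coroutine, so this caller must be async and must await it.
    rows = await fill_rows(n)
    return {"headers": [{"name": "row num", "type": "string"}], "rows": rows}


def build_table_sync(n):
    # A plain function that calls fill_rows() gets a coroutine object, not the rows.
    pending = fill_rows(n)
    is_coroutine = asyncio.iscoroutine(pending)
    pending.close()  # discard it cleanly to avoid a "never awaited" warning
    return is_coroutine


table = asyncio.run(build_table(5000))
print(len(table["rows"]))
print(build_table_sync(10))
```

The synchronous caller never sees the rows at all, which is why making loops async would ripple through every function that contains one.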

desean1625 commented 5 years ago

I apologize for the solution not being 100% correct. Yes, the function needs to be async to use await.

My use case is a self-service data visualization platform: the user base (mostly data scientists) builds visualizations for other users and saves them as dashboards. Those data scientists have asked for Python to do data transformation.

My constraints are bandwidth and computing power, so I cannot do the processing on a server.

So that data isn't passed back and forth, it is pulled once and processed in the browser (edge computing). I execute the Python scripts they write using Brython. However, with large datasets this locks up the browser and makes for a terrible user experience. The author also often has a much more powerful machine than the users, so it is even worse for my broader community.

If the execution returned a promise and all the loops were async I could have a nice visualization while the script was executing.

I experimented with pushing the execution to a Web Worker, but the structured clone of large objects in postMessage is slow...

https://stackoverflow.com/questions/34057127/how-to-transfer-large-objects-using-postmessage-of-webworker

I think that any time a for loop iterates over something, it would call a function that returns a promise, and that promise would be awaited before the program continues.

desean1625 commented 5 years ago

The async version in your test does exactly what I expected, and my frame rate didn't drop below 30 FPS, which is good.

screenshot from 2019-01-16 16-12-51

The non-async version locked up at 0 FPS for the whole processing time.

screenshot from 2019-01-16 16-13-10

The question is whether it makes sense for the broader Brython project to work on these performance issues to prevent the browser from locking up.

PierreQuentel commented 5 years ago

Thanks for the explanation. I think I have found another solution to keep the application responsive, while maintaining the current implementation for "for" loops (no async). It consists of using the well-known pattern (at least in the JavaScript world) setTimeout(function, 0) (translated to timer.set_timeout(function, 0) in Brython), which gives control to the browser event loop before the function is actually executed.

In the 4th version I have uploaded, I use this technique to split the building of table["rows"] into calls to a function add_rows() that only adds 10000 rows at a time. When a call is done, if there are still rows to add, the same function is scheduled with timer.set_timeout(add_rows, 0) instead of being called directly as add_rows(): in the first case, control is given back to the browser engine, which can process other pending events; in the second case the browser would freeze after a few seconds.

To expand the example, I have done the same with the loop that adds a letter at the end of each row (function add_letter()).

During the processing, the browser displays information in a DIV zone (a very poor simulation of what could be done to build a visualization with temporary results); I have also added a textarea with a button to display the length of the text in it, also to show that the browser still responds, even if it's naturally slower than usual, during the whole data processing.

I realize that this technique requires adaptation from the standard way of programming in Python, but running in a browser has its constraints, and as I said before, it's IMO not possible to make all for / while loops async: it would slow down all the programs where there is no risk of freezing, and calling a function or a method that has a loop would probably require an await, which would make the generation of JavaScript code a nightmare, even worse than in the current state of Brython ;-)

desean1625 commented 5 years ago

That should work. Thanks for helping me work through that issue.