davbeek / gitinspectorgui


Option --multi-core gives a 4 to 6 times improvement for multiple repos #39

Open davbeek opened 3 weeks ago

davbeek commented 3 weeks ago

For 74 student repos, the approximate execution times going from --no-multi-core to --multi-core are:

--no-blame-history 16s -> 4s

--blame-history 130s -> 20s

Conclusion: the speed increase is large enough that we could make --multi-core the default. However, for development and debugging, single-core, single-threaded execution is much easier to work with.
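To illustrate the trade-off, here is a minimal sketch of a switch between a plain loop and a process pool. How --multi-core is implemented in gitinspectorgui is not shown in this thread, so `analyze_repo` and `analyze_all` are hypothetical names, and the per-repo work is a placeholder.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def analyze_repo(repo_path: str) -> tuple[str, int]:
    """Hypothetical stand-in for the per-repo analysis."""
    # Placeholder work; the real tool would run git and blame analysis here.
    return repo_path, len(repo_path)


def analyze_all(repo_paths: list[str], multi_core: bool) -> list[tuple[str, int]]:
    """Analyze all repos, either sequentially or with one task per repo."""
    if not multi_core:
        # Single-process path: easy to step through in a debugger.
        return [analyze_repo(path) for path in repo_paths]
    # Each repo is independent, so the repos can be spread over all cores.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        # pool.map returns results in the same order as repo_paths.
        return list(pool.map(analyze_repo, repo_paths))


if __name__ == "__main__":
    repos = [f"student_repo_{i}" for i in range(8)]
    sequential = analyze_all(repos, multi_core=False)
    parallel = analyze_all(repos, multi_core=True)
    print(sequential == parallel)
```

Keeping the sequential branch around is what makes debugging simple: breakpoints and tracebacks behave normally there, while the pool branch only changes how the same function is scheduled.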

Alberth289346 commented 3 weeks ago

Nice, not sure how that happens.

You may want to do some profiling to see where the time is spent.

davbeek commented 3 weeks ago

I did. We have the option --profile n, where n is the number of profile output lines. Normally, most of the time is spent on IO polling, but with --blame-history, time is spent mainly in BeautifulSoup's element module, which does the HTML element processing.
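For reference, limiting profile output to n lines can be done with the standard library alone. This is only a sketch of the general idea, not the code behind --profile; `profile_call` and `busy_work` are hypothetical names.

```python
import cProfile
import io
import pstats


def profile_call(func, n: int) -> str:
    """Run func() under cProfile and return a report limited to n entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and print only the top n functions.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(n)
    return buffer.getvalue()


def busy_work() -> int:
    # Hypothetical workload standing in for the repository analysis.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(profile_call(busy_work, n=5))
```

Sorting by cumulative time makes it easy to spot a hot module: all functions from that module cluster near the top of the report.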

Alberth289346 commented 3 weeks ago

Where is the html coming from? I would guess git doesn't produce it, right?

davbeek commented 3 weeks ago

The HTML comes from BeautifulSoup (bs4). The final symbolic HTML representation is held in the BeautifulSoup object soup. You get the HTML text via str(soup).
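A small sketch of that pattern, assuming bs4 is installed: build the elements on a soup object, then serialize with str(soup). The table content and the helper name `build_table` are made up for illustration and are not taken from gitinspectorgui.

```python
from bs4 import BeautifulSoup


def build_table(rows: list[tuple[str, int]]) -> BeautifulSoup:
    """Build a minimal HTML document with one table row per input tuple."""
    soup = BeautifulSoup("<html><body></body></html>", "html.parser")
    table = soup.new_tag("table")
    for author, lines in rows:
        row = soup.new_tag("tr")
        for value in (author, str(lines)):
            cell = soup.new_tag("td")
            cell.string = value  # Text content of the cell.
            row.append(cell)
        table.append(row)
    soup.body.append(table)
    return soup


if __name__ == "__main__":
    soup = build_table([("alice", 42)])
    html = str(soup)  # Serialize the element tree to an HTML string.
    print(html)
```

Every cell is a separate element object, which is consistent with element handling dominating the profile when the blame tables get large.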

Alberth289346 commented 2 weeks ago

ah, ok

davbeek commented 2 weeks ago

I should add that, now that I have added the werkzeug server for dynamic HTML creation, option --blame-history no longer leads to big HTML files. Even for big repos, the generated HTML files are around 0.1 MB.

I must say that it took me a few days to get things right and to properly shut down the localhost servers via multiprocessing queues on tab or window close events. This didn't work via Flask, so I had to use the lower-level werkzeug API directly. Once you understand the werkzeug API, though, you can get really nice, easy-to-understand code.
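Roughly, the shutdown pattern looks like the sketch below. To keep it free of third-party dependencies, it uses the standard library's wsgiref server as a stand-in for werkzeug; it is not the gitinspectorgui code. The names `serve_until_shutdown`, `fetch_status`, and the "shutdown" message are hypothetical.

```python
import http.client
import multiprocessing
import threading
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Tiny WSGI app that returns a fixed HTML page."""
    body = b"<html><body>placeholder page</body></html>"
    headers = [("Content-Type", "text/html"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def serve_until_shutdown(commands, ports) -> None:
    """Serve on a free localhost port until "shutdown" arrives on commands."""
    server = make_server("127.0.0.1", 0, app)  # Port 0: let the OS pick one.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    ports.put(server.server_port)  # Report the chosen port to the caller.
    while commands.get() != "shutdown":
        pass  # Ignore any other message in this sketch.
    # shutdown() must be called from a thread other than the serving one.
    server.shutdown()
    server.server_close()
    thread.join()


def fetch_status(port: int) -> int:
    """Request the root page from the local server and return the status."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    commands = multiprocessing.Queue()
    ports = multiprocessing.Queue()
    worker = multiprocessing.Process(
        target=serve_until_shutdown, args=(commands, ports)
    )
    worker.start()
    port = ports.get(timeout=10)
    print(fetch_status(port))
    # In the real tool this message would follow a tab or window close event.
    commands.put("shutdown")
    worker.join(timeout=10)
```

The key point is that the server loop runs in its own thread, so the code waiting on the queue is free to call shutdown() as soon as the close message arrives.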