ratoaq2 closed this 7 years ago.
Great!
Only one question before merging: is the "Handle unused kwargs" commit really useful?
`pylint` was failing, and I didn't have time to figure out how to fix it properly.
OK, I see! I'll merge, but I'll drop this commit and add a pylint ignore comment instead. Thank you, this is great work!
On my computer, the guessit unit tests run twice as fast with this version on Python 3.5.1, going from 25 s to 12.5 s. On Python 2.7.11 the gain is smaller but still there, from 15 s to 12.5 s.
Released in rebulk 0.8.2
I've been profiling rebulk and guessit and I have some preliminary results that I could share.
My scenario is:
I measured the time spent (before / after), and I also used `cProfile` and `line_profiler` to understand where most of the time was spent.

On my machine, with the current guessit and rebulk versions:
For the modified rebulk version:
The first hotspot was the use of `call` to instantiate `Match` objects. Almost a million `Match` objects are created, and every `call` execution was introspecting the valid `kwargs` to be used. Passing `kwargs` directly instead of going through `call` reduced the total time from 37 seconds to 28.5 seconds (about 23% less time).

The second hotspot is related to `Matches` instantiation, which happens several times for child matches. `BaseMatch` contains a collection of dictionaries for fast access to matches by index, name, tag, etc. All these dictionaries are instantiated as soon as a `Matches` object is created, and all of them are populated as soon as matches are added to it. I tried some experimental code that only instantiates and populates these dictionaries when they are first needed, since not every rule accesses every dictionary of every match. That change reduced the total time from 28.5 seconds to 22.5 seconds (about 21% less time).

I ran the guessit test suite with these modifications, as well as another test suite of my own. All of them are still green.
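To illustrate the first hotspot, here is a minimal sketch under stated assumptions, not rebulk's actual code: `call` is a hypothetical helper that filters keyword arguments against the callee's signature with `inspect.signature` on every invocation, and `Match` is a hypothetical stand-in class. Comparing it with a plain constructor call shows where the per-object overhead comes from when objects are created by the hundreds of thousands.

```python
import inspect
import timeit


class Match:
    """Hypothetical stand-in for a match object (not rebulk's real Match)."""

    def __init__(self, start, end, value=None, name=None):
        self.start = start
        self.end = end
        self.value = value
        self.name = name


def call(function, **kwargs):
    """Hypothetical helper: drop kwargs that the callee does not accept.

    The signature is inspected on every invocation, which is the kind of
    per-call cost described for the first hotspot.
    """
    params = inspect.signature(function).parameters
    accepted = {key: value for key, value in kwargs.items() if key in params}
    return function(**accepted)


def build_with_call(n):
    # The extra "tags" key is silently filtered out by call().
    return [call(Match, start=i, end=i + 1, name="title", tags=[]) for i in range(n)]


def build_direct(n):
    # The caller passes exactly what Match accepts, so no introspection is needed.
    return [Match(start=i, end=i + 1, name="title") for i in range(n)]


if __name__ == "__main__":
    n = 5_000
    slow = timeit.timeit(lambda: build_with_call(n), number=3)
    fast = timeit.timeit(lambda: build_direct(n), number=3)
    print(f"via call(): {slow:.3f}s, direct kwargs: {fast:.3f}s")
```

The trade-off is that the direct form makes each caller responsible for passing only valid arguments, whereas the introspecting helper tolerates extra keys at a cost paid on every instantiation.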
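For the second hotspot, the lazy-index idea can be sketched as follows. This is a hypothetical container, not rebulk's `Matches`/`BaseMatch` implementation: `LazyMatches`, `named()`, `tagged()` and the minimal `Match` class are illustrative names. Matches always go into a plain list, and each lookup dictionary is built only the first time a query needs it and is kept up to date afterwards.

```python
from collections import defaultdict


class Match:
    """Hypothetical minimal match: a span with an optional name and tags."""

    def __init__(self, start, end, name=None, tags=()):
        self.start = start
        self.end = end
        self.name = name
        self.tags = list(tags)


class LazyMatches:
    """Hypothetical container whose lookup dictionaries are built on demand."""

    def __init__(self, matches=()):
        self._matches = []
        self._name_index = None  # materialized by the first named() call
        self._tag_index = None   # materialized by the first tagged() call
        for match in matches:
            self.append(match)

    def append(self, match):
        self._matches.append(match)
        # Only maintain the indexes that have already been materialized.
        if self._name_index is not None:
            self._name_index[match.name].append(match)
        if self._tag_index is not None:
            for tag in match.tags:
                self._tag_index[tag].append(match)

    def named(self, name):
        if self._name_index is None:
            self._name_index = defaultdict(list)
            for match in self._matches:
                self._name_index[match.name].append(match)
        # .get() avoids inserting empty entries into the defaultdict.
        return list(self._name_index.get(name, []))

    def tagged(self, tag):
        if self._tag_index is None:
            self._tag_index = defaultdict(list)
            for match in self._matches:
                for t in match.tags:
                    self._tag_index[t].append(match)
        return list(self._tag_index.get(tag, []))

    def __len__(self):
        return len(self._matches)


matches = LazyMatches([Match(0, 5, "title"), Match(6, 10, "year", tags=["weak"])])
print([m.start for m in matches.named("title")])
```

A container that is never queried by name or tag pays only for list appends; the first lookup pays the one-off cost of building that index, which is the saving described above when many short-lived child containers are created.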
Hope this can be useful.