dgilland / pydash

The kitchen sink of Python utility libraries for doing "stuff" in a functional way. Based on the Lo-Dash JavaScript library.
http://pydash.readthedocs.io
MIT License

Just Curious: Why so slow? #58

Closed: jacobbridges closed this issue 8 years ago

jacobbridges commented 9 years ago

As a preface to my question: I'm not here to cast doubt on the usefulness of this project or the adequacy of those who have worked on it. In truth, I'm very impressed by this project, and when I first discovered it I began using it in most of my projects! (I do lots of data mining, so a library like this is priceless.)

But I noticed recently that some of the functions in pydash are very slow compared to their Python counterparts. For instance, consider the following code:

In [.]: from pydash import filter_

In [.]: x = [{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 3, 'c': 4}]
In [.]: %timeit filter_(x, {'a': 1})
10000 loops, best of 3: 62.1 µs per loop

In [.]: %timeit filter(lambda y: y['a'] == 1, x)
1000000 loops, best of 3: 417 ns per loop

Now, in truth, I would never expect the pydash equivalents to be faster than their Python counterparts, but ~150x slower?

Any idea how the speed of these functions could be improved? Since making this discovery I've mostly stopped using pydash and gone back to filter + lambda... but goodness, that can get ugly! I love the idea of this library; I just wish the performance were better.

dgilland commented 9 years ago

No worries about the criticism over those performance numbers :smile:. Admittedly, I have not done much in terms of profiling/optimizing for speed :disappointed:.

I went ahead with some basic cProfile analysis and determined that the majority of the time was spent in is_builtin and has, which were called from here and here, respectively.
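
For reference, a minimal sketch of this kind of cProfile run (the exact invocation below is illustrative, not the one used here):

# Profile repeated filter_ calls and sort by cumulative time to surface
# hot spots such as is_builtin and has.
import cProfile
import pstats

from pydash import filter_

x = [{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 3, 'c': 4}]

profiler = cProfile.Profile()
profiler.enable()
for _ in range(10000):
    filter_(x, {'a': 1})
profiler.disable()

pstats.Stats(profiler).sort_stats('cumulative').print_stats(15)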

I did some quick rework and now the timeits are:

$ python -m timeit "
from pydash import filter_
x = [{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 3, 'c': 4}]
filter_(x, {'a': 1})"
10000 loops, best of 3: 29.6 usec per loop

$ python -m timeit "
x = [{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 3, 'c': 4}]
filter(lambda y: y['a'] == 1, x)"
1000000 loops, best of 3: 0.882 usec per loop

which brings the difference down to ~34x slower than the native filter.

You can view the changes on the feature-performance branch. If you know of any other poor performance areas, I (or you if you are so inclined) can take a look and submit a patch.

UPDATE: Looks like the is_builtin change in call_callback caused some tests to fail on Python 3.4. I've since reverted that change, which makes the timeit result as follows:

$ python -m timeit "
from pydash import filter_
x = [{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 3, 'c': 4}]
filter_(x, {'a': 1})"
10000 loops, best of 3: 49.4 usec per loop

which is ~56x slower than the native filter.

jacobbridges commented 9 years ago

Thank you for responding so quickly!

I fetched your upstream branch locally last night and was tinkering with the code. (I swear that iteratee function is magical...) There is a noticeable difference in performance; thank you for putting effort into that. If you don't mind, I would like a few days to tinker with the code and see if the efficiency can be increased even more.

dgilland commented 9 years ago

With the latest commit, 431fdd5, the timeit is:

$ python -m timeit "
x = [{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 3, 'c': 4}]
filter(lambda y: y['a'] == 1, x)"
1000000 loops, best of 3: 0.754 usec per loop

$ python -m timeit "
from pydash import filter_
x = [{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 3, 'c': 4}]
filter_(x, {'a': 1})"
10000 loops, best of 3: 22.8 usec per loop

which is ~30x slower.

jacobbridges commented 9 years ago

Wow! That is impressive!

Earlier today, profiling the filter_ function returned:

1110 function calls (1083 primitive calls) in 0.003 seconds

But now the profile returns:

88 function calls (85 primitive calls) in 0.000 seconds

Very nice, sir; thank you for your persistence. I will profile the other available functions and see if there are any efficiencies that could be added.

dgilland commented 9 years ago

Luckily, the optimizations I've done improve the callback system so all the special callbacks that internally use the iteratee function will see a speedup.
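
For context, these are the lodash-style callback shorthands that route through iteratee (a rough sketch per the docs):

from pydash import filter_

x = [{'a': 1, 'b': 2}, {'a': 2, 'b': 3}]

filter_(x, {'a': 1})               # "matches" shorthand: dict of key/value pairs
filter_(x, 'a')                    # "property" shorthand: keep items with a truthy 'a'
filter_(x, lambda y: y['a'] == 1)  # plain function callback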

If you're able to identify other areas that have poor performance, that'll be a big help.

jacobbridges commented 9 years ago

I wrote a quick script that runs timeit on a pydash function and an analogous pure Python snippet and calculates the difference in runtime. Here are the results (so far!):

pydash.collections.filter_(list[num])     ==> 29x slower than Python version
pydash.collections.filter_(list[obj])     ==> 69x slower than Python version
pydash.collections.find(list[num])        ==> 653x slower than Python version
pydash.collections.find(list[obj])        ==> 13x slower than Python version
pydash.objects.assign(simple)             ==> 135x slower than Python version
pydash.objects.assign(complex)            ==> 122x slower than Python version
pydash.objects.callables                  ==> 0x slower than Python version
pydash.objects.clone(shallow)             ==> 19x slower than Python version
pydash.objects.clone(deep)                ==> 1x slower than Python version
pydash.objects.defaults                   ==> 1x slower than Python version
pydash.objects.find_key                   ==> 182x slower than Python version
pydash.objects.has                        ==> 442x slower than Python version
pydash.objects.keys(obj)                  ==> 3x slower than Python version
pydash.objects.keys(list)                 ==> 5x slower than Python version
pydash.objects.values(obj)                ==> 1x slower than Python version
pydash.objects.values(list)               ==> 22x slower than Python version

I will keep updating this issue as I write more tests, but I hope this is a helpful start. Also, please let me know if any of the "tests" can be improved.

dgilland commented 9 years ago

I think your script is a good start!

You should consider adding it to the repo under perf/perftest.py and then adding a Makefile target to execute it (e.g. make perftest). I was able to run the test using Python 3, but it fails on Python 2 (for some reason Python 2 throws a KeyError during the pydash.collections.filter_(list[obj]) test while Python 3 does not).

UPDATE: The reason for the failure on Python 2 but not 3 is that filter() returns a lazy iterator on Python 3, so the KeyError isn't thrown because the iteration never executes.
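
To illustrate (a minimal sketch, using a dict that is missing the key):

# On Python 3, filter() is lazy, so the predicate never runs until iterated.
x = [{'a': 1}, {'b': 2}]  # the second dict has no 'a' key

result = filter(lambda y: y['a'] == 1, x)  # no error raised here on Python 3

try:
    list(result)  # forcing iteration finally runs the predicate
except KeyError as exc:
    print('KeyError raised on iteration:', exc)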

Some of the pydash/Python statement comparisons may be a little unfair, though.

It may also be more readable/maintainable if the test statements were made into actual Python functions instead of strings. Then the timeit statements could be something like "pydash_assign_simple()" and "python_assign_simple()".
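
Something along these lines (the names and data below are illustrative, not from the actual script):

import timeit

import pydash

DATA = {'a': 1, 'b': 2}
UPDATES = {'c': 3}

def pydash_assign_simple():
    return pydash.assign({}, DATA, UPDATES)

def python_assign_simple():
    merged = {}
    merged.update(DATA)
    merged.update(UPDATES)
    return merged

# The timeit statements then reference functions by name instead of long
# statement strings.
pydash_time = timeit.timeit('pydash_assign_simple()',
                            setup='from __main__ import pydash_assign_simple',
                            number=10000)
python_time = timeit.timeit('python_assign_simple()',
                            setup='from __main__ import python_assign_simple',
                            number=10000)
print('pydash.objects.assign(simple) ==> %.0fx slower' % (pydash_time / python_time))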

Below are the perf benchmarks run on various commits for comparison. NOTE: It uses a modification of your script that fixes the filter() issue.

Latest develop branch:

pydash.collections.filter_(list[num])     ==> 222x slower than Python version
pydash.collections.filter_(list[obj])     ==> 229x slower than Python version
pydash.collections.find(list[num])        ==> 1351x slower than Python version
pydash.collections.find(list[obj])        ==> 129x slower than Python version
pydash.objects.assign(simple)             ==> 355x slower than Python version
pydash.objects.assign(complex)            ==> 177x slower than Python version
pydash.objects.callables                  ==> 0x slower than Python version
pydash.objects.clone(shallow)             ==> 26x slower than Python version
pydash.objects.clone(deep)                ==> 1x slower than Python version
pydash.objects.defaults                   ==> 2x slower than Python version
pydash.objects.find_key                   ==> 262x slower than Python version
pydash.objects.has                        ==> 943x slower than Python version
pydash.objects.keys(obj)                  ==> 3x slower than Python version
pydash.objects.keys(list)                 ==> 6x slower than Python version
pydash.objects.values(obj)                ==> 2x slower than Python version
pydash.objects.values(list)               ==> 37x slower than Python version

Commit 431fdd5153b6ad7b43bfba4cbae426d17018b736 (optimizations to callback system):

pydash.collections.filter_(list[num])     ==> 12x slower than Python version
pydash.collections.filter_(list[obj])     ==> 16x slower than Python version
pydash.collections.find(list[num])        ==> 1135x slower than Python version
pydash.collections.find(list[obj])        ==> 15x slower than Python version
pydash.objects.assign(simple)             ==> 343x slower than Python version
pydash.objects.assign(complex)            ==> 196x slower than Python version
pydash.objects.callables                  ==> 0x slower than Python version
pydash.objects.clone(shallow)             ==> 25x slower than Python version
pydash.objects.clone(deep)                ==> 1x slower than Python version
pydash.objects.defaults                   ==> 2x slower than Python version
pydash.objects.find_key                   ==> 141x slower than Python version
pydash.objects.has                        ==> 1018x slower than Python version
pydash.objects.keys(obj)                  ==> 3x slower than Python version
pydash.objects.keys(list)                 ==> 5x slower than Python version
pydash.objects.values(obj)                ==> 2x slower than Python version
pydash.objects.values(list)               ==> 35x slower than Python version

Commit b77344020e81c84620d984b590053fd5c8e5ebfc (optimizations to has/get):

pydash.collections.filter_(list[num])     ==> 13x slower than Python version
pydash.collections.filter_(list[obj])     ==> 17x slower than Python version
pydash.collections.find(list[num])        ==> 294x slower than Python version
pydash.collections.find(list[obj])        ==> 16x slower than Python version
pydash.objects.assign(simple)             ==> 360x slower than Python version
pydash.objects.assign(complex)            ==> 190x slower than Python version
pydash.objects.callables                  ==> 0x slower than Python version
pydash.objects.clone(shallow)             ==> 21x slower than Python version
pydash.objects.clone(deep)                ==> 1x slower than Python version
pydash.objects.defaults                   ==> 2x slower than Python version
pydash.objects.find_key                   ==> 80x slower than Python version
pydash.objects.has                        ==> 87x slower than Python version
pydash.objects.keys(obj)                  ==> 3x slower than Python version
pydash.objects.keys(list)                 ==> 5x slower than Python version
pydash.objects.values(obj)                ==> 2x slower than Python version
pydash.objects.values(list)               ==> 38x slower than Python version

jacobbridges commented 9 years ago

Yeah, I apologize for the crudeness of that script. I bashed it out in about 2 hours. And I just recently moved to Python 3, so I am still used to filter and map returning lists, not iterators. I'll try to rewrite the script with a more modular design (calling the Python snippets as functions) and iterate on the design a few more times until we are both satisfied with it; then I will make a PR.

Thank you so much for being so responsive and helpful! Most maintainers take weeks to get back on an issue. I don't understand why this project doesn't have more stars... it's a gem.

dgilland commented 9 years ago

Thanks for the kind words about the project. I haven't done much to promote the library, so not many people are aware of it.

Looking forward to your contribution! It will be nice to have some performance baselines, since that's one area that has been lacking.

dgilland commented 9 years ago

Another thing to consider is having some perf tests that use larger datasets. With the optimizations in feature-performance, most of the performance hits happen before iteration. So I would imagine that for larger datasets, pydash would average closer (how close remains to be seen) to the non-pydash implementations (e.g. callback argument inspection tends to be one of the more costly steps but only happens once).
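
A rough sketch of what such a test could look like (sizes and numbers are illustrative): time the same operation at increasing input sizes and watch whether the pydash/native ratio shrinks as iteration dominates the one-time setup cost.

import timeit

for size in (10, 1000, 100000):
    make_data = "x = [{{'a': i % 2}} for i in range({})]".format(size)
    pydash_t = timeit.timeit("filter_(x, {'a': 1})",
                             setup="from pydash import filter_\n" + make_data,
                             number=100)
    native_t = timeit.timeit("[y for y in x if y['a'] == 1]",
                             setup=make_data,
                             number=100)
    print("size={:<7} pydash/native ratio: {:.1f}x".format(size, pydash_t / native_t))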

jacobbridges commented 9 years ago

If you like, I can include a separate perftest file for large datasets. I would include them in the same file, but that file will already be pretty long.

In my experience, I found pydash to be unbearably slow when working with large datasets. For example, pydash.filter_ on a list of 200,000 dictionaries was adding seconds of runtime to my script, while a native Python filter was almost instant. That was before the new performance branch, though.

This perftest.py file will take me some time to complete. I'll push it up to my fork, but I want to have a test for every function in pydash, with the exception of the chaining functions, as I don't know a fair way to compare those to pure Python.
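
For reference, a chain fuses several operations into one pipeline, which is what makes a fair pure-Python comparison awkward; a rough sketch (assuming chain() resolves the usual pydash function names):

import pydash

data = [{'a': 1}, {'a': 2}, {'a': 3}, {'a': 4}]

# One chained pipeline: filter the items, then pluck the 'a' values.
chained = pydash.chain(data).filter_(lambda d: d['a'] % 2 == 0).map_('a').value()

# The pure-Python "equivalent" collapses into a single comprehension,
# so there is no per-step counterpart to measure against.
plain = [d['a'] for d in data if d['a'] % 2 == 0]

assert chained == plain  # both are [2, 4]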

jacobbridges commented 9 years ago

Pushed up a perftest.py which yields:

Test Name                      |  pydash  |  python  | Notes
---------                      | time(ms) | time(ms) |
pydash.array.append            |  290.448 |    2.115 |
pydash.array.cat               |    9.536 |    2.127 |
pydash.array.chunk             |    6.880 |    4.202 |
pydash.array.compact           |    2.841 |    1.734 |
pydash.array.difference        |    4.851 |    3.317 |
pydash.array.drop              |   57.311 |    0.684 |
pydash.array.drop_right        |   57.044 |    0.681 |
pydash.array.drop_right_while  |   62.689 |    2.054 |
pydash.array.drop_while        |   52.776 |    1.453 |
pydash.array.duplicates        |    7.972 |    4.017 | Unfair comparison, since pydash.duplicates does much more.
pydash.array.fill              |    3.644 |    1.971 |

Those are the results on my machine using Python 3.4. I'm done for tonight; I'll continue tomorrow evening.

dgilland commented 9 years ago

I came across this project today: benchy

May be worth checking out for this purpose.

jacobbridges commented 9 years ago

That looks like an impressive package; I really like the charts. Unfortunately, it's not on PyPI, hasn't been updated in two years, and I cannot get it to run on my Mac. Does it run on your system?

I'll keep adding simple perf tests to perftest.py for now.

dgilland commented 9 years ago

Actually, it is on PyPI: https://pypi.python.org/pypi/benchy.

I was also able to install it with pip.

jacobbridges commented 9 years ago

The benchy on PyPI isn't the same as the benchy on GitHub. I found that library too, installed it, and tried to run "from benchy.api import Benchmark". It failed to import: no module named "api".

Looking at the source, it is a different project altogether.

dgilland commented 9 years ago

Ah, that's interesting. No worries then. What you're doing should work just fine.

dgilland commented 9 years ago

Any update on this?

dgilland commented 8 years ago

FYI, the performance optimizations thus far have been released in v3.4.0:

https://pypi.python.org/pypi/pydash/3.4.0

jacobbridges commented 8 years ago

Hi Derrick, thanks for being so patient on this and for the new release! I've been swamped with overtime at work for the past month and haven't had time to do much side programming. My schedule should free up greatly at the end of September.

jacobbridges commented 8 years ago

I've changed the way performance tests are created and run. It should be much easier now.

Whew, I honestly didn't comprehend how large this library was until I started writing a performance test for each function, haha.

dgilland commented 8 years ago

Closing for now. Feel free to submit a PR with improvements.