faster-cpython / ideas


Ctypes Array Creation Performance #456

Closed FreddieWitherden closed 2 years ago

FreddieWitherden commented 2 years ago

Consider the following snippet where we create an array of 50,000 integers with ctypes:

In [1]: from ctypes import *

In [2]: arrt = c_int*50000

In [3]: %timeit list(range(50000))
627 µs ± 8.59 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [4]: %timeit arrt(*range(50000))
4.09 ms ± 33.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit x = arrt(); x[:] = range(50000)
2.53 ms ± 20 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The list construction gives a baseline for what could be considered reasonable. We then compare two ways of creating an equivalent array with ctypes: one passes the values to the constructor, the other uses slice assignment. Surprisingly, the slice assignment version is about 1.6x faster (2.53 ms vs. 4.09 ms), yet still roughly 4x slower than building the list.
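For anyone who wants to reproduce this outside IPython, the three approaches timed above can be wrapped in a small script along these lines. It is only a sketch: the function names and the repeat count are illustrative choices, and absolute timings will differ by machine and Python version.

```python
import timeit
from ctypes import c_int

N = 50_000
ArrT = c_int * N  # same array type as `arrt` in the session above


def via_list():
    # Baseline: a plain Python list of the same values.
    return list(range(N))


def via_constructor():
    # Unpacks the range into N positional arguments.
    return ArrT(*range(N))


def via_slice():
    # Creates a zero-initialised array, then fills it by slice assignment.
    x = ArrT()
    x[:] = range(N)
    return x


if __name__ == "__main__":
    repeats = 20  # illustrative; raise it for steadier numbers
    for fn in (via_list, via_constructor, via_slice):
        seconds = timeit.timeit(fn, number=repeats) / repeats
        print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```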

gvanrossum commented 2 years ago

I have a suspicion that the problem with arrt(*range(50000)) is that under the hood this calls a varargs function with 50,000 arguments. That is not how most varargs functions are typically used, so no attempt is made to make this fast -- we just make sure it doesn't crash. Perhaps there's a variant that doesn't use func(*args)?

FreddieWitherden commented 2 years ago

> Perhaps there's a variant that doesn't use func(*args)?

Not that I am aware of, although that doesn't mean one should not be proposed. The current idiom for creating arrays in ctypes can be a bit vexing: cvalues = (c_int * len(values))(*values), where one would probably expect something more along the lines of cvalues = c_int.array(values).
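Until something like that exists, a user-level helper can get close to the desired spelling by combining the existing idiom with slice assignment instead of argument unpacking. This is a minimal sketch: `c_array` is a hypothetical name, not part of ctypes, and the iterable is materialised into a list first so that its length is known.

```python
from ctypes import c_double, c_int


def c_array(ctype, values):
    """Build a ctypes array of `ctype` from any iterable (hypothetical helper)."""
    values = list(values)          # need len() to size the array type
    arr = (ctype * len(values))()  # zero-initialised array of the right length
    arr[:] = values                # slice assignment avoids func(*args)
    return arr


cvalues = c_array(c_int, range(10))
fvalues = c_array(c_double, [0.5, 1.5, 2.5])
```

The extra list copy costs some time and memory, so this trades a little overhead for not having to pass every element as a positional argument.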

gvanrossum commented 2 years ago

Can I ask you to take this to Discourse? This is really not the right tracker if you need help with ctypes.

JelleZijlstra commented 2 years ago

My computer is quite a bit slower than @FreddieWitherden's, but I am indeed seeing that varargs overhead accounts for why the arrt(*range(50000)) code is so slow.

In [15]: from ctypes import *

In [16]: arrt = c_int*50000

In [17]: def varargs(*args): pass

In [18]: r = range(50_000)

In [19]: %timeit varargs(*r)
6.03 ms ± 45.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [20]: %timeit arrt(*r)
32.5 ms ± 219 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [21]: %timeit x = arrt(); x[:] = r
24.6 ms ± 667 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

A way to create an array directly in ctypes without varargs seems like a reasonable feature request.
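In the meantime, one workaround that sidesteps both varargs and per-element slice assignment is to stage the values in an `array.array` and copy its buffer into the ctypes array with `from_buffer_copy`. This is a sketch that was not benchmarked in this thread; the helper name is hypothetical, and it assumes the `"i"` typecode has the same size as `c_int`, which the code checks explicitly.

```python
from array import array
from ctypes import c_int, sizeof


def c_int_array_from_iterable(values):
    """Copy an iterable of ints into a new c_int array (hypothetical helper)."""
    buf = array("i", values)  # typecode "i" is a signed C int
    if buf.itemsize != sizeof(c_int):
        raise TypeError("array('i') and c_int differ in size on this platform")
    # from_buffer_copy copies the bytes, so the result does not alias `buf`.
    return (c_int * len(buf)).from_buffer_copy(buf)


carr = c_int_array_from_iterable(range(50_000))
```

Values that do not fit in a C int raise `OverflowError` when the `array.array` is built, so out-of-range input fails before the ctypes array is created.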

kumaraditya303 commented 2 years ago

The ctypes module has a lot of room for performance improvement, but AFAIK there are no active maintainers, so there haven't been any perf improvements that I know of.

gvanrossum commented 2 years ago

This really needs to move to the cpython tracker.