FreddieWitherden closed this issue 2 years ago.
I suspect the problem with arrt(*range(50000)) is that, under the hood, it calls a varargs function with 50,000 arguments. That is not how varargs functions are typically used, so no attempt is made to make this fast; we just make sure it doesn't crash. Perhaps there's a variant that doesn't use func(*args)?
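To make the mechanism concrete, here is a minimal sketch. count_args is a hypothetical helper used only for illustration. Unpacking a range with * turns every element into a separate positional argument, and the ctypes array constructor receives its values exactly the same way.

```python
from ctypes import c_int

n = 50_000
arrt = c_int * n  # array type holding 50,000 c_int elements

def count_args(*args):
    # *args collects every positional argument into a single tuple
    return len(args)

# Unpacking a range passes each element as its own positional argument,
# so the callee receives a 50,000-element tuple.
print(count_args(*range(n)))  # 50000

# The ctypes array constructor is called the same way:
# each element of the range becomes one positional argument.
arr = arrt(*range(n))
print(arr[0], arr[n - 1])  # 0 49999
```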
Not that I am aware of, although that doesn't mean one shouldn't be proposed. The current idiom for creating arrays in ctypes can be a bit vexing:
cvalues = (c_int * len(values))(*values)
where one would probably expect something more along the lines of:
cvalues = c_int.array(values)
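c_int.array(values) does not exist in ctypes today; it is the spelling proposed above. As a rough sketch, the same convenience can be approximated with a small helper. make_array is hypothetical and not part of ctypes: it allocates a zero-filled array of the right length and fills it by slice assignment instead of unpacking the values into the constructor.

```python
from ctypes import c_int

def make_array(ctype, values):
    """Hypothetical helper approximating the proposed c_int.array(values).

    Not a ctypes API: allocates a zero-initialized array and fills it
    with one slice assignment rather than a varargs constructor call.
    """
    values = list(values)           # the length is needed up front
    arr = (ctype * len(values))()   # zero-initialized array of that length
    arr[:] = values                 # copy all elements via slice assignment
    return arr

values = range(10)
# Current idiom: unpack the values into the array constructor
cvalues_old = (c_int * len(values))(*values)
# Hypothetical helper from above
cvalues_new = make_array(c_int, values)
print(list(cvalues_old) == list(cvalues_new))  # True
```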
Can I ask you to take this to Discourse? This is really not the right tracker if you need help with ctypes.
My computer is quite a bit slower than @FreddieWitherden's, but I am indeed seeing that varargs overhead is a large part of why the arrt(*range(50000)) code is so slow.
In [15]: from ctypes import *
In [16]: arrt = c_int*50000
In [17]: def varargs(*args): pass
In [18]: r = range(50_000)
In [19]: %timeit varargs(*r)
6.03 ms ± 45.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [20]: %timeit arrt(*r)
32.5 ms ± 219 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [21]: %timeit x = arrt(); x[:] = r
24.6 ms ± 667 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
A way to create an array directly in ctypes without varargs seems like a reasonable feature request.
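For what it's worth, existing ctypes APIs already offer one route that avoids the varargs call: copying raw bytes from a buffer. The sketch below assumes every value fits in a C int and that the array module's 'i' typecode has the same size as c_int (the assert checks this). It has not been timed here, so it is not claimed to be faster than the approaches benchmarked above.

```python
import array
from ctypes import c_int, sizeof

n = 50_000
arrt = c_int * n

# array.array('i') stores C ints, the same C type as c_int, so its raw
# bytes can be copied straight into a ctypes array of matching length.
buf = array.array('i', range(n))
assert buf.itemsize == sizeof(c_int)  # assumption: 'i' matches c_int

# from_buffer_copy copies the bytes of any buffer-protocol object;
# the array type is never called with one argument per element.
arr = arrt.from_buffer_copy(buf)
print(arr[0], arr[n - 1])  # 0 49999
```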
The ctypes module has a lot of room for performance improvement, but AFAIK there are no active maintainers, so there haven't been any perf improvements that I know of.
This really needs to move to the CPython tracker.
Consider the following snippet where we create an array of 50,000 integers with ctypes:
The initial list construction gives us a baseline for what could be considered reasonable performance. Then we compare two ways of creating an equivalent array with ctypes: one calling the array constructor explicitly and one using slice assignment. Surprisingly, the slice-assignment version is faster.
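The comparison described above can be reproduced outside IPython with the standard-library timeit module. This is a sketch, not the original snippet: the list baseline here uses list(range(N)), which may differ from what the issue actually measured, and absolute timings will vary by machine.

```python
import timeit
from ctypes import c_int

N = 50_000
arrt = c_int * N
r = range(N)

def build_list():
    # Baseline: a plain Python list of N ints
    return list(r)

def build_ctor():
    # Explicit constructor: every value becomes a positional argument
    return arrt(*r)

def build_slice():
    # Allocate a zero-filled array, then fill it by slice assignment
    x = arrt()
    x[:] = r
    return x

for name, fn in [("list", build_list), ("ctor", build_ctor), ("slice", build_slice)]:
    # Best of 3 repeats of 10 calls each, reported as time per call
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name:>5}: {best * 1e3:.2f} ms per call")
```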