1fish2 opened this issue 6 years ago (status: Open)
This sounds good. Do you think this was part of the performance gains you achieved from refactoring the polymerize function? I know you used memory views there, but I don't remember whether the gains were mostly from arranging for better memory access.
For mc_complexation we just need a large enough sample input data set; then we can measure variations: first just changing to the new memory views, then checking whether the Cython code makes avoidable calls into the Python interpreter, then trying direct loops in place of some NumPy calls, especially where a single loop can get the desired result in place of several NumPy calls.
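Something like this could serve as the measurement harness — a minimal sketch, assuming the kernel takes a 2-D int64 usage matrix; `run_complexation` is a placeholder for whatever entry point mc_complexation.pyx actually exposes, and the shape and sparsity are made-up values:

```python
import timeit

import numpy as np

# Fixed, reproducible sample input, large enough that loop work
# dominates call overhead.
rng = np.random.default_rng(0)
usesMolecule = (rng.random((2000, 500)) < 0.01).astype(np.int64)

def bench(fn, repeat=5, number=10):
    """Best-of-`repeat` average seconds per call of fn(usesMolecule)."""
    timings = timeit.repeat(lambda: fn(usesMolecule),
                            repeat=repeat, number=number)
    return min(timings) / number

# Run once per variant (ndarray version, memory-view version,
# direct-loop version) and compare:
# print(bench(run_complexation))
```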
With polymerize, I speculate (I didn't measure) that the main effect was avoiding writing and then re-reading intermediate arrays, and doing fewer loops over the data. Memory views let the new loops access the data at C speed, but the docs don't say how much of a difference that makes.
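To illustrate the intermediate-array point with a made-up reduction (not the actual polymerize code): `np.sum(rates * counts)` allocates, writes, and re-reads a temporary array and loops over the data twice, while a fused Cython loop does it in one pass:

```cython
# cython: language_level=3
cimport cython
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
def fused_weighted_sum(np.float64_t[:] rates, np.float64_t[:] counts):
    """One-pass equivalent of np.sum(rates * counts): no temporary
    array is written or read, and the data is traversed once."""
    cdef Py_ssize_t i
    cdef np.float64_t total = 0.0
    for i in range(rates.shape[0]):
        total += rates[i] * counts[i]
    return total
```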
It's worth trying to speed up mc_complexation.pyx by using memory views, per the Cython docs page "Cython for NumPy users".
This is a recent Cython feature that supports using C integers to index into the ndarrays' underlying buffers. Compared to using ndarrays directly, this avoids converting the C integers into Python integers and calling into Python code. Compared to using Cython types like `np.ndarray[np.int64_t, ndim=2]`, the docs just say memory views have "less overhead, and can be passed around without requiring the GIL," so it's feasible to use `cython.parallel.prange`.

mc_complexation's bulk NumPy operations like `np.max(np.sum(usesMolecule, 0))` will still need to call NumPy or be translated to additional loops and array indexing.
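As a hedged sketch of that translation (illustrative, not the actual mc_complexation code; it assumes `usesMolecule` holds non-negative int64 usage counts), `np.max(np.sum(usesMolecule, 0))` becomes one pass over a typed memory view:

```cython
# cython: language_level=3
cimport cython
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
def max_column_sum(np.int64_t[:, :] usesMolecule):
    """One-pass replacement for np.max(np.sum(usesMolecule, 0)):
    sum each column while tracking the running maximum, without
    allocating the intermediate column-sum array."""
    cdef Py_ssize_t n_rows = usesMolecule.shape[0]
    cdef Py_ssize_t n_cols = usesMolecule.shape[1]
    cdef np.int64_t best = 0  # OK since the entries are non-negative
    cdef np.int64_t col_sum
    cdef Py_ssize_t i, j

    for j in range(n_cols):
        col_sum = 0
        for i in range(n_rows):
            col_sum += usesMolecule[i, j]
        if col_sum > best:
            best = col_sum
    return best
```

Whether fusing like this actually beats NumPy's vectorized calls here is exactly what the measurements above should decide.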