glotzerlab / pythia

A library to generate numerical descriptions of particle systems.
https://pythia-learn.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Optimization for many calls #2

Open j-proc opened 5 years ago

j-proc commented 5 years ago

Pythia mostly contains calls to freud, so it does not seem that there would be substantial benefits to cythonizing it. However, there are places where freud objects are regenerated rather than reused, and it seems that many of the append and concatenate calls could be replaced by direct numpy allocation and assignment.

This would require replacing many functions with classes that perform the same computation but store the freud objects and perhaps intelligently decide when to regenerate them (based on box changes?).

Interested in other opinions on what else could speed it up or what API people would like to see used.
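To make the class-based idea above concrete, here is a minimal sketch. The class name and the box-comparison heuristic are hypothetical, and the NearestNeighbors constructor/compute signatures follow the freud 1.x API as I recall it, so treat this as an illustration rather than existing pythia code:

```python
import freud

class CachedNeighborDescriptor:
    """Sketch: reuse a freud NearestNeighbors object across calls,
    rebuilding it only when the box changes."""

    def __init__(self, num_neighbors, rmax_guess=2.0):
        self.num_neighbors = num_neighbors
        self.rmax_guess = rmax_guess
        self._box_params = None
        self._nn = None

    @staticmethod
    def _params(box):
        # Compare boxes by their parameters rather than object identity.
        return (box.Lx, box.Ly, box.Lz, box.xy, box.xz, box.yz)

    def compute(self, box, positions):
        if self._nn is None or self._params(box) != self._box_params:
            # Only rebuild the (comparatively expensive) freud object
            # when the box actually changed.
            self._box_params = self._params(box)
            self._nn = freud.locality.NearestNeighbors(
                self.rmax_guess, self.num_neighbors)
        self._nn.compute(box, positions, positions)
        # ...compute the actual descriptor from self._nn.nlist here...
        return self._nn.nlist
```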

csadorf commented 5 years ago

I think cythonizing Pythia might be optimizing at the wrong level. We use freud as the "engine" precisely to speed up the nearest-neighbor calculations and the like. Pythia -- as I interpret it -- just combines and reformulates these data into specific descriptors.

I believe that cythonizing would make it harder to contribute to and maintain this package, so we should be careful about where exactly we optimize. I'd rather adjust the freud API, if needed, to make it easier to use freud efficiently.

j-proc commented 5 years ago

@csadorf That's actually what I was suggesting: leave all the Cython-level optimization to freud, but use its calls more efficiently and reuse objects. I haven't looked at it closely, but a brief look showed a lot of object creation inside loops. I'm not sure exactly how inefficient this is, or whether some form of caching happens automatically in the Cython code that would make it a non-issue.

I'll perhaps do some profiling on some code I ran previously that I expected to run much faster, and see how valid these suggestions are. But I think there should be a lot that's possible here.

klarh commented 5 years ago

I agree that adding cython would significantly increase the installation complexity and development overhead of the package. I would want to see compelling benchmarks for common use cases that currently run too slowly before being convinced that it's worth the trouble.

In some cases (I recall that the voronoi descriptors are particularly slow), perhaps it makes sense to move the core of some algorithms into freud proper?

bdice commented 5 years ago

@j-proc Is there a set of descriptors you are particularly interested in using?

@klarh It doesn't surprise me that Voronoi-based descriptors are among the slowest - freud does not perform its own Voronoi calculations, but instead clones particles across the periodic boundaries to ensure proper periodicity (increasing the problem size substantially) and calls out to scipy in a way that is perhaps less than efficient and likely not parallelized. I have been working with a different set of Voronoi-based descriptors that is not yet implemented in pythia, and have also observed that this is expensive.

csadorf commented 5 years ago

I'm not sure whether we are doing this yet, but a good start might be to create a test suite that does some benchmarking (and optionally profiling) so that we can reliably identify hot spots and potential regressions.
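A minimal sketch of what such a benchmarking/profiling harness could look like; the descriptor call at the bottom is a placeholder, so substitute whichever pythia function you want to measure:

```python
import cProfile
import io
import pstats

import numpy as np
import freud

def make_test_system(n=5000, box_length=20.0, seed=0):
    """Generate a random periodic test configuration."""
    rng = np.random.default_rng(seed)
    box = freud.box.Box.cube(box_length)
    positions = rng.uniform(
        -box_length / 2, box_length / 2, size=(n, 3)).astype(np.float32)
    return box, positions

def profile_call(func, *args, **kwargs):
    """Run func under cProfile and print the ten hottest entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    print(stream.getvalue())
    return result

if __name__ == "__main__":
    box, positions = make_test_system()
    # Placeholder: profile whichever descriptor is suspected to be slow,
    # e.g. something from pythia.bonds or pythia.spherical_harmonics.
    # profile_call(some_pythia_descriptor, box, positions)
```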

j-proc commented 5 years ago

I could make that the first half of my hackathon, if I don't get to it before then: just setting up some general benchmarking and profiling. I could solicit some general freud scripts from the group that people expect to run faster and perhaps don't. I expect a lot of this will come down to inefficient coding, but I think it would be good to know where everything slows down.

(Off topic for the package but related to the discussion) One topic I haven't approached, because it sort of violates the independence of freud from hoomd: for cases where freud is called while a simulation is running, a large speed-up could be had by using the hoomd neighbor list instead of constructing a new one.

j-proc commented 5 years ago

@bdice I was particularly thinking of the descriptors where a range of nearest-neighbor counts is used. These seemed particularly slow, and I suspect they are generating new neighbor lists for each count.

bdice commented 5 years ago

@j-proc I searched the codebase for append and concatenate, which you mentioned in the first post. I see three places where append is used where preallocation could be used instead. I didn't see any instances of concatenate in the package, only in example notebooks. This might help a bit.

  1. https://github.com/glotzerlab/pythia/blob/f401f451e302af2442fb66a620f13e621ae2bb5f/pythia/spherical_harmonics.py#L92

  2. https://github.com/glotzerlab/pythia/blob/f401f451e302af2442fb66a620f13e621ae2bb5f/pythia/bonds.py#L78

  3. https://github.com/glotzerlab/pythia/blob/f401f451e302af2442fb66a620f13e621ae2bb5f/pythia/bonds.py#L136
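For reference, the kind of change meant here, as a generic sketch rather than a drop-in patch for the lines above; per_particle_fn and width are placeholders:

```python
import numpy as np

def descriptor_append(positions, per_particle_fn):
    """Current pattern: grow a Python list in a loop, convert at the end."""
    rows = []
    for pos in positions:
        rows.append(per_particle_fn(pos))
    return np.array(rows)

def descriptor_preallocated(positions, per_particle_fn, width):
    """Same result: preallocate the output once and assign rows directly."""
    result = np.empty((len(positions), width), dtype=np.float32)
    for i, pos in enumerate(positions):
        result[i] = per_particle_fn(pos)
    return result
```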

bdice commented 5 years ago

> @bdice I was particularly thinking of the descriptors where a range of nearest-neighbor counts is used. These seemed particularly slow, and I suspect they are generating new neighbor lists for each count.

@j-proc The NearestNeighbors class in freud uses LinkCell internally and can be very slow in heterogeneous systems or when the LinkCell cell size (rmax_guess in pythia) is not chosen properly. @vyasr and I have a branch of freud (spatial_data_v2) where we're planning to redo much of how freud finds neighbors, using AABB trees by default (borrowed from HOOMD's fast periodic implementation). This should make many cases significantly faster, especially when the AABB tree is re-used (which is now much easier to do). We hope this will also substantially increase the speed of platoviz, where the same spatial data structures can then be re-used for multiple neighbor queries of different types.

In general, having a good benchmark suite for both freud and pythia would be extremely helpful.

j-proc commented 5 years ago

@bdice Apologies, I think I'm just inserting what I expect the common usage would be there. You're correct.

I'm honestly not sure what all can be done, so Simon makes a very good point about doing the profiling first to see if there's actually much to do here. In the use cases I'd run in the past, it seemed like there was. But considering it's open source, one could always rewrite the functions to be faster for specific use cases.

j-proc commented 5 years ago

@bdice @csadorf Agreed that benchmarking and possibly a bit of profiling would be a useful start to this. I'll make a similar issue in freud so the two can be handled separately if need be, although as we said, the bulk of the work is in freud and should likely be done first.

bdice commented 5 years ago

Another specific optimization that could be tested for pythia: replace the construction of explicit NearestNeighbors and LinkCell objects in the various _nlist_helper functions with freud.locality.make_default_nlist_nn and freud.locality.make_default_nlist. Currently the AABB tree neighbor-finding algorithm is undocumented, since we are revising its API substantially, but it is used internally by those functions. Those functions should always use the fastest general algorithm we have for constructing neighbor lists - there are cases where LinkCell beats the (undocumented) _AABBQuery, but that's infrequent.

https://github.com/glotzerlab/pythia/blob/f401f451e302af2442fb66a620f13e621ae2bb5f/pythia/bonds.py#L11-L17

https://github.com/glotzerlab/freud/blob/9f89a2211e18d192fdedd4121e9e65c62b92ce64/freud/locality.pyx#L368
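A rough sketch of what that replacement could look like inside a _nlist_helper-style function. The make_default_nlist_nn argument order and return value used here are assumptions based on the freud 1.x locality module linked above, so check locality.pyx before relying on them:

```python
import freud

def _nlist_helper(box, positions, neighbors, rmax_guess=2.0):
    """Sketch: return a freud NeighborList, reusing one if it was passed in.

    `neighbors` is either an integer number of nearest neighbors or an
    existing freud.locality.NeighborList to reuse.
    """
    if isinstance(neighbors, freud.locality.NeighborList):
        return neighbors
    # make_default_nlist_nn is expected to pick the fastest available
    # backend (e.g. the undocumented AABB tree) instead of hard-coding
    # NearestNeighbors or LinkCell; the (nlist, helper_object) return
    # shape is an assumption from the freud 1.x source.
    nlist, _ = freud.locality.make_default_nlist_nn(
        box, positions, positions, int(neighbors), rmax_guess=rmax_guess)
    return nlist
```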