glotzerlab / freud

Powerful, efficient particle trajectory analysis in scientific Python.
https://freud.readthedocs.io
BSD 3-Clause "New" or "Revised" License
273 stars 49 forks source link

NeighborList should be constructible from system-like object and 2D array of bond indices #748

Open bdice opened 3 years ago

bdice commented 3 years ago

Is your feature request related to a problem? Please describe. Occasionally I need to construct a NeighborList from bond topology data, which is currently somewhat hard to do. I've written the pattern below at least a dozen times, and it would be much better as a single function.

Describe the solution you'd like A new class method like NeighborList.from_system(system, indices, weights) would be helpful.

Here's a rough implementation that needs to be improved/tested. Note that this snippet will break with #738 (distances need to be replaced by vectors).

    @classmethod
    def from_system(cls, system, indices, weights=None):
        R"""Create a NeighborList from a system-like object and indices.

        Example::

            import freud
            import numpy as np
            box = freud.box.Box(2, 3, 4, 0, 0, 0)
            points = np.array([[0, 0, -1], [0.5, -1, 0]])
            indices = np.array([[0, 1], [1, 0]])
            nlist = freud.locality.NeighborList.from_system(
                system=(box, points),
                indices=indices,
            )

        Args:
            system:
                Any object that is a valid argument to
                :class:`freud.locality.NeighborQuery.from_system`.
            indices (:math:`\left(N_{bonds}, 2\right)` :class:`np.ndarray`):
                Array of integers corresponding to indices in the sets of
                query points and points.
            weights (:math:`\left(N_{bonds}, \right)` :class:`np.ndarray`, optional):
                Array of per-bond weights (if :code:`None` is given, use a
                value of 1 for each weight) (Default value = :code:`None`).
        """
        cdef NeighborQuery nq = NeighborQuery.from_system(system)
        query_point_indices = indices[:, 0]
        point_indices = indices[:, 1]
        distances = np.linalg.norm(
            system.box.wrap(system.points[point_indices] - system.points[query_point_indices])
            axis=-1,
        )
        return cls.from_arrays(
            num_query_points=len(system.points),
            num_points=len(system.points),
            query_point_indices=query_point_indices,
            point_indices=point_indices,
            distances=distances,
            weights=weights,
        )
vyasr commented 3 years ago

What types of system are you typically working with when this comes up? Is this in the context of OVITO? If you actually have a Box and numpy arrays of positions/indices, then (especially with Box.compute_distances, no need to do that yourself) this is as simple as:

distances = box.compute_distances(points[indices[:, 0]], points[indices[:, 1]])
NeighborList.from_arrays(...)

Admittedly the NeighborList.from_arrays API is a little bit clunky, and of course the from_system call becomes necessary if you're working with other inputs.

I'm not opposed to adding this function if it really is the best path forward, but I think the fact that something this straightforward is an annoyance suggests room for improvement in the existing APIs. For one, since as of #738 we're brainstorming API breaks for freud 3.0 we should consider moving num_query_points and num_points to the end of the arg list of from_arrays so that num_points can default to num_query_points like everywhere else in freud. Second, We we should consider extracting the first half of NeighborQuery.from_system into a separate function that just handles the conversion of various input types into a (box, points) tuple (with box always being a Box object rather than any valid input to Box.from_box). Then we'd have a simpler way to get this data without building a NeighborQuery, which is a roundabout way to import data from other sources (assuming that's why you're doing it here).

bdice commented 3 years ago

There are a few motivating cases for this. Sometimes I want to use another package to find neighbors, and I get back arrays of indices. Sometimes polymers/"bonds" in a GSD file also need this (e.g. to find the center of mass or radius of gyration). Being able to quickly combine (box, positions), indices into a NeighborList is purely a matter of brevity. This is simply a convenience API that would wrap the full from_arrays and replace the long snippet above with something more concise.

Regarding your idea of splitting the logic in NeighborQuery.from_system, I think a similar thing is already happening via the _RawPoints class, which is just a container for a box and points. I don't think we need to improve anything there? https://github.com/glotzerlab/freud/blob/ff9d561c2b8d719eea8e7167f083da3fbcc296a6/freud/locality.pyx#L365-L368

vyasr commented 3 years ago

The workflow you're suggesting makes sense, but I think it highlights the fact that there are any number of convenience APIs that we could implement that are fundamentally built around needing to coerce different input types into a (box, points) tuple. The conversion to a NeighborList is fine and could be added on top of that, but just the conversion from external objects to (box, points) would be useful to expose to users, especially if they need to do some more bespoke logic before inputting the data into freud (either for NeighborList construction or directly into a compute after some modification). If we wanted to expose _RawPoints publicly (probably renamed), then one option would be to implement __iter__ so that it could be transparently coerced via unpacking box, points = RawPoints(...).