Closed by bdice 6 years ago.
Original comment by Joshua Anderson (Bitbucket: joaander, GitHub: joaander).
If this is just a thin wrapper around an existing implementation, then I agree that it should be maintained outside freud. Freud is a loosely connected set of useful analysis code, yes, but its parts share a common goal: fast C++ implementations of operations on particles in periodic boxes.
Original comment by Mayank Agrawal (Bitbucket: amayank, GitHub: amayank).
I initially thought that freud was a collection of useful analysis code whose components are not necessarily interdependent. But I see the point. I agree that this looks more like a utility function and doesn't really use anything in freud. A design document will surely be helpful.
A utils directory in freud, similar to what plato has, would also be a good idea; it could hold all the orphan routines.
Original comment by Eric Harper (Bitbucket: harperic, GitHub: harperic).
@csadorf I don't see OPTICS in any branch (again, I could be wrong, but nothing came up when searching via git or in the source), and AFAIK Mayank's DBSCAN branch is the only thing with DBSCAN (cluster just uses a distance cutoff, which isn't DBSCAN).
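To make that difference concrete, here is a minimal pure-Python sketch (not freud's code or Mayank's branch; all function names are hypothetical). It assumes cutoff clustering means taking connected components of the graph that links every pair of particles closer than `r_cut`, so a sparse chain of particles can merge two dense groups. DBSCAN only grows clusters through core points, i.e. points with at least `min_samples` neighbors within `eps` (the point itself included, as in sklearn), and labels points reachable from no core point as noise (-1). The 1D toy coordinates are made up for illustration.

```python
from collections import deque


def neighbors(points, i, r):
    """Indices of all points within distance r of points[i], including i itself."""
    return [j for j, p in enumerate(points) if abs(p - points[i]) <= r]


def cutoff_clusters(points, r_cut):
    """Connected components of the graph linking every pair closer than r_cut."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbors(points, i, r_cut):
                if labels[j] == -1:
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels


def dbscan(points, eps, min_samples):
    """Textbook DBSCAN; label -1 marks noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        seeds = neighbors(points, start, eps)
        if len(seeds) < min_samples:
            labels[start] = -1  # noise for now; may later become a border point
            continue
        labels[start] = current
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = current  # noise reachable from a core point -> border point
            if labels[j] is not None:
                continue  # already claimed; former noise points are non-core, never expanded
            labels[j] = current
            j_neighbors = neighbors(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep growing
        current += 1
    return labels


positions = [0, 1, 2, 3, 6, 9, 12, 15, 16, 17, 18]
print(cutoff_clusters(positions, 3))
print(dbscan(positions, 3, 4))
```

With these coordinates, cutoff clustering returns a single cluster, while DBSCAN splits the two dense groups, treats the chain points at 6 and 12 as border points of their neighboring clusters, and marks the point at 9 as noise. The linear neighbor scan makes both functions O(N^2); a real implementation would use a spatial structure such as a cell list.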
@amayank For now I'm just suggesting we create a repo (maybe even a meta-repo where people can store their nice utils and helper functions) for this kind of thing, since it's very useful but doesn't really fit in the design of Freud. I'll make sure to update the design document so that it reflects the nature of the code that should go into Freud.
Original comment by Carl Simon Adorf (Bitbucket: csadorf, GitHub: csadorf).
I thought we had both DBSCAN and OPTICS in freud (maybe not in master, but in a branch); that's where my confusion came from.
A minimal example of a workflow in which such an integration brings a substantial benefit would probably be important for this discussion.
Original comment by Eric Harper (Bitbucket: harperic, GitHub: harperic).
On the other hand, given that there's no clear-cut need to add it and that it's one more thing to support, using sklearn might be the best way to go about this. I'm still reluctant to add this kind of feature to Freud directly, especially since:
Overall, I would say this should be spun off into its own repo so that others can use it; it doesn't really feel like part of freud, since it's a sklearn wrapper and doesn't really use anything in freud.
@csadorf @klarh @amayank @joaander
Original comment by Eric Harper (Bitbucket: harperic, GitHub: harperic).
Original comment by Carl Simon Adorf (Bitbucket: csadorf, GitHub: csadorf).
I never really understood why we have our own DBSCAN implementation when sklearn has that and so many other clustering routines perfectly well implemented. It's much easier to use the sklearn implementation since it has a uniform clustering API and different methods are more easily exchangeable.
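As an illustration of that uniform API, here is a minimal sketch assuming scikit-learn is installed. Two different clustering methods are driven through the same `fit_predict` call, so swapping one for the other is a one-line change. The toy 1D coordinates and parameter values are made up; `AgglomerativeClustering` with single linkage and a `distance_threshold` stands in here for plain distance-cutoff clustering, since single linkage never merges clusters whose closest pair is at or beyond the threshold.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering

# Toy 1D particle coordinates: two dense groups joined by a sparse chain.
positions = [0, 1, 2, 3, 6, 9, 12, 15, 16, 17, 18]
X = [[x] for x in positions]  # sklearn expects shape (n_samples, n_features)

models = {
    # DBSCAN: core points need >= 4 samples (self included) within eps.
    "dbscan": DBSCAN(eps=3.5, min_samples=4),
    # Single linkage with a distance threshold merges any clusters closer
    # than 3.5, i.e. it behaves like plain distance-cutoff clustering.
    "cutoff": AgglomerativeClustering(
        n_clusters=None, distance_threshold=3.5, linkage="single"
    ),
}

# Same call for every estimator; DBSCAN uses -1 to mark noise.
labels = {name: model.fit_predict(X).tolist() for name, model in models.items()}
for name, lab in labels.items():
    print(name, lab)
```

Trying another method from `sklearn.cluster` only means adding an entry to `models`; the calling code stays the same.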
Original report by Eric Harper (Bitbucket: harperic, GitHub: harperic).
@amayank has a PR which is a wrapper to sklearn's DBSCAN. This is a very useful feature to have around, but as written I don't think it belongs in freud.
I would like to add DBSCAN to freud at the C++ level, as it's a very useful clustering method (and we already have a clustering section). It would be nice to have both the current clustering and DBSCAN available as ways to generate lists of particles to run analysis on, so I think it's a worthwhile goal. It'd also be nice to have PBCs baked in instead of having to do buffer calls as in sklearn, which should be pretty easy given the wrap commands we already have.
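A minimal sketch of what baking in PBCs could mean: DBSCAN only needs a neighbor query that measures distances with the minimum-image convention, so particles near opposite faces of the box are recognized as neighbors without creating buffer (ghost) copies. This assumes an orthorhombic box and `eps` smaller than half the shortest box length; the function names are hypothetical, and in freud the existing box wrap functionality would presumably play the role of `wrap`.

```python
import math


def wrap(dx, length):
    """Map a displacement component into [-length/2, length/2) (minimum image)."""
    return dx - length * math.floor(dx / length + 0.5)


def periodic_distance(a, b, box):
    """Minimum-image Euclidean distance in an orthorhombic box (Lx, Ly, Lz)."""
    return math.sqrt(sum(wrap(ai - bi, L) ** 2 for ai, bi, L in zip(a, b, box)))


def region_query(points, i, eps, box):
    """Indices of all points within eps of points[i] (i included), across boundaries."""
    return [j for j, p in enumerate(points) if periodic_distance(points[i], p, box) <= eps]


box = (10.0, 10.0, 10.0)
points = [(0.2, 5.0, 5.0), (9.8, 5.0, 5.0), (5.0, 5.0, 5.0)]
print(periodic_distance(points[0], points[1], box))  # approximately 0.4 (naive distance: 9.6)
print(region_query(points, 0, 0.5, box))  # [0, 1]
```

Using `region_query` in place of a plain Euclidean query inside a DBSCAN loop lets clusters span the periodic boundary. Like any linear scan per query, this is O(N^2) overall; a production version would use a cell list or neighbor list instead.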