DESI-UR / VAST

Void Analysis Software Toolkit
https://vast.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
9 stars 8 forks source link

Consider refactoring find_voids() to not include auxiliary functionality #47

Open QuiteAFoxtrot opened 3 years ago

QuiteAFoxtrot commented 3 years ago

find_voids() is the meat and potatoes of VoidFinder, but from a code/API standpoint it is little more than a wrapper around 3-4 other pieces of pipeline code which really can function independently:

find_voids:

Right now, certain important parameters like the minimum radius required to be a maximal sphere, which is used in combine_holes, are not available in the main call to find_voids() because they're default parameters in their respective sub-functions (like combine_holes). Theoretically we could fix this issue simply by making find_voids() become a sort of giant master interface to its sub-pieces, but this might not be the best design for sustainable maintenance on this project.

Additionally, there is really no reason for the VAST/VoidFinder package itself to include the disk I/O code within find_voids() (saving results to disk), especially as its currently just using an Astropy function. This choice has really just been for convenience and not necessarily an intentional design paradigm, and there are lots of convenient alternatives to the astropy I/O like numpy.savetxt() or using an HDF5 file, or even pickle if a potential user might desire.

This issue is probably my own fault for not recognizing these pieces in an earlier iteration, but as we move to get published in JOSS or have folks external to UofR/Drexel using this software its probably best that we try and move towards common practice and away from being coupled with old decisions from when this package was coupled with its working scripts.

The proposed fix to this would be to move most of the code in find_voids() into the working scripts (like SDSS_DR7_VoidFinder.py) so that the find_voids() API doesn't continue to grow and the sub-pieces like combine_holes() which contain important parameters are exposed as top-level functions to potential users.

Theoretically if we continue down this line of thought and revamp the whole package we may be able to drop almost every dependency except Numpy, which would be quite nice though that is probably a separate issue beyond the scope of this one. Looks like the setup.py requires only Cython and Numpy anyway, which is really nice

kadglass commented 3 years ago

Commenting on the file I/O in find_voids: yes, I think that it would be good to pull this out (so that it would be in the main example script). In this case, find_voids should return the list of maximal spheres and the list of holes.

QuiteAFoxtrot commented 3 years ago

Alternately, kadglass and I discussed including all the auxiliary functionality (except I/O) into a new interface to represent VoidFinder algorithmically as a discrete class or function - while vast.voidfinder contains all the pieces of VoidFinder, it may be worth having a single entry point that represents the collection of scientifically necessary pieces to use the algorithm as described in the literature (this is still possible today just isn't neatly wrapped with a bow on top). This would essentially be the giant master interface I described above, though if it has a well understood or scientifically relevant purpose I would support maintaining that despite the somewhat clunky code design.

If we go this route we need to ensure in the docs, that both the intended master interface is documented well and the important sub functions (hole_finder, combine_holes, etc) are described as top-level utilities with clear APIs.