astropy / astropy-healpix

BSD-licensed HEALPix for Astropy - maintained by @astrofrog and @lpsinger
https://astropy-healpix.readthedocs.io
BSD 3-Clause "New" or "Revised" License
50 stars 22 forks source link

High-level API; Remove HEALPix class? #98

Open cdeil opened 6 years ago

cdeil commented 6 years ago

Currently we offer the same functionality via three interfaces (see API).

  1. Functions that use Numpy arrays or Quantity for angles. E.g. lonlat_to_healpix(lon, lat, nside, return_offsets=False, order='ring')
  2. A HEALPix class with (nside, order, frame) data members. One-line methods re-expose functions.
  3. Functions in astropy_healpix.healpy. Drop-in replacements for healpy. To help transition code and with internal testing.

I would like us to discuss and make a decision next week at the workshop on the following questions:

  1. Remove the HEALPix class?
  2. Hide astropy.healpix.healpy from the docs?

Remove HEALPix class?

The HEALPix class (see high_level.py contains nside, order and frame properties, as well as ~ 10 one-line methods re-exposing the function API.

The benefit of the class is that it's very nice for users (see docs):

>>> from astropy_healpix import HEALPix
>>> hp = HEALPix(nside=16, order='nested')
>>> lon, lat = hp.healpix_to_lonlat([1, 442, 2200])
>>> hp.boundaries_lonlat([120], step=1)

A single-class API is nice to import and remember, and only having to pass some parameters once instead of to multiple function calls is convenient.

However, offering the class also has drawbacks:

I'm +1 to remove the class, and am volunteering to make a PR removing it and rewriting the docs to use the functions.


Hide astropy.healpix.healpy from the docs?

Another question that we should decide before making a PR for astropy.healpix for the Astropy core repo is whether astropy.healpix.healpy should be shown in the public API and docs. At the moment it appears like this: https://astropy-healpix.readthedocs.io/en/latest/healpy_compat.html

I think that's fine, we should keep it as a compatibility and transition layer and for testing, and also mentioning it in the docs like that is OK. But if others think it would be better to hide it more from the docs, or if that is the preference of the healpy maintainers, that seems OK to me as well.


@astrofrog @lpsinger @dstndstn all - please comment!

astrofrog commented 6 years ago

I don't have much time to comment on this today, but I'm 👎 on removing the class. The intent when writing it was that one day we could implement HEALPix.from_header so that users don't have to manually parse a header and pass the arguments each time to the functions, and it does provide benefits for 'traditional' HEALPix projections.

The usage of nuniq indexing is becoming more common, and I think we should support arrays for nside (see #66 #91). This makes it awkward to pass nside in init, it should really be moved to the method calls, leaving only order and frame in init.

This may be true but there are still many files out there that just use the standard HEALPix approach with scalar nside so I don't think it's worth removing the high-level class yet.

This also means there is no efficiency benefit of the class, i.e. to pre-compute something for a given nside. We could of course only support scalar nside in the class, and only allow array nside in the functions, like @lpsinger did in #71 .

What about simply making it so that it's optional to specify nside when initializing HEALPix and in that case it can be passed during method calls?

The zen of Python says "there should be one way to do it", and adding the class is just offering a second API to do the same thing, splitting the docs and user base.

Most of the docs actually use the high-level API at the moment. If we want to simplify the API, my personal preference would be to keep only the classes in the public API and hide the individual functions. Most of the docs actually mention the high-level class, it's just the API docs that show the functions so we aren't really splitting the docs.

In addition to making it possible to easily instantiate a HEALPix from a header, the other benefit to me to having a high-level class is that conceptually the class represents a specific tessellation - in a sense this is the same as having a WCS class rather than having to call each WCS method by passing all the WCS parameters each time.

My personal preference is to keep the class approach as the primary way to use the package, and allow nside to optionally be passed during method calls and not during initialization. We can keep the individual functions in the API docs but not mention them anywhere else. If we have to get rid of something I would suggest hiding the healpy functions from the docs. We originally only added that to make it easier to migrate, but I find our main API to be nice and clean and would rather people used that. So the bottom line is that I'd rather get rid of the healpy compatibility layer than our high-level class.

Of course this is just my personal opinion and happy to go with the majority decision.

astrofrog commented 6 years ago

Another benefit of the high-level class is integration with the SkyCoord class, which doesn't exist in the lower-level API.

In any case, having thought about this a bit more, I do agree that the fewer APIs we have to maintain, the better. One option therefore would be to also get rid of the low level API and keep only the high-level API. Just an idea!

cdeil commented 6 years ago

I think we should only offer one API, the functions or the class.

If we choose the class, then I think there's a few options to do it. One could keep mostly as-is, maybe more the nside or even frame from __init__ to the methods that use it. When using array-valued nside it's weird to pass it in __init__ and the other arguments in the methods.

Another option could be to take more arguments in __init__ but default them all to None, and the same in the methods. Then it's up to the user to only pass them once in __init__ or in each method call. This would be most flexible in terms of usage, but overall I'm not sure flexibility is a win here. I think most users will only call very few methods (usually one or two) and passing a few parameters two times isn't bad, and doesn't offset the complexity / cost of offering mutliple ways to do things. But in the end it's a matter of taste.

This API question would certainly benefit from more user feedback; we could even post it on astropy-dev if we don't get many opinions here.

astrofrog commented 6 years ago

Another option could be to take more arguments in init but default them all to None, and the same in the methods. Then it's up to the user to only pass them once in init or in each method call. This would be most flexible in terms of usage, but overall I'm not sure flexibility is a win here. I think most users will only call very few methods (usually one or two) and passing a few parameters two times isn't bad, and doesn't offset the complexity / cost of offering mutliple ways to do things. But in the end it's a matter of taste.

An intermediate solution is to allow that flexibility only for nside - that is, order/scheme and frame should still be specified in __init__, and we would allow array nside values to be passed during the method call.

I agree that this would likely benefit from user opinions.

lpsinger commented 5 years ago

An intermediate solution is to allow that flexibility only for nside - that is, order/scheme and frame should still be specified in __init__, and we would allow array nside values to be passed during the method call.

I like this option; I'd like to try it out. I am working on some tutorial material on the NUNIQ scheme for our LIGO/Virgo Public Alerts User Guide; this would be a good opportunity to try this out.

astrofrog commented 3 years ago

@lpsinger - did you have a chance to try this out? Or shall I just close this as there haven't really been any other complaints about the API and it's not really worth changing it?

lpsinger commented 3 years ago

No, I haven't tried it out yet. I just got very busy, that's all.

My hot take is:

@israelmcmc has recently done some work on an object-oriented interface for multi-resolution maps. He may have some thoughts to offer.

israelmcmc commented 3 years ago

As @lpsinger mentioned, I recently implemented an object-oriented wrapper for healpy with support for multi-resolution maps. The documentation https://healpixmap.readthedocs.io includes a quick-start tutorial where you can see the benefits of this approach.

As pointed out by @cdeil, having a class for a single-resolution map is convenient for the user, but rather than a drawback, it is even more advantageous in the case of multi-resolution maps. The idea is that the user can write code without thinking about the underlaying grid, whether the map is multi or single-resolution, or its ordering scheme and nside. The objects contain all the information needed and know what to do. For example, you can do operations between maps with different underlaying grids using an intuitive syntax, e.g. m1*m2.

A multi-resolution map object contains an array with the explicit NUNIQ number for each of the pixels. However, the maps behave as is if they were rasterized to single-resolution maps before an operation (they are not though). For this to work the object needs to have a flag property called density. This tells the different routines how to split or combine pixels when needed. If False, the map is considered "histogram-like", that is, the value of the parent pixel will be split equal ways among the 4 child pixels of higher order; if True, then all child pixels will have the same value as the parent pixel.

Another benefit of the object-oriented approach for multi-resolution maps is that you do have an efficiency benefit by pre-computing the rangesets for pixels and sorting them. These are the list of equivalent pixels in a single resolution map that a given UNIQ pixel would have, and are used to do the operations efficiently. I'm actually not caching these since I haven't decided if is better to save CPU or memory, but it can be done.

With this approach, I think the "core" procedural interface can be hidden like @astrofrog suggested. I did find it useful though to expose two separate classes: HealpixBase and HealpixMap (akin to HEALPix C++). The former has all the pixelization related functions --e.g. pix2ang(), get_interp_weights()--, while the latter implements all the methods related to the contents of the maps --.e.g [], *, + operators, get_interp_val().

If people decide to go with this approach for astropy-healpix I'll be happy to help. Currently the only dependency for the library I wrote is healpy, and I imagine it can be replaced with the methods you've implemented already.