desihub / desitarget

DESI Targeting
BSD 3-Clause "New" or "Revised" License
18 stars 23 forks source link

Add an option to only read a subset of columns from the HEALPixel-split MTLs #805

Closed geordie666 closed 11 months ago

geordie666 commented 1 year ago

This PR adds an option in desitarget.io.read_mtl_in_hp() to facilitate reading a subset of columns from an MTL rather than the entire ledger. This should address #802.

This PR also addresses some deprecation warnings, which should at least partially address #804.

coveralls commented 1 year ago

Coverage Status

coverage: 56.25% (-0.04%) from 56.291% when pulling 967b737652a1dbda719566cba4c957e83dbff1b4 on ADM-cols-mtl-hp into dd8ba1b0f1de5a4b34ce772fdb2b5b016d7b5b9b on main.

geordie666 commented 1 year ago

@jalasker: Before I merge this, could you check that you can indeed successfully use the new columns kwarg in desitarget.io.read_mtl_in_hp() as requested?

jalasker commented 12 months ago

I tested this function and there is some slightly unexpected (to me) behavior.

When you are requesting RA and Dec as columns, the function works properly. However, if you do not provide RA and Dec as columns, there is a step at the end of the function (https://github.com/desihub/desitarget/blob/ADM-cols-mtl-hp/py/desitarget/io.py#L3238) which "restrict the targets to the actual requested HEALPixels" and which requires RA and Dec to be present.

The function will raise a ValueError that there is no RA field in the event that RA/Dec are not listed as columns.

I'm assuming this line is so you can provide healpixels with a different size than the ones the MTLs are partitioned into natively, since if the nside = 32, it shouldn't be possible for any trimming to change anything.

A couple of changes which I think might work here in a rough order based on how intuitive they'd be for me to use combined with what I think would be easiest to implement.

  1. only evaluate the is_in_hp function if nside != filenside. This avoids the problem entirely if you are reading in using healpixels/pixlist simply as an alias for which MTL file you want to read in.
  2. Check if "columns" contains RA/Dec and if not (2a) add them to the list of columns or (2b) raise the exception before doing any file loading/processing. (EDIT: It looks like you're already doing 2a in read_targets_in_hp: https://github.com/desihub/desitarget/blob/ADM-cols-mtl-hp/py/desitarget/io.py#L3330)
  3. Combine 1 and 2 by only doing the check in 2 if nside != filenside
  4. Store RA and Dec as separate arrays/tables from anything that will be returned and evaluate is_in_hp regardless of the considerations above.
  5. Number 4, but only do that if RA/Dec aren't in columns
jalasker commented 12 months ago

One more semi-related comment: the 'columns' argument can now be propagated through "read_targets_in_hp" to "read_mtl_in_hp" when mtl == True.

geordie666 commented 12 months ago

Thanks for the careful and thorough review @jalasker. I chose your option (2a) and always add the RA, DEC columns even if they aren't requested in the columns keyword. I think this is defensible given that the coordinates are a crucial aspect of working with HEALPixels. The docstring of read_mtl_in_hp() also now documents this behavior.

Please run your check again and let me know if you want further updates.

jalasker commented 11 months ago

Works as expected and ready to merge.