Problem reading ALICE data (incorrect data offset?)

MillionConcepts / pdr

[P]lanetary [D]ata [R]eader - A single function to read all Planetary Data System (PDS) data into Python


Problem reading ALICE data (incorrect data offset?) #8

Closed msbentley closed 3 years ago

msbentley commented 3 years ago

When trying to open the primary IMAGE array in this product:

https://pds-smallbodies.astro.umd.edu/holdings/ro-c_cal-alice-4-ext3-v1.0/data/2016/09/ra_160930102016_hisb_lin.lbl
https://pds-smallbodies.astro.umd.edu/holdings/ro-c_cal-alice-4-ext3-v1.0/data/2016/09/ra_160930102016_hisb_lin.fit

I get all NaNs, but inspecting in e.g. fv everything looks fine. I'm not sure where the issue is - my guess was perhaps in the offset into the FITS file?

As far as I can see the data should be read from: ^IMAGE = ("RA_160930102016_HISB_LIN.FIT",18)

and the record length is: RECORD_BYTES = 2880 /* FITS standard record length */

so the start byte should be 17*2880 = 48960. But perhaps the issue is elsewhere!
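
The arithmetic above can be written out as a small sketch. It assumes the pointer counts records starting from 1, which is how the label quoted above is being read. The function name `pointer_to_byte_offset` is made up for illustration and is not part of pdr.

```python
def pointer_to_byte_offset(record_number: int, record_bytes: int) -> int:
    """Convert a 1-based PDS3 record pointer to a 0-based byte offset."""
    if record_number < 1:
        raise ValueError("PDS3 record pointers are 1-based")
    # Record 1 starts at byte 0, so skip (record_number - 1) full records.
    return (record_number - 1) * record_bytes


# Values taken from the label lines quoted above:
# ^IMAGE = ("RA_160930102016_HISB_LIN.FIT",18) and RECORD_BYTES = 2880
offset = pointer_to_byte_offset(18, 2880)
print(offset)  # 48960
```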

cmillion commented 3 years ago

I think it's reading correctly for me. It's just NaN around the edges. If I do np.nanmean(data.IMAGE) (which computes the mean of the array, excluding NaN values), I get 8.156786, and the image appears to have structure.

cmillion commented 3 years ago

It's not reading some of the sub-objects out completely (like COUNT_RATE_SERIES). It might also be assigning the same array as IMAGE to ERROR_IMAGE and CALIBRATION_IMAGE, because they all look identical; that seems wrong.

msbentley commented 3 years ago

Ahh yes, sorry for the hassle - I did a min/max, but forgot to check how NaNs were handled in numpy, so:

In [8]: np.nanmin(test.IMAGE)
Out[8]: -180.0158

In [9]: np.nanmax(test.IMAGE)
Out[9]: 1652.9675

indeed are fine.
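
For anyone hitting the same confusion, here is a minimal sketch of the difference, using a small synthetic array rather than the ALICE product. The array contents are invented for illustration. Only the behaviour of `np.min`/`np.max` versus `np.nanmin`/`np.nanmax` is the point.

```python
import numpy as np

# Synthetic stand-in for an image with NaN padding around the edges.
image = np.full((4, 4), np.nan, dtype=np.float32)
image[1:3, 1:3] = [[-1.5, 2.0], [3.0, 10.5]]

plain_min = np.min(image)   # NaN: a single NaN in the input propagates
plain_max = np.max(image)   # NaN for the same reason
nan_min = np.nanmin(image)  # ignores NaN values
nan_max = np.nanmax(image)

# Fraction of pixels that hold real data, a quick check for "all NaNs".
valid_fraction = np.isfinite(image).mean()

print(plain_min, plain_max, nan_min, nan_max, valid_fraction)
```

Checking `np.isfinite(image).any()` first is a quick way to tell "all NaN" apart from "NaN at the edges".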

But yes, from the FITS file, indeed all 3 "images" are different, but read the same in PDR.
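
One way to confirm that several arrays really are identical is an element-wise comparison that treats NaN as equal to NaN. The sketch below uses synthetic arrays as stand-ins for IMAGE, ERROR_IMAGE and CALIBRATION_IMAGE. The helper name `same_pixels` is hypothetical, and the `equal_nan` argument assumes NumPy 1.19 or later.

```python
import numpy as np


def same_pixels(a: np.ndarray, b: np.ndarray) -> bool:
    """True if both arrays have the same shape and values, with NaN == NaN."""
    return bool(np.array_equal(a, b, equal_nan=True))


# Synthetic stand-ins for the three arrays (values are invented).
image = np.array([[np.nan, 1.0], [2.0, 3.0]])
error_image = image.copy()  # simulates the bug: same data under another name
calibration_image = np.array([[np.nan, 0.5], [0.5, 0.5]])

print(same_pixels(image, error_image))        # True
print(same_pixels(image, calibration_image))  # False
```

A plain `(a == b).all()` would report False even for identical arrays here, because NaN never compares equal to itself.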

cmillion commented 3 years ago

Not a problem. I'll try to solve the issue where all of the images are the same.

cmillion commented 3 years ago

This will be weirdly difficult to fix. The names of the data objects in the label are not the same as the names of the data objects in the FITS file, so I can't straightforwardly map between them.

cmillion commented 3 years ago

I have an idea. Will need to refactor FITS handling a bit.

msbentley commented 3 years ago

Thanks @cmillion - to clarify for myself, does PDR try to read "identified" data formats like FITS using their own standard, rather than assuming plain PDS?

cmillion commented 3 years ago

Yes, sort of. If it's a FITS file, I try to use astropy's FITS module (astropy.io.fits) to read it. However, one of my design principles is that the PDS label is primary. This is explicitly true in PDS4, where FITS file headers don't necessarily even need to agree with the PDS4 .xml label. So I want to maintain the semantic connection between the objects as defined in the label and the data objects as returned by pdr. For the ALICE data, it looks like there's a 1-to-1 mapping, but they are named differently. I could of course just write a special case that handles ALICE, but this problem might well exist elsewhere in the archives, so I want to try to come up with something a little more general, although I don't think that I can avoid special-case handling forever.

cmillion commented 3 years ago

I have used Levenshtein distance as a way to match the PDS3 object names with the FITS object names. This approach might well return the wrong match in some circumstances, in which case I will probably have to implement a special case exception. However it seems to work perfectly for the ALICE data. Please test.
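
To illustrate the general idea (this is not pdr's actual code), here is a minimal pure-Python sketch of matching names by edit distance. The label names come from this thread. The FITS extension names in `fits_names` are invented for the example, since the real ones are not quoted here. `levenshtein` and `closest_name` are hypothetical helpers and do not use the python-Levenshtein package.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_name(label_name: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest edit distance to label_name."""
    return min(
        candidates,
        key=lambda c: levenshtein(label_name.upper(), c.upper()),
    )


# Hypothetical FITS extension names, for illustration only.
fits_names = ["PRIMARY", "ERROR", "CALIBRATION"]
for label_name in ["ERROR_IMAGE", "CALIBRATION_IMAGE"]:
    print(label_name, "->", closest_name(label_name, fits_names))
```

Nearest-name matching always returns something, even when no candidate is a sensible match, which is why a wrong match is possible and a special-case fallback may still be needed.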

cmillion commented 3 years ago

Note that there is a new requirement: pip install python-Levenshtein

msbentley commented 3 years ago

Great, looks good - thanks!