msbentley closed this issue 3 years ago.
I think it's reading correctly for me. It's just NaN around the edges. If I do np.nanmean(data.IMAGE) (which computes the mean of the array, excluding NaN values), I get 8.156786. And the image appears to have structure.
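The difference between plain and NaN-aware reductions is easy to check on a toy array. This is a minimal sketch: the array is made up and only stands in for data.IMAGE with its NaN border.

```python
import numpy as np

# Made-up stand-in for data.IMAGE: a 2x2 block of valid pixels inside a NaN border.
image = np.full((4, 4), np.nan, dtype=np.float32)
image[1:3, 1:3] = [[1.0, 2.0], [3.0, 4.0]]

# Plain reductions propagate NaN: a single NaN pixel makes the result NaN.
print(np.min(image), np.max(image), np.mean(image))

# The nan* variants skip NaN pixels entirely.
print(np.nanmin(image), np.nanmax(image), np.nanmean(image))

# Fraction of pixels that are NaN (12 of the 16 here).
print(np.isnan(image).mean())
```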
It's not fully reading some of the sub-objects (like COUNT_RATE_SERIES). It might also be assigning the same array as IMAGE to ERROR_IMAGE and CALIBRATION_IMAGE: they all look identical, which seems wrong.
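One way to test the suspicion that IMAGE, ERROR_IMAGE and CALIBRATION_IMAGE are the same array is to compare them pairwise. This is a minimal sketch: describe_duplicates is a hypothetical helper, and the dict of arrays is a made-up stand-in for the objects PDR returns (e.g. data.IMAGE).

```python
import numpy as np


def describe_duplicates(arrays):
    """Return (name_a, name_b, same_memory, same_values) for each suspicious pair.

    `arrays` maps an object name (e.g. "IMAGE") to a NumPy array. NaN pixels
    compare equal, so a shared NaN border does not make identical arrays look different.
    """
    names = list(arrays)
    report = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Literally the same buffer handed back under two names?
            same_memory = bool(np.shares_memory(arrays[a], arrays[b]))
            # Or separate copies holding identical values?
            same_values = bool(np.array_equal(arrays[a], arrays[b], equal_nan=True))
            if same_memory or same_values:
                report.append((a, b, same_memory, same_values))
    return report


# Hypothetical example: ERROR_IMAGE is accidentally the very same array as IMAGE.
image = np.arange(6, dtype=np.float32).reshape(2, 3)
arrays = {"IMAGE": image, "ERROR_IMAGE": image, "CALIBRATION_IMAGE": image * 0.1}
print(describe_duplicates(arrays))
```

np.shares_memory flags the case where a reader reuses one buffer, while np.array_equal(..., equal_nan=True) (available since NumPy 1.19) flags identical copies.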
Ah yes, sorry for the hassle. I did a min/max but forgot to check how NumPy handles NaNs, so:
In [8]: np.nanmin(test.IMAGE)
Out[8]: -180.0158
In [9]: np.nanmax(test.IMAGE)
Out[9]: 1652.9675
So the values are indeed fine.
But yes, in the FITS file all three "images" are indeed different, yet PDR reads them as identical.
Not a problem. I'll try to solve the issue where all of the images are the same.
This will be weirdly difficult to fix. The names of the data objects in the label are not the same as the names of the data objects in the FITS file, so I can't straightforwardly map between them.
I have an idea. Will need to refactor FITS handling a bit.
Thanks @cmillion. To clarify for myself: does PDR try to read "identified" data formats like FITS using their own standard, rather than assuming plain PDS?
Yes, sort of. If it's a FITS file, I try to use astropy's FITS module (astropy.io.fits) to read it. However, one of my design principles is that the PDS label is primary. This is explicitly true in PDS4, where FITS file headers don't necessarily even need to agree with the PDS4 .xml label. So I want to maintain the semantic connection between the objects as defined in the label and the data objects as returned by pdr. For the ALICE data, it looks like there's a 1-to-1 mapping, but the objects are named differently. I could of course just write a special case that handles ALICE, but this problem might well exist elsewhere in the archives, so I want to come up with something a little more general, although I don't think I can avoid special-case handling forever.
I have used Levenshtein distance as a way to match the PDS3 object names with the FITS object names. This approach might well return the wrong match in some circumstances, in which case I will probably have to implement a special-case exception. However, it seems to work perfectly for the ALICE data. Please test.
Note that there is a new requirement: pip install python-Levenshtein
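For readers unfamiliar with the technique, here is a minimal pure-Python sketch of edit-distance name matching. It is not PDR's actual code: levenshtein and match_objects are hypothetical helpers, and the FITS HDU names are made up for illustration (astropy does name the primary HDU PRIMARY, but the real ALICE extension names may differ). The python-Levenshtein package provides a faster C implementation as Levenshtein.distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        previous = current
    return previous[-1]


def match_objects(label_names, fits_names):
    """Map each label object name to the FITS HDU name with the smallest edit distance.

    Comparison is case-insensitive; ties go to the first FITS name listed. This is a
    greedy per-object choice, so two label objects could map to the same HDU.
    """
    return {
        name: min(fits_names, key=lambda hdu: levenshtein(name.upper(), hdu.upper()))
        for name in label_names
    }


# Made-up names: the real ALICE FITS extension names may differ.
label_names = ["IMAGE", "ERROR_IMAGE", "CALIBRATION_IMAGE"]
fits_names = ["PRIMARY", "ERROR", "CALIBRATION"]
print(match_objects(label_names, fits_names))
```

In this made-up example IMAGE beats ERROR for PRIMARY by a single edit (distance 4 versus 5), which shows how thin the margin can be and why a wrong match elsewhere in the archives is plausible.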
Great, looks good - thanks!
When trying to open the primary IMAGE array in this product:
https://pds-smallbodies.astro.umd.edu/holdings/ro-c_cal-alice-4-ext3-v1.0/data/2016/09/ra_160930102016_hisb_lin.lbl
https://pds-smallbodies.astro.umd.edu/holdings/ro-c_cal-alice-4-ext3-v1.0/data/2016/09/ra_160930102016_hisb_lin.fit
I get all NaNs, but inspecting it in e.g. fv, everything looks fine. I'm not sure where the issue is; my guess was perhaps the offset into the FITS file?
As far as I can see, the data should be read from:
^IMAGE = ("RA_160930102016_HISB_LIN.FIT",18)
and the record length is:
RECORD_BYTES = 2880 /* FITS standard record length */
so, since PDS3 record numbers start at 1, the start byte should be (18 - 1) * 2880 = 17 * 2880 = 48960. But perhaps the issue is elsewhere!
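The offset arithmetic can be checked mechanically: convert the 1-based record pointer into a 0-based byte offset, then look at the 80-character card there. A FITS header begins with SIMPLE (primary HDU) or XTENSION (extension), so this shows whether record 18 is the start of an HDU header or falls inside a header or data unit. This is a minimal sketch: the helper names are hypothetical, the regex only handles the ("FILE",record) pointer form, and the file contents below are synthetic.

```python
import re
from typing import Tuple

FITS_CARD = 80  # each FITS header card is 80 ASCII characters


def pointer_byte_offset(pointer: str, record_bytes: int) -> Tuple[str, int]:
    """Turn a PDS3 pointer value like '("FILE.FIT",18)' into (filename, byte offset).

    PDS3 record numbers are 1-based, so record N starts at 0-based byte
    (N - 1) * record_bytes. Only the (file, record) form is handled here.
    """
    match = re.fullmatch(r'\(\s*"([^"]+)"\s*,\s*(\d+)\s*\)', pointer.strip())
    if match is None:
        raise ValueError(f"unsupported pointer form: {pointer!r}")
    filename, record = match.group(1), int(match.group(2))
    return filename, (record - 1) * record_bytes


def card_at(raw: bytes, offset: int) -> str:
    """Return the 80-character header card starting at `offset`, decoded as ASCII."""
    return raw[offset:offset + FITS_CARD].decode("ascii", errors="replace")


def looks_like_hdu_start(raw: bytes, offset: int) -> bool:
    """True if `offset` is the start of a FITS header (primary or extension).

    Any other first card means the offset is inside a header continuation
    block or a data unit.
    """
    card = card_at(raw, offset)
    return card.startswith("SIMPLE  =") or card.startswith("XTENSION=")


# Usage with a synthetic 20-block buffer standing in for the .FIT file;
# for a real product, use open(path, "rb").read() instead.
name, offset = pointer_byte_offset('("RA_160930102016_HISB_LIN.FIT",18)', 2880)
raw = bytearray(b" " * (2880 * 20))
raw[offset:offset + FITS_CARD] = "XTENSION= 'IMAGE   '".ljust(FITS_CARD).encode("ascii")
print(name, offset, looks_like_hdu_start(bytes(raw), offset))
```

In FITS, an HDU's data unit starts only after its header's END card, padded to the next 2880-byte boundary, so a pointer to a header start is not the same as a pointer to the pixel data.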