MillionConcepts / pdr

[P]lanetary [D]ata [R]eader - A single function to read all Planetary Data System (PDS) data into Python

Reading an image with attached label instead reads detached label #56

Closed msbentley closed 3 months ago

msbentley commented 3 months ago

Mars Express HRSC images (unfortunately) come in two versions in the data directory. Both versions share the same name, so a single folder contains:

HJ592_0000_S23.IMG
HJ592_0000_S23.JP2
HJ592_0000_S23.LBL

see for example: http://archives.esac.esa.int/psa/ftp/MARS-EXPRESS/HRSC/MEX-M-HRSC-5-REFDR-MAPPROJECTED-V3.0/DATA/J592/

Currently, if you point pdr at the .IMG file, it seems to pick up the .LBL with the same filename before checking for an attached label, so explicitly trying to open e.g. HJ592_0000_S23.IMG actually opens HJ592_0000_S23.LBL.

I understand the situation here is a bit messy ;-) but it would be great if a future version could check first for the attached label, or similar?

m-stclair commented 3 months ago

We don't prefer attached labels by default because there are many data files that have attached PVL headers that aren't their "real" labels, but rather ancillary metadata or legacy labels etc., and it is more common for a detached label to be the right pick. The diversity of file naming conventions means that pdr's automatic label file/data file association is not guaranteed to work 100% of the time.
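To make the "detached label first" behavior concrete, here is a rough sketch of a same-stem lookup. It is not pdr's actual implementation: the function name `guess_label` and the extension list `LABEL_EXTENSIONS` are hypothetical, chosen only to illustrate why a sibling .LBL wins over an attached label.

```python
from pathlib import Path

# Hypothetical list of detached-label extensions; pdr's real rules may differ.
LABEL_EXTENSIONS = (".lbl", ".xml")


def guess_label(data_path):
    """Return a same-stem detached label if one exists, else the data file."""
    data_path = Path(data_path)
    for candidate in sorted(data_path.parent.iterdir()):
        if candidate == data_path:
            continue
        same_stem = candidate.stem == data_path.stem
        if same_stem and candidate.suffix.lower() in LABEL_EXTENSIONS:
            # A detached label with a matching stem takes priority.
            return candidate
    # No detached label found: fall back to the data file's attached label.
    return data_path
```

With the three HRSC files from the report in one folder, a lookup like this returns HJ592_0000_S23.LBL for HJ592_0000_S23.IMG, which matches the behavior described in the report.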

Fortunately, there are a couple of easy mechanisms for cases like this:

  1. In the specific case here -- you know that a data file has an attached label and you don't want to check for a detached label -- you can open it with pdr.fastread(data_fn).
  2. More generally, you can specify a product's label file by passing the label_fn argument to pdr.read() -- in this case pdr.read(data_fn, label_fn=data_fn). This is also useful for cases in which a data file's filename stem does not match the filename stem of its detached label, or the label file is elsewhere on the filesystem, or there is a detached PDS4 label but you want to specify use of the detached PDS3 label, etc.
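As a minimal sketch of option 2, the wrapper below passes an explicit label to a reader. The wrapper name `read_with_label` is hypothetical, and the reader is passed in as an argument so the snippet does not itself depend on pdr being installed; only the `label_fn` keyword comes from the description above.

```python
def read_with_label(read, data_fn, label_fn=None, **kwargs):
    """Call read() with an explicit label file.

    If no label is given, point the reader at the data file itself,
    i.e. use its attached label.
    """
    if label_fn is None:
        label_fn = data_fn
    return read(data_fn, label_fn=label_fn, **kwargs)
```

With pdr installed, `read_with_label(pdr.read, "HJ592_0000_S23.IMG")` would amount to `pdr.read(data_fn, label_fn=data_fn)` for the file in this issue.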

Note that pdr.fastread(data_fn, **kwargs) is just an alias for pdr.read(data_fn, label_fn=data_fn, skip_existence_check=True, **kwargs). In addition to your use case, it is helpful if a file has attached and detached labels that are both valid, but in different formats, and you want PDR to interpret it using its attached label. This includes FITS files; it causes PDR to interpret them using only their FITS headers. Finally, it improves performance when you are running a program that reads a large number of products, particularly on a slow filesystem (hence the name).
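For the bulk-reading case, a loop along these lines could work. This is only a sketch: `read_attached_only` and `read_many` are hypothetical helper names, the first simply spells out the expansion quoted above, and the reader is injected so the code can be tried without pdr.

```python
def read_attached_only(read, data_fn, **kwargs):
    """Spell out the expansion quoted above for pdr.fastread."""
    return read(data_fn, label_fn=data_fn, skip_existence_check=True, **kwargs)


def read_many(read, paths, **kwargs):
    """Read each path using only its attached label; return a dict by path."""
    return {
        str(path): read_attached_only(read, path, **kwargs) for path in paths
    }
```

With pdr installed, something like `read_many(pdr.read, list_of_img_paths)` should behave like calling `pdr.fastread` on each file in turn.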

msbentley commented 3 months ago

Great, many thanks @m-stclair, we'll go ahead using those hints!