Closed by sschmaus 9 months ago
Thank you for the bug report. It's funny to us, because we work with this dataset constantly but apparently never paid attention to the values with units.
Fixed in 67088ce1801d5d16290350973c179336bd2a0886.
This fix will be included in the next `pdr` release, which will likely be within the next few weeks. If you need access to this fix urgently, please install from source and use the `develop` branch.
Aren't you using PVL for reading PDS3 labels? If not, why not? I'm afraid we might run into problems there in the future, because many people do use it, and pdr could now return one value and pvl another. Note that I don't know whether pvl reads this correctly; I'm just worried that there are now apparently two different readers for it, and I do know that NASA pays attention to pvl, because officials asked the plpy team about its maintenance status.
No. Although we offer hooks for parsing labels with `pvl`, `pvl` is not part of `pdr`'s metadata-reading workflow.
This is basically because `pvl` has fundamentally different goals. It prioritizes reading PVL labels completely and comprehensively, in order to provide formally correct Python representations of every type of PVL object and feature. `pdr`, by contrast, primarily wants to ingest the metadata it needs to correctly interpret observational data, and to do so in a rapid, fault-tolerant way.
We originally did use `pvl`. However, we want `pdr` to be usable in performance-sensitive pipelines that need to deal with large numbers of products.
We removed `pvl` from `pdr`'s workflows because its completeness presented serious performance concerns, to the point that we would otherwise have become unable to use `pdr` in our own performance-sensitive applications as they scaled (our signal use case was, ironically, ZCAM data processing).
`pdr`'s current PDS3 label parser is roughly 2 orders of magnitude faster than `pvl` for most products, with an even larger relative difference on longer labels. We think that most planetary processing pipelines care more about things like quickly figuring out how to read array data than things like automatically interpreting all the datetimes in a label. In metadata-only applications (say, summary statistics on a large archive) this can be the difference between minutes and days for gathering the specific metadata values you need.
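To illustrate the kind of targeted lookup this trade-off favors, here is a minimal sketch in plain Python. It is not `pdr`'s parser, which is not shown in this thread. The function name `find_label_values` and the `GROUP` wrapper in the sample label are assumptions made for illustration, and only the `RADIANCE_SCALING_FACTOR` line is taken from the report in this issue. The sketch scans label text for one keyword and returns the raw right-hand sides, without building a full object tree or interpreting any other field.

```python
import re


def find_label_values(label_text, key):
    """Return the raw text assigned to `key` on every line of a PVL-style label.

    Hypothetical helper: it ignores nesting and multi-line values, which a
    real label parser has to handle.
    """
    # Anchor on the start of a line so that "SCALING_FACTOR" does not also
    # match "RADIANCE_SCALING_FACTOR".
    pattern = re.compile(
        rf"^[ \t]*{re.escape(key)}[ \t]*=(.*)$", re.MULTILINE
    )
    return [raw.strip() for raw in pattern.findall(label_text)]


# Illustrative label fragment; the GROUP lines are an assumed structure.
SAMPLE_LABEL = """\
GROUP = DERIVED_IMAGE_PARMS
  RADIANCE_SCALING_FACTOR = 1.339256528e-06 <W/M**2/NM/SR>
END_GROUP = DERIVED_IMAGE_PARMS
"""

values = find_label_values(SAMPLE_LABEL, "RADIANCE_SCALING_FACTOR")
```

A lookup like this does far less work per label than a complete parse, which is the general reason a narrower reader can be much faster. The cost is that anything the pattern does not anticipate is silently missed.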
This is not to say that there are no applications that might care about automatically interpreting all the datetimes in a label, and `pvl` is very good at that kind of thing, which is why we retain a hook for it.
Another way to put this might be that we do not really think of the `pdr.Metadata` object as the full and complete representation of the label, but rather a distillation of and interface to its semantic content. The LABEL object is the representation of the label. You should not have to load the LABEL as a complete object to read the ARRAY any more than you should have to load the ARRAY to read the LABEL.
Thanks for the quick fix!
I'm initializing Mars2020 data from the embedded PDS3 label with pdr v1.0.4.
Here is the PDS3 label:
When trying to access values with units and exponents, like `pdr.metadata['DERIVED_IMAGE_PARMS']['RADIANCE_SCALING_FACTOR']`, pdr outputs the following: `{'value': 1.339256528, 'units': 'W/M**2/NM/SR'}`
pdr omits the exponent. The correct value from the label is:
RADIANCE_SCALING_FACTOR = 1.339256528e-06 <W/M**2/NM/SR>
I didn't see the problem with unitless values like `pdr.metadata['IMAGE']['SCALING_FACTOR']`.
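For reference, the expected behavior can be sketched with a small regular expression that keeps the exponent when splitting a value from its units. This is only an illustration under stated assumptions: `parse_quantity` and `QUANTITY` are hypothetical names, the returned dict simply mirrors the shape shown in the output above, and none of this is taken from pdr's source or from the fix commit.

```python
import re

# A signed decimal number with an optional exponent, optionally followed by
# units in angle brackets, e.g. "1.339256528e-06 <W/M**2/NM/SR>".
QUANTITY = re.compile(
    r"""^\s*
    (?P<value>[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)
    \s*
    (?:<(?P<units>[^>]*)>)?
    \s*$""",
    re.VERBOSE,
)


def parse_quantity(text):
    """Parse 'number [<units>]' into a float, or a dict when units are present."""
    match = QUANTITY.match(text)
    if match is None:
        raise ValueError(f"not a numeric quantity: {text!r}")
    # float() understands the exponent, so it must be part of the captured value.
    value = float(match["value"])
    units = match["units"]
    if units is None:
        return value
    return {"value": value, "units": units.strip()}


with_units = parse_quantity("1.339256528e-06 <W/M**2/NM/SR>")
unitless = parse_quantity("2.5e-03")
```

The point of the sketch is that the exponent belongs inside the captured number. A pattern that stops at the mantissa would reproduce the symptom reported here for values with units.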