Closed LevN0 closed 3 months ago
@cmillion @m-stclair
I discovered this issue while looking at this text in pdr
.
The docstring for the to_dict
method. It also includes e.g. a cast_values
argument.
Sorry, missed your tag.
We definitely missed the existence of skip_attributes
and will probably rewrite in order to set it True
.
My knee-jerk impulse is to say that to_dict
should not by default discard XML attributes. There is certainly the argument that it's more user-friendly -- if a user writes:
coordinate = label['Product_Observational']['Observation_Area']['Discipline_Area']['cart:Cartography']['cart:Spatial_Domain']['cart:Bounding_Coordinates']['cart:west_bounding_coordinate']
they probably expect to get str
or float
, not an OrderedDict
with the actual coordinate as the value of its '_text' key.
On the other hand, there is also the argument that -- at least I think this is true -- every PDS4 attribute that allows the use of XML attributes also mandates the use of XML attributes, so you can always know the expected type.
Dear all,
Regarding the issue at hand, the discrepancy between the set value for the skip_attributes option in the code and the reported default option documentation, I am of the opinion that the code value should match the reported documentation first and foremost.
Considering the larger question, i.e. how to deal with attributes, I am of a more radical opinion and although several PDS4 dictionaries do not include labels requiring attributes, I would, as a user, prefer to be confronted to a single datatype rather alternating system. Therefore, I would favor a solution in which the user would have an datatype listing value, unit for each and any label.
@Fenreg
To clarify, you would prefer,
<Table_Character>
<offset unit="byte">0</offset>
<records>76</records>
...
</Table_Character>
Translate to,
{
"Table_Character": {
"offset": {
"@unit": "byte",
"_text": "0"
},
"records": {
"_text": "76"
}
}
}
Did I understand correctly?
I understand @fenreg as meaning that this snippet should translate to:
"Table_Character": {
"offset": {
"@unit": "byte",
"_text": "0"
},
"records": {
"@unit": None,
"_text": "76"
}
in order to create complete consistency between expected keys of mappings associated with all PDS4 attributes.
For complete consistency, however, a solution this should include all legal XML attributes associated with any PDS4 attribute -- for instance, both "offset" and "records" above -- and maybe even "Table_Character" -- should also have "@nil": None
and "@nilReason": None
items, in addition to many others.
I am concerned that a solution like this will make parsed labels difficult to read and require additional filtering code for users who wish to translate these Python objects back into XML.
I do apologize, @LevN0, for not answering sooner and I do thank you @m-stclair for highlighting the logic of my previous answer.
This question that you raise, @m-stclair, is actually the one I am dealing with presently, as I have to create ancillary data to science observations and need to copy general information relative to these observations back into the main PDS4 dictionaries of their lblx files. I have implemented the corrective to ensure the lecture of all attributes as discussed above, and in the end, either way works in my view, i.e. setting an exception for a specific label, or - as I did prior to finding out about skip_attributes - having a "to_dict" method to include a "unit" key for each and every label.
All the best,
The one thing I definitely will not do is add attributes not in the label into the result. This would require having any dictionary at hand, which the code does not and will not. The most data that can be included by PDS4 Tools here is what is present in the label itself.
Thus the options for label.to_dict(..., skip_attributes=False)
for,
<Table_Character>
<offset unit="byte">0</offset>
<records>76</records>
...
</Table_Character>
are either,
{
"Table_Character": {
"offset": {
"@unit": "byte",
"_text": "0"
},
"records": {
"_text": "76"
}
}
}
or
{
"Table_Character": {
"offset": {
"@unit": "byte",
"_text": "0"
},
"records": "76"
}
}
On consideration, my vote is:
skip_attributes=True
by default. For most applications, the XML attributes in PDS4 labels are not important.skip_attributes=False
, always make the values of keys corresponding to PDS4 attributes dicts
, whether or not they have XML attributes. This is basically because type consistency is convenient.There are basically no options here that are not annoying compromises: XML elements simply do not cleanly translate to idiomatic Python Mappings
. XML is not JSON. If a user needs perfectly reliable access to all XML affordances, they should use the Label
object itself , or supplement pds4-tools
with a high-level XML manipulation toolchain like bs4
+ lxml
.
I take it back: I vote don't make every value corresponding to a PDS4 attribute a dict
when skip_attributes=False
. This might have been a good data design decision at some point, but it will be a catastrophically breaking change for many pieces of downstream software, which is a high cost for a tweaky little marginal improvement.
The Label class provides a method to convert the label from Label (which is effectively a subclass of ElementTree) to a dictionary, namely
Label.to_dict(...)
.The method includes a number of optional arguments.
One of the optional arguments is
skip_attributes
. The effect of this argument is as follows:Original XML,
Converted to dict via
Label.to_dict(skip_attributes=True)
,Converted to dict via
Label.to_dict(skip_attributes=False)
,The help string for this argument states "If True, skips adding attributes from XML. Defaults to False." However, despite this text, the argument is actually set to True by default, i.e. attributes are skipped when converting to dict.
Need to investigate whether the argument should be changed to False by default as the help string says, or whether the help string should be adjusted.