Small-Bodies-Node / pds4_tools

Python package to read and display NASA PDS4 data.

Table_Delimited crashes on null values in ASCII_Boolean fields #36

LandingEllipse closed this issue 3 years ago

LandingEllipse commented 3 years ago

Issue

When reading Table_Delimited structures, records containing empty values in numeric fields such as ASCII_Integer are masked as expected. This is not the case for ASCII_Boolean fields: an empty value there raises a ValueError ("invalid literal for int() with base 10: b''").
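The error message itself can be reproduced outside of PDS4 Tools, which suggests the empty field reaches an integer conversion unguarded. The snippet below is only an illustration of that failure mode; `parse_int_field` is a hypothetical stand-in, not a function from pds4_tools.

```python
def parse_int_field(raw: bytes) -> int:
    """Hypothetical stand-in for a conversion step; not pds4_tools code."""
    return int(raw)


try:
    parse_int_field(b"")  # an empty field from a delimited record
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # invalid literal for int() with base 10: b''
```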

The only hint I've found on the "legality" of empty values in boolean fields comes from the Standards Reference section 4C.1 (Delimiter Separated Value Format Description), which says that:

A field may be empty. The interpretation of an empty field will be application and data type dependent.

While this does not directly imply that PDS4 Tools must support empty booleans in particular, the tool already interprets empty numeric fields via masking, so I think it is reasonable for end users to assume that this behaviour also extends to booleans.

Potential solution

Since boolean fields are also represented as ndarrays, I've had success with simply extending the existing masking behaviour (https://github.com/Small-Bodies-Node/pds4_tools/blob/8ada764ae1ae102d1a17ac4820cb799b87d7041a/pds4_tools/reader/data_types.py#L472-L476) to the handling of boolean fields. I've submitted #37 as a starting point for a solution.
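To make the idea concrete, here is a minimal sketch of masking empty boolean values with a NumPy masked array. It is not the change in #37 and not the code at the link above; `parse_boolean_column` is a hypothetical helper, and the accepted literals (`true`, `false`, `1`, `0`) are my assumption about the lexical forms of ASCII_Boolean.

```python
import numpy as np

# Assumed lexical forms of ASCII_Boolean; adjust if the standard differs.
_TRUE = {b"true", b"1"}
_FALSE = {b"false", b"0"}


def parse_boolean_column(raw_values):
    """Hypothetical helper: convert raw byte strings to a masked bool array.

    Empty fields are masked instead of being passed to a conversion
    that would raise on them.
    """
    stripped = [value.strip() for value in raw_values]
    is_empty = np.array([value == b"" for value in stripped], dtype=bool)

    data = np.zeros(len(stripped), dtype=bool)  # placeholder under the mask
    for i, value in enumerate(stripped):
        if is_empty[i]:
            continue
        if value in _TRUE:
            data[i] = True
        elif value in _FALSE:
            data[i] = False
        else:
            raise ValueError("not a boolean literal: %r" % value)

    return np.ma.MaskedArray(data, mask=is_empty)


column = parse_boolean_column([b"true", b"", b"0"])
print(column.mask.tolist())  # [False, True, False]
```

With this approach the empty record shows up as a masked element, which mirrors how empty numeric fields are already presented to the user.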

I'm looking forward to hearing your thoughts on this, in particular whether you can think of any caveats or reasons not to support masked booleans.

Many thanks for your work on PDS4 Tools!

LevN0 commented 3 years ago

Empty fields (of any data type) are allowed in Table_Delimited structures per the PDS4 Standard, but not in Table_Character. If the code is crashing on read-in of empty boolean fields within Table_Delimited, this is a bug. I will investigate and get back to you.