Small-Bodies-Node / pds4_tools

Python package to read and display NASA PDS4 data.

Colons stripped from table field names on read #68

Open msbentley opened 2 years ago

msbentley commented 2 years ago

I have a data product with columns defined like:

                <Field_Character>
                    <name>VFS1102L FREND:FRAM NUMD1</name>
                    <field_number>3</field_number>
                    <field_location unit="byte">47</field_location>
                    <data_type>ASCII_NonNegative_Integer</data_type>
                    <field_length unit="byte">10</field_length>
                    <description>FRAM NUM: Running counter, 0...65535</description>
                </Field_Character>

However, when I read this table with pds4_tools, the colon seems to be replaced by an underscore, e.g.

In [28]: struct = pds4_read('frd_raw_sc_d_20160406T000000-20160406T235959.xml')
In [30]: struct[0].data
<...snip...>
            dtype=(numpy.record, [('PUS_TIME_UTC', '<U27'), ('PUS_TIME', '<U17'), ('VFS1102L FREND_FRAM NUMD1', 'u1'), ('PACKET_COUNTER', 'i1'), ('VFS1105L FREND_DOS DATA1', 'O'), ('VFS1205L FREND_DOS DATA2', 'O'), ('VFS1305L FREND_DOS DATA3', 'O'), ('VFS1405L FREND_DOS DATA4', 'O'), ('VFS1505L FREND_DOS DATA5', 'O'), ('VFS1605L FREND_DOS DATA6', 'O'), ('VFS1705L FREND_DOS DATA7', 'O'), ('VFS1805L FREND_DOS DATA8', 'O'), ('VFS1905L FREND_DOS DATA9', 'O'), ('VFS2005L FREND_DOS DATA10', 'O')]))

and if I try to index by the correct name I get an error, e.g.

In [31]: struct[0].data['VFS1102L FREND:FRAM NUMD1']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [31], in <cell line: 1>()
----> 1 struct[0].data['VFS1102L FREND:FRAM NUMD1']

File ~/miniconda3/envs/bepi/lib/python3.10/site-packages/pds4_tools/reader/data.py:169, in PDS_ndarray.__getitem__(self, idx)
    155 def __getitem__(self, idx):
    156     """
    157     Parameters
    158     ----------
   (...)
    167         then the meta_data will be preserved for those fields or records.
    168     """
--> 169     obj = super(PDS_ndarray, self).__getitem__(idx)
    171     # For structured arrays, retrieve the correct meta_data portion if we are not obtaining all of the
    172     # fields
    173     if isinstance(obj, np.ndarray):

ValueError: no field of name VFS1102L FREND:FRAM NUMD1

but instead I have to access the field with an underscore:

struct[0].data['VFS1102L FREND_FRAM NUMD1']

Is this a bug, or intended? Currently, for various reasons, I'm getting the field names from the table manifest, where the values are correct, and then this fails when using the field name to access the data.

LevN0 commented 2 years ago

This apparently was intended behavior, due to an issue in NumPy at the time the code was written. See method here.

I will check whether the issue still exists. If it does not, I may consider adjusting the current behavior.

msbentley commented 2 years ago

OK, many thanks - I'll work around it on my side in the interim.
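
One possible shape for such a workaround is sketched below. It assumes the only change made on read is replacing ':' with '_', which is what the dtype listing above suggests but is not confirmed for other characters. The helper name `to_pds4_tools_name` is hypothetical and not part of pds4_tools.

```python
def to_pds4_tools_name(label_name):
    """Hypothetical helper: map a field name as written in the label
    (or table manifest) to the key the structured array exposes."""
    # Assumption: only ':' is rewritten on read, and it becomes '_'.
    return label_name.replace(":", "_")


# Field names as they appear in the label, taken from the example above.
label_names = ["PUS_TIME_UTC", "VFS1102L FREND:FRAM NUMD1"]

# Lookup table from label name to the name usable for indexing the data.
lookup = {name: to_pds4_tools_name(name) for name in label_names}
```

With this in place, the column could be accessed as `struct[0].data[lookup['VFS1102L FREND:FRAM NUMD1']]`. Note that two label names differing only by ':' versus '_' would collide under this mapping.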

LevN0 commented 2 years ago

So far I have not been able to replicate the NumPy issue, even going back quite a few versions, but I need to look further.
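
For anyone who wants to repeat the check on their own NumPy version, a minimal sketch follows. It only tests whether a structured array can be created and indexed with a colon in a field name. It does not reproduce whatever the original NumPy problem was, since that is not described in this thread. The function name `colon_field_supported` is made up for illustration.

```python
import numpy as np


def colon_field_supported(name="VFS1102L FREND:FRAM NUMD1"):
    """Return True if this NumPy build accepts `name` as a structured-array
    field name and allows reading and writing the field by that name."""
    try:
        arr = np.zeros(2, dtype=[(name, "u2")])
        arr[name] = [1, 2]  # write through the colon-containing name
        return arr.dtype.names == (name,) and arr[name].tolist() == [1, 2]
    except (ValueError, TypeError, KeyError, IndexError):
        # Any of these would indicate the name is rejected somewhere.
        return False


print(np.__version__, colon_field_supported())
```

Running this under several NumPy and Python versions would show where, if anywhere, colons in field names stop working.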