CCSDSPy / ccsdspy

I/O interface and utilities for CCSDS binary spacecraft data in Python. Library used in flight missions at NASA, NOAA, and SWRI
https://ccsdspy.org
BSD 3-Clause "New" or "Revised" License
74 stars 18 forks source link

Add support for loading variable length packets from CSV #115

Closed jmbhughes closed 4 months ago

jmbhughes commented 4 months ago

When completing #114 , I noticed that you cannot load variable length packets from a CSV. There is no way to do the reference linking or specifying an expanding field. PUNCH would like to load variable length packets from CSV. I'm happy to work on this, but I'd like guidance on how you want this structured. At the moment, the loading CSV is in the _BasePacket class. This change would make the loading different for fixed and variable length.

ddasilva commented 4 months ago

Thanks for pulling on this thread!

I was thinking of a couple ways this could work:

  1. Have the data_type column support values like uint8(*) (expanding) or uint8(other_field) (reference linking) or uint8(3,3) (3x3 array)
  2. Add a new CSV column for array_shape, which maps directly to the keyword argument

Option 1 seems maybe more readable when skimming the file, but Option 2 is easier for code to work with.

What do you think?

jmbhughes commented 4 months ago

I can see pros and cons with both. Option 1 does not require existing CSV files to be modified for anyone already using this feature. Option 2 is simpler to write (both code on our side to process and the CSV on the user side) and understand, but it feels less elegant to have empty entries for PacketFields. I like the idea of keeping the data type and array shape columns separate and making them directly correspond to the code keywords.

If there is already wide adoption of the CSV format, I would lean towards Option 1 since it doesn't break anything for users. However, I think I like Option 2 if the CSV loading feature is not currently widely used. Do you know if the CSV loading is currently being used by any mission?

If we go with Option 1, I wonder if it might be clearer to just literally write uint8(expand) so one doesn't have to remember the mapping between * and expanding... however said mapping does feel pretty obvious to me. So I'm not strongly committed to that opinion.

ddasilva commented 4 months ago

Any thoughts @ehsteve ? I think that some missions are using the CSV format, but we can do something like

from_file(file, version="v2") and change the default the next major CCSDSPy release

ehsteve commented 4 months ago

@ddasilva thanks for considering my opinion! I'm much more in favor of option 1 both because it does not break existing use of csv but also because it feels right to add new data types that expand.

Also I would use * and not the word 'expand' just in case someone wants to call one of their fields 'expand'. It also makes the code for parsing clearer.

I am really not a fan of something like from_file(file, version="v2") to ever be used because then someone would have to pay attention to version numbers of things and we'd have to support code to support multiple versions of csv files which would be a pain (and also hard to document).

ddasilva commented 4 months ago

Great! I'll go with @ehsteve's suggestion. I think for array order we can do uint(8, 8; F) to e.g. specify Fortran order, where what's after the semicolon is optional and uint(8, 8) uses the default array order.

Regarding uint(*) vs uint(expand), I think we should stick with expand, because that's what the API does. I think having two parallel vocabularies to describe things is going to get confusing.

Are there any other missing features of CSV formats what we haven't covered?

ddasilva commented 4 months ago

@tloubrieu-jpl feel free to offer your thoughts if you are interested

ehsteve commented 4 months ago

Do we need to support specifying Fortran order? The array can be easily flipped after the fact with the numpy array, right?

ddasilva commented 4 months ago

Probably not tbh. We can wait until someone complains to reconsider. Does this cover everything @jmbhughes ?

jmbhughes commented 4 months ago

Yes @ddasilva this gives me enough to get something started.

jmbhughes commented 4 months ago

Closing since the branch was merged.