Closed jmbhughes closed 4 months ago
Thanks for pulling on this thread!
I was thinking of a couple ways this could work:
uint8(*)
(expanding) or uint8(other_field)
(reference linking) or uint8(3,3)
(3x3 array)array_shape
, which maps directly to the keyword argumentOption 1 seems maybe more readable when skimming the file, but Option 2 is easier for code to work with.
What do you think?
I can see pros and cons with both. Option 1 does not require existing CSV files to be modified for anyone already using this feature. Option 2 is simpler to write (both code on our side to process and the CSV on the user side) and understand, but it feels less elegant to have empty entries for PacketField
s. I like the idea of keeping the data type and array shape columns separate and making them directly correspond to the code keywords.
If there is already wide adoption of the CSV format, I would lean towards Option 1 since it doesn't break anything for users. However, I think I like Option 2 if the CSV loading feature is not currently widely used. Do you know if the CSV loading is currently being used by any mission?
If we go with Option 1, I wonder if it might be clearer to just literally write uint8(expand)
so one doesn't have to remember the mapping between *
and expanding... however said mapping does feel pretty obvious to me. So I'm not strongly committed to that opinion.
Any thoughts @ehsteve ? I think that some missions are using the CSV format, but we can do something like
from_file(file, version="v2")
and change the default the next major CCSDSPy release
@ddasilva thanks for considering my opinion! I'm much more in favor of option 1 both because it does not break existing use of csv but also because it feels right to add new data types that expand.
Also I would use * and not the word 'expand' just in case someone wants to call one of their fields 'expand'. It also makes the code for parsing clearer.
I am really not a fan of something like from_file(file, version="v2")
to ever be used because then someone would have to pay attention to version numbers of things and we'd have to support code to support multiple versions of csv files which would be a pain (and also hard to document).
Great! I'll go with @ehsteve's suggestion. I think for array order we can do uint(8, 8; F)
to e.g. specify Fortran order, where what's after the semicolon is optional and uint(8, 8)
uses the default array order.
Regarding uint(*)
vs uint(expand)
, I think we should stick with expand
, because that's what the API does. I think having two parallel vocabularies to describe things is going to get confusing.
Are there any other missing features of CSV formats what we haven't covered?
@tloubrieu-jpl feel free to offer your thoughts if you are interested
Do we need to support specifying Fortran order? The array can be easily flipped after the fact with the numpy array, right?
Probably not tbh. We can wait until someone complains to reconsider. Does this cover everything @jmbhughes ?
Yes @ddasilva this gives me enough to get something started.
Closing since the branch was merged.
When completing #114 , I noticed that you cannot load variable length packets from a CSV. There is no way to do the reference linking or specifying an expanding field. PUNCH would like to load variable length packets from CSV. I'm happy to work on this, but I'd like guidance on how you want this structured. At the moment, the loading CSV is in the
_BasePacket
class. This change would make the loading different for fixed and variable length.