Options to skip message headers between packets

tloubrieu-jpl commented 1 year ago

For Europa-Clipper test files, we're having packets separated by specific pieces of binary stream.

See spreadsheet https://docs.google.com/spreadsheets/d/1YX-_zw9tdEwkYxdQ5IA8tClHnzwaNehGvE04JfFdY2A/edit#gid=0

We have 3 cases:

standard packet sequence
fixed length header
packet start marker

I propose:

case 2: skip a fixed number of bits in between packets
case 3: specify the start marker with as a callable taking as parameters n bits of the stream. The callable will be callable at every position in the stream. A packet will start a the position after whenever the callable return True (in our case the function works on 4 bytes and returns seq[1] == 0xF0 and seq[0] == seq[2] == seq[3] != 0x00)

I am attaching 2 test files for case 2 and 3: case-2.bin.zip case-3.bin.zip

ddasilva commented 1 year ago

Thanks for sharing this. We can definitely add a packet_spacing=N option to load() and related functions which will skip N bytes between packets to address case 2. Do we need to support skipping a number of bits that isn't divisible by 8? I believe we discussed supporting arbitrary spacing between packets to address outer frames attached to the packets, which generally come in sets of complete bytes.

To address case 3, I will think about the best way to support this. How are you doing this currently? We could maybe implement this with a keyword argument such as:



def locator_fun(byte_window):
    return byte_window[1] == 0xF0 and byte_window[0] != 0 and byte_window[2] != 0 and byte_window[2] != 0

pkt = VariableLength([...])
result = pkt.load('telemetry.bin', packet_locator=PacketLocator(locator_fun, window=4))

tloubrieu-jpl commented 1 year ago

Thanks @ddasilva for your quick feedback.

We can do the packet_spacing in bytes, in our case it is 32 bits, so that will work in bytes.

I like your proposal for the packet_locator.

Thanks again,

ddasilva commented 1 year ago

Adding a note here for myself. I think in addition to packet_spacing=N we should support skip_extra_headers=N. The difference is an extra header is before each packet (so, the file starts with one), where as a spacing is between each packet (in which case, the file doesn't start with one).

ehsteve commented 1 year ago

@tloubrieu-jpl is this compatible with the ccsds standard? Are those messages also ccsds packets?

Like my comment in the other issue, I'd like us to maintain the ability to document packets non programmatically. If these messages always follow another packet, could they just defined as extra fields in the previous packet? Or we could define a new SPACER field.

tloubrieu-jpl commented 1 year ago

@ehsteve , I don't think that is strictly speaking CCSDS. This is how our Science Data System receives the CCSDS packets embedded in other messages following a specific syntax. And specifically for test datasets, the source of the CCSDS packets differ and so does their wrapping.

Now that I think of it more, that would be an acceptable answer to say that this is outside the scope of CCSDSpy and that a preprocessing need to be done to strip the packets off their non CCSDS wrappers.

ehsteve commented 12 months ago

Thanks @tloubrieu-jpl. I wonder what @ddasilva thinks of this CCSDS strictness? I worry that if we start allowing for all kinds of packet shenanigans (1) this code base will become a giant mess as binary data can be organized in an infinite amount of ways and (2) we weaken the CCSDS standard. I actually wish the standard was more developed and stricter in some ways but I understand that others may disagree!

ddasilva commented 12 months ago

I'll have to think about this more deeply, but for this ticket I think we can conclude that we'll not implement packet spacing skipping right now for this ticket since @tloubrieu-jpl seems satisfied. But here are my thoughts.

On one hand, I think about the pandas pd.read_csv() function, which is incredibly useful and has an option for almost everything I'd ever need. It's probably a mess to maintain, but from my point of view, if I get data in a weird CSV format, it's most probably a lost cause to try to get the upstream provider to "fix" their weird CSV format. That's where pd.read_csv() comes in to save me. Pretty much everything pd.read_csv() does I could re-implement myself on a case by case basis, but in the end it's a matter of whether I spend 10 seconds or an hour on the same problem.

One the other hand, the code has the ability to become very complex if we implement a lot of different "helpful options" as keyword arguments to a single function. We might be able to manage this complexity by making functions like strip_spacing() that work together like unix command line tools. We would then be requiring someone to call pkt.load(strip_spacing(bytes_io, 10)).

At the end of the day, what I think this library does for most people is make it so they have to think less about bit manipulation and more about the problem they're trying to solve. If something like strip_spacing(bytes_io, 10) saves a lot of people time and avoids context switching to bit manipulation, I think it makes sense for the project. I think it's pretty unlikely that we'll be feeding the beast of poorly prepared CCSDS data, because in reality, that ship tends to sail for other reasons.

CCSDSPy / ccsdspy

Options to skip message headers between packets #74