cr / hx870

Python tools for the Standard Horizon HX870
GNU General Public License v3.0
21 stars 5 forks source link

Minor JSON dump format issue for long GPS logs #36

Open cr opened 1 year ago

cr commented 1 year ago

The LOCUS GPS log can contain multiple 4kB blocks, but this was unknown when the JSON log dump format was conceived. Consequently, fixing the log truncation bug in https://github.com/cr/hx870/pull/35 resulted in a minor regression in that area, specifically that dumps of GPS logs with many entries spanning multiple blocks do not fully represent the binary in-memory format any longer.

At the moment, no workflows in the code rely on this requirement, so this issue is merely documenting that the format's bijectivity promise is broken for the time being.

mbof commented 1 year ago

Here's a few options that I can think about:

  1. Drop the header altogether, as it is encoding information that mostly makes sense for the serial transmission (e.g. a checksum) and is unlikely to be of interest to the consumer.
  2. Add all the section headers, along with the point ranges that they each refer to, i.e. something like
    sectors: [
      { header: "0100...", "length": 201 },
      { header: "0200...", "length": 201 },
      { header: "0300...", "length": 201 },
      { header: "0400...", "length": 123 }
    ]
  3. Same as 2, plus add a decoding of the known fields in the header within each sector.
  4. Structure the whole JSON file as a series of sectors, each containing a header and a series of points. I don't think this is the best as it would likely require significant rewrite from any users of this format, and most consumers would likely prefer a single array of points anyways.

All of these changes introduce potentially consumer-breaking changes, so if we're worried about that, we can keep a command-line option to restore previous behavior. Or not, and let people who need the old behavior back sync to an earlier commit and open an issue.

My proposal for now is option 1 with no command-line option to control it. It's the simplest given what I know about user need (i.e. not much, only my speculation), and if people manifest themselves as needing something like option 2, 3 or 4, we can implement it then. Alternatively, option 2 is simple enough for me to take on, but I'd just worry a bit about adding code that nobody really needs, which makes it difficult to assess correctness.

Thoughts?

cr commented 1 year ago

I am leaning towards option 1. As far as I can see not much "need" for all that header data has materialized so far, so you're free to pick and choose what works best for you. It'll be relatively easy to add it again should we learn otherwise.