bluerobotics / ping-viewer

Ping Viewer is an open-source application to view and record data from the Blue Robotics Ping Echosounder and Ping360 Scanning Sonar.
https://docs.bluerobotics.com/ping-viewer/
GNU General Public License v3.0
42 stars 39 forks source link

Process as generator; recover on error #944

Closed Eliot-Insight closed 3 years ago

Eliot-Insight commented 3 years ago
Eliot-Insight commented 3 years ago

Includes the bug fix from #943, so if this is merged 943 can be removed/ignored.

Eliot-Insight commented 3 years ago

Interesting that windows build failed - I haven't changed any of the actual ping-viewer code, just the example. I guess the master branch is currently failing the windows build too?

Eliot-Insight commented 3 years ago

Hi @Williangalvani,

In my testing I found that when errors were occurring it was because the array-length that it was reading in was a huge number that tried to read all the remaining bytes of the file into a single message, which used up the rest of the file. I provided a couple of example files where this occurs here (same as in my initial post). I'm unfortunately unsure how or why this was occurring in the first place, although those specific files were generated at a time where our communication was regularly dropping out because of some issues with our setup.

The idea with this recovery method is basically what I proposed in pseudocode in my bluerobotics forum post, where if it detects that a clearly incorrect array length has been read (data_array_length > max_data_array_length), it assumes that it's lost and tries to recover by finding the next available timestamp (recognised using regex), after which it parses that timestamp manually, and then reads the corresponding message normally, before returning to the standard read-timestamp, read-message process.

While the recovery process is intended (and seemingly works) for specifically the issue I was having, I'd imagine it should work for any time when bytes are randomly missing (although I haven't tested this), potentially at the cost of a few messages. If bytes get deleted in a file it'll at some point end up trying to read the next array length from a random position, which involves turning four random bytes into a 32-bit UINT, which will likely be long enough to trigger recovery mode, and if not would do so relatively soon afterwards as each incorrect length read will almost certainly take it to another incorrect location.

If your manual corruption of the files was at the bit level then there's a 7/8 chance it's damaged the byte-alignment, in which case I would expect it to fail to recover after that point, as regex operates on bytes and won't find a valid timestamp location. If you deleted some bytes and it didn't work then possibly they were from the file header, but if not then I'm afraid I'm not sure why it's not working. If you can confirm that it's not working after some random bytes deletion from say the middle of a file I'll try to have a look into it in my spare time over the next week or so. Since it's working for my issue I can't justify spending more work time on it unfortunately.

Eliot-Insight commented 3 years ago

I've never squashed commits before, but it seems to have worked. Hopefully that's what you were after :-)

Williangalvani commented 3 years ago

I've never squashed commits before, but it seems to have worked. Hopefully that's what you were after :-)

Yes, that looks correct!

@patrickelectric do you mind taking a look at this?

patrickelectric commented 3 years ago

Hi @Eliot-Insight, sorry for the delay! Be free to ping us if you desire to get it merged asap. Thank you very much for your contribution!