OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
91 stars 71 forks source link

Fixes `UnicodeDecodeError` for ES60 files [all tests ci] #1215

Closed praneethratna closed 8 months ago

praneethratna commented 8 months ago

Addresses #1195 by changing encoding to unicode_escape instead of using default encoding utf-8.

CC @leewujung

emiliom commented 8 months ago

I'll add "[all tests ci]" to the PR title and closing-reopening the PR to force all tests to be run, to ensure there are no negative consequences with other EK/ES data. As it stands, it looks like none of the tests were actually run in the CI.

codecov-commenter commented 8 months ago

Codecov Report

Merging #1215 (82b8693) into dev (499866f) will increase coverage by 0.31%. Report is 2 commits behind head on dev. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev    #1215      +/-   ##
==========================================
+ Coverage   83.13%   83.44%   +0.31%     
==========================================
  Files          63       64       +1     
  Lines        5710     5672      -38     
==========================================
- Hits         4747     4733      -14     
+ Misses        963      939      -24     
Flag Coverage Δ
unittests 83.44% <100.00%> (+0.31%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
echopype/convert/utils/ek_raw_parsers.py 55.38% <100.00%> (ø)

... and 7 files with indirect coverage changes

:mega: Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

praneethratna commented 8 months ago

Ping @emiliom, should we add a new test case using the data file provided here - https://github.com/OSOceanAcoustics/echopype/issues/1195#issuecomment-1788674130?

leewujung commented 8 months ago

@praneethratna: thanks for looking in to this! I checked the example ES60 file and this happens in the spare0 field and with encoding="unicode_escape" it returns '7B\x00\x00\x90A'. With a regular EK60 file, the same spare0 field returns '\x00\x00\x00\x00\x00\x00'. I think this is fine since this field doesn't actually encode anything useful.

I am a little reluctant to add a 100MB file into the test data set. Let's ping the person who raised this issue to see if they could help provide a smaller file.

We can merge this once we decide whether we would add a smaller test data file into this (if such file exists). If not, we can add a comment that this error occurred for an ES60 file.

emiliom commented 8 months ago

@praneethratna I've gone ahead and uploaded the new small test file (https://github.com/OSOceanAcoustics/echopype/issues/1195#issuecomment-1805255054) to our test folder on Google Drive. I created a new top-level folder, es60. BTW, that really is a tiny file -- just 11KB! I haven't tried opening it.

I think you can go ahead and create a test for it. Thanks!

leewujung commented 8 months ago

@praneethratna: I've just verified that the 11KB file does contain the type of byte string that was not previously decoded. Please go ahead to add a test for this file under test_convert_ek60.py. Thank you!