CRIMAC-WP4-Machine-learning / CRIMAC-preprocessing

Preprocessing acoustic data from .raw to a gridded format
GNU Lesser General Public License v3.0

EK80 Raw files - "Something went wrong" #21

Closed: kjetilhh closed this issue 3 years ago.

kjetilhh commented 3 years ago

This file, and the others in its directory, fail during processing: https://oceaninsightscience.file.core.windows.net/hidata/cruise_data/2021/D20210811-T093822.raw

Error message: "ERROR: Something went wrong when reading the RAW file: /datain/D20210813-T093822.raw (<class 'ValueError'>) None"

The command was run on a Windows machine as follows:

docker run -it --name test_pyechopreprocess -v d:\WRK:/datain -v d:\WRK\OUT:/dataout --security-opt label=disable --env OUTPUT_TYPE=zarr --env MAIN_FREQ=38000 --env MAX_RANGE_SRC=500 --env OUTPUT_NAME=S2020842 --env WRITE_PNG=0 crimac/preprocessor

nilsolav commented 3 years ago

The 200kHz echosounder on Statsraad Lehmkuhl is a single-beam echosounder, whereas the 38kHz is a split-beam. The split-beam echosounder provides the split-beam angles in the data structures, and these will be missing for the single-beam. Could that explain the failure?

The 200kHz also alternates pings between FM and CW, and since the preprocessor does not handle FM data yet, this may also cause the crash. A possible patch is to make a version where we can opt out of a transducer during the conversion, and then rerun for all channels once we have code that can handle FM.
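A minimal Python sketch of the opt-out idea. The Channel type, the select_channels function, and its skip_frequencies/skip_fm parameters are hypothetical and not part of the preprocessor or pyEcholab; the point is only that channels can be filtered before conversion and the full set reprocessed once FM support exists.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Hypothetical stand-in for a transducer channel (not a pyEcholab type)."""
    channel_id: str
    frequency_hz: int
    pulse_mode: str  # "CW" or "FM"


def select_channels(channels, skip_frequencies=(), skip_fm=True):
    """Return the channels to convert (hypothetical opt-out filter).

    skip_frequencies: nominal frequencies (Hz) to opt out of entirely.
    skip_fm: drop FM channels, since the preprocessor cannot handle them yet.
    """
    skip = set(skip_frequencies)
    return [
        ch
        for ch in channels
        if ch.frequency_hz not in skip
        and not (skip_fm and ch.pulse_mode == "FM")
    ]
```

With the three channels discussed in this thread (38kHz CW, 200kHz CW, 200kHz FM), the default call keeps only the two CW channels; passing skip_frequencies=[200000] leaves just the 38kHz channel.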

iambaim commented 3 years ago

It seems that the raw file contains a RAW4 datagram format that is not (yet?) defined anywhere (cf. https://www.simrad.online/ek80/interface_en/default.htm).

Pyecholab refuses to process this file because of the unknown format and that leads to the error in the preprocessor.
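To check which datagram types a file actually contains, one can walk the length-framed datagrams directly. This Python sketch assumes the usual Simrad raw framing (a little-endian int32 length, the datagram itself starting with a 4-character type code, then the same length repeated). KNOWN_TYPES is an illustrative subset chosen for this example, not the list of types pyEcholab supports.

```python
import struct
from collections import Counter

# Illustrative subset of EK80 datagram types treated as "known" here.
KNOWN_TYPES = {"XML0", "FIL1", "NME0", "MRU0", "RAW3"}


def iter_datagrams(buf):
    """Yield (type, datagram_bytes) for each length-framed datagram in buf.

    Assumed framing: int32 length (little-endian), `length` bytes starting
    with a 4-character type code, then the int32 length repeated.
    """
    pos = 0
    while pos + 4 <= len(buf):
        (length,) = struct.unpack_from("<i", buf, pos)
        start = pos + 4
        end = start + length
        if length < 4 or end + 4 > len(buf):
            raise ValueError(f"truncated datagram at byte {pos}")
        (trailer,) = struct.unpack_from("<i", buf, end)
        if trailer != length:
            raise ValueError(f"length mismatch at byte {pos}")
        dg_type = buf[start:start + 4].decode("ascii")
        yield dg_type, buf[start:end]
        pos = end + 4  # skip past the trailing length field


def count_types(buf):
    """Count datagram types and list those not in KNOWN_TYPES."""
    counts = Counter(t for t, _ in iter_datagrams(buf))
    unknown = sorted(t for t in counts if t not in KNOWN_TYPES)
    return counts, unknown
```

Usage would be along the lines of reading the whole file with open(path, "rb") and passing the bytes to count_types; given the finding above, one would expect RAW4 to appear in the unknown list for the file in this issue.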

iambaim commented 3 years ago

Alright, I did some trial-and-error runs with this.

First, I tried ignoring the RAW4 datagram format completely (https://github.com/iambaim/pyEcholab/commit/af2f278ac4a3ed4344173de19ef3cca9eb949d7f). However, this leaves the 200kHz CW channel with only a single ping, while the 38kHz CW and 200kHz FM channels had 7436 and 3600 pings, respectively.

Second, I tried treating the RAW4 datagram as another RAW3 datagram with a different header (https://github.com/iambaim/pyEcholab/commit/98d4d9880db55260851dace48f8e83f2aa27785f). Now we have 7436, 3836, and 3600 pings for the 38kHz CW, 200kHz CW, and 200kHz FM channels, respectively.

The latter fits Nils Olav's explanation in https://github.com/CRIMAC-WP4-Machine-learning/CRIMAC-preprocessing/issues/20: a single raw file produced by the OneOcean ship should contain time-interleaved 200kHz CW and 200kHz FM channel data.

I'll re-open the issue for the time being so that others can give comments.

rhtowler commented 3 years ago

Ibrahim's second fix is the way to go. Since the unpacking seems to work without issue, the RAW4 datagram header must be identical to RAW3 through the count field. New fields, if any, would have to come after count. I've pulled those changes into my branch.
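A Python sketch of what "identical through the count field" means in practice: one header definition serves both datagram types, and any RAW4-specific fields would be appended after count. The field layout below (type, NT timestamp as two uint32 halves, 128-byte channel ID, data type, 2 spare bytes, offset, count) is our assumption about the RAW3 header, and the names RAW3_HEADER, HEADER_BY_TYPE, SampleHeader, and parse_sample_header are hypothetical rather than pyEcholab's.

```python
import struct
from typing import NamedTuple

# Assumed RAW3 header layout up to and including `count` (little-endian,
# no padding): type, NT time low/high, channel id, data type, 2 spare bytes,
# sample offset, sample count.
RAW3_HEADER = struct.Struct("<4sII128sH2sII")

# RAW4 reuses the RAW3 header; extra RAW4 fields, if any, would follow `count`
# and could later be handled by giving RAW4 its own, longer definition.
HEADER_BY_TYPE = {b"RAW3": RAW3_HEADER, b"RAW4": RAW3_HEADER}


class SampleHeader(NamedTuple):
    dg_type: str
    channel_id: str
    data_type: int
    offset: int
    count: int


def parse_sample_header(datagram: bytes) -> SampleHeader:
    """Parse the fixed header of a RAW3/RAW4 datagram (sketch)."""
    header = HEADER_BY_TYPE.get(datagram[:4])
    if header is None:
        raise ValueError(f"unsupported datagram type {datagram[:4]!r}")
    dg_type, _low, _high, chan, data_type, _spare, offset, count = header.unpack_from(datagram)
    return SampleHeader(
        dg_type.decode("ascii"),
        chan.rstrip(b"\x00").decode("ascii", errors="replace"),
        data_type,
        offset,
        count,
    )
```

Keeping the layouts in a lookup table means that extending RAW4 later only changes its entry, not the parsing code.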

When we get more information on the format of the RAW4 datagram, we can extend the header definition if required and determine what changes would need to be made to the parser.

iambaim commented 3 years ago

Thanks for the comment, @rhtowler. I guess we are on the right path and I can close this issue for now.