CRIMAC-WP4-Machine-learning / CRIMAC-preprocessing

Preprocessing acoustic data from .raw to a gridded format
GNU Lesser General Public License v3.0

Preprocessing the Sand Eel survey at Nautilus #8

nilsolav closed this issue 3 years ago

nilsolav commented 3 years ago

I am trying to run the preprocessor on the Azure files (copied to Nautilus):

docker run -it --name pyechopreprocess \
  -v /scratch/disk2/AzureMirror/cruise_data/2016/S2016837_PEROS_3317/ACOUSTIC/EK60/EK60_RAWDATA/:/datain/ \
  -v /scratch/disk2/AzureMirror/cruise_data/2016/S2016837_PEROS_3317/ACOUSTIC/ZARR/:/dataout/ \
  -v /scratch/disk2/AzureMirror/cruise_data/2016/S2016837_PEROS_3317/ACOUSTIC/LSSS/WORK/:/workin \
  --security-opt label=disable \
  --env OUTPUT_TYPE=zarr \
  --env MAIN_FREQ=38000 \
  --env MAX_RANGE_SRC=500 \
  --env OUTPUT_NAME=S2018823 \
  --env WRITE_PNG=0 \
  crimac/preprocessor

After processing a few files I get this error message:

Now processing file: /datain/2016837-D20160427-T013710.raw
<class 'echolab2.instruments.EK60.EK60'> at 0x7fdb186bbfd0
    EK60 object contains data from 6 channels:
        1 :: GPT 18 kHz 00907205aeb7 1-1 ES18-11 :: power/angle (1091, 2645)
        2 :: GPT 38 kHz 00907205aebc 6-1 ES38B :: power/angle (1091, 2645)
        3 :: GPT 70 kHz 00907205aebe 2-1 ES70-7C :: power/angle (1091, 2645)
        4 :: GPT 120 kHz 00907205c48f 3-1 ES120-7C :: power/angle (1091, 2645)
        5 :: GPT 200 kHz 00907205aed2 4-1 ES200-7C :: power/angle (1091, 2645)
        6 :: GPT 333 kHz 00907205fb9c 5-1 ES333-7C :: power/angle (1091, 10579)
    data start time: 2016-04-27T01:37:10.043
    data end time: 2016-04-27T01:56:32.213
    number of pings: 1091

Main frequency: 38000
Main channel: ['GPT 38 kHz 00907205aebc 6-1 ES38B']
Other channels: ['GPT 18 kHz 00907205aeb7 1-1 ES18-11', 'GPT 70 kHz 00907205aebe 2-1 ES70-7C', 'GPT 120 kHz 00907205c48f 3-1 ES120-7C', 'GPT 200 kHz 00907205aed2 4-1 ES200-7C', 'GPT 333 kHz 00907205fb9c 5-1 ES333-7C']
Channel with frequency 38000.0 range mismatch! Reference range size: 2644 != 2645
Channel with frequency 120000.0 range mismatch! Reference range size: 2644 != 2645
Channel with frequency 18000.0 range mismatch! Reference range size: 2644 != 2645
Channel with frequency 200000.0 range mismatch! Reference range size: 2644 != 2645
Channel with frequency 70000.0 range mismatch! Reference range size: 2644 != 2645
Channel with frequency 333000.0 range mismatch! Reference range size: 2644 != 10579
Reading:/workin/2016837-D20160427-T013710.work
/usr/local/lib/python3.8/site-packages/annotationtools/readers/convert_to_annotations.py:1295: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if not intr.species_id == -1:
Correcting time by 0 microseconds
Traceback (most recent call last):
  File "/app/CRIMAC_preprocess.py", line 796, in <module>
    status = raw_to_grid_multiple(raw_dir,
  File "/app/CRIMAC_preprocess.py", line 726, in raw_to_grid_multiple
    pq_writer = append_to_parquet(df, pq_filepath, pq_writer)
  File "/app/CRIMAC_preprocess.py", line 38, in append_to_parquet
    pq_obj.write_table(table=pa_tbl)
  File "/usr/local/lib/python3.8/site-packages/pyarrow/parquet.py", line 649, in write_table
    raise ValueError(msg)
ValueError: Table schema does not match schema used to create file:
table:
ping_time: timestamp[ns]
mask_depth_upper: double
mask_depth_lower: double
priority: int64
acoustic_category: string
proportion: double
object_id: string
channel_id: string
__index_level_0__: int64
-- schema metadata --
pandas: '{"index_columns": ["__index_level_0__"], "column_indexes": [{"na' + 1327 vs.
file:
ping_time: timestamp[ns]
mask_depth_upper: double
mask_depth_lower: double
priority: int64
acoustic_category: null
proportion: double
object_id: null
channel_id: null
__index_level_0__: int64
-- schema metadata --
pandas: '{"index_columns": ["__index_level_0__"], "column_indexes": [{"na' + 1321

iambaim commented 3 years ago

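It seems we must force explicit types on the table columns, even when a column contains only nulls. In the error above, acoustic_category, object_id and channel_id are typed null in the file schema (the first table written had no values in them) but string in the table now being appended, so the schemas no longer match. I'll investigate this.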