This repository contains tools to read, convert, and write annotations from fisheries acoustics. A Python representation of the annotations is defined as an annotation structure.
The main objective is to have test code that defines the annotation data structure in the ICES format, together with code to read and write that format. It is our test implementation of the proposed standard, and the discussions at the ICES publication page should guide our development: https://github.com/ices-publications/SONAR-netCDF4
A Python variable structure for annotations (possibly based on a dict) that is a one-to-one mapping with the ICES NetCDF4 format is needed.
Based on discussions on 06.01.2021 we should probably plan for two different representations:

- One that is aligned with the data grid, in 1:1 correspondence with the gridded data. For this representation the time vector of the data is needed.
- One that defines the edges of the box independently of the data grid, similar to the ICES standard.
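A minimal sketch of what such a dict-based structure could look like is shown below; all field names are placeholders and must be aligned with the ICES convention as the discussion progresses:

```python
import numpy as np

# Hypothetical sketch; field names are placeholders, not final ICES names.

# Representation 1: aligned with the data grid (needs the time vector of the data).
grid_annotation = {
    "ping_time": np.array(["2020-01-01T00:00:00", "2020-01-01T00:00:01"],
                          dtype="datetime64[s]"),  # from the gridded data
    "range": np.array([0.0, 0.5, 1.0]),            # metres, from the gridded data
    "labels": np.zeros((2, 3), dtype=int),         # one category per (ping, range) cell
}

# Representation 2: region edges, independent of the grid (similar to the ICES standard).
region_annotation = {
    "region_id": 1,
    "category": "sandeel",                         # acoustic category
    "ping_time": np.array(["2020-01-01T00:00:00", "2020-01-01T00:00:05"],
                          dtype="datetime64[s]"),  # polygon vertices in time...
    "depth": np.array([10.0, 25.0]),               # ...and depth (m)
}
```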
The test data are stored in two different places. It is recommended to first clone the annotation data.
The annotations are found at https://github.com/nilsolav/LSSS-label-versioning and can be cloned using
git clone https://github.com/nilsolav/LSSS-label-versioning
The repository contains three different branches.
The master branch contains the original work files (they also contain nc files for logistical reasons; I need to commit them here in order to update the nc2work branch).
git checkout master
This branch contains the nc files with schools only, i.e. with layers, erased regions and excluded pings removed.
git checkout schoolsonly
This branch contains the nc files generated from the original work files, and work files generated from the nc files.
git checkout nc2work
The raw data are located in the OceanInsight Azure storage. Contact nilsolav@hi.no to get access. The data are found under
/oceaninsightscience/File Shares/hidata/LSSS-label-versioning/
The data should be placed in the folder hierarchy from the cloned repository.
The code and snippets below are the starting point for the development, in addition to the discussion at the ICES publication page.
This is Gavin's suggested annotation format; see the CDL files inside for more details. Part of this process will be to refine the format and interact with ICES to achieve that end:
https://github.com/nilsolav/EchosounderNetCDF
It is based on his review of how the different software packages implement annotations.
This code has been written to use the U-net algorithm to make predictions and write the predictions to the NC format.
The first step generates a (temporary) pickle file from the U-net algorithm. This file could be a starting point for the definition of the Python representation of the annotation format. It needs to be refactored to provide the revised annotation variable structure in Python:
This function converts the temporary pickle file to nc. It needs to be refactored to use the annotations2nc function instead:
https://github.com/CRIMAC-WP4-Machine-learning/CRIMAC-classifyers-unet/blob/master/createncfile.py
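A minimal sketch of the intended refactored flow, assuming the pickle file holds a grid-aligned annotation dict like the sketch above and writing directly with the netCDF4 library (the annotations2nc signature and the file names are hypothetical):

```python
import pickle
from netCDF4 import Dataset

def annotations2nc(annotation, ncfile):
    """Hypothetical writer: dump a grid-aligned annotation dict to NetCDF."""
    with Dataset(ncfile, "w") as ds:
        ds.createDimension("ping_time", len(annotation["ping_time"]))
        ds.createDimension("range", len(annotation["range"]))
        labels = ds.createVariable("labels", "i4", ("ping_time", "range"))
        labels[:] = annotation["labels"]

# Load the temporary pickle produced by the U-net step and write it as nc
# (file names are placeholders).
with open("unet_predictions.pkl", "rb") as f:
    annotation = pickle.load(f)
annotations2nc(annotation, "unet_predictions.nc")
```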
This MATLAB code reads the work files into MATLAB. It should be the starting point for the work2annotations function:
https://github.com/nilsolav/LSSSreader
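As a placeholder while the port from MATLAB is pending, a sketch of what work2annotations could look like in Python, assuming the .work files are XML; the element and attribute names used here are hypothetical and must be taken from the LSSSreader code:

```python
import xml.etree.ElementTree as ET

def work2annotations(workfile):
    """Hypothetical sketch: parse an LSSS .work file into region dicts.

    The tag and attribute names below are placeholders; the real schema
    must be derived from the LSSSreader MATLAB code.
    """
    tree = ET.parse(workfile)
    annotations = []
    for region in tree.getroot().iter("region"):      # placeholder tag name
        annotations.append({
            "region_id": region.get("id"),            # placeholder attribute
            "category": region.get("speciesID"),      # placeholder attribute
        })
    return annotations
```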
We can post the NC file to LSSS via the LSSS API (NB: use the interpolate branch):
https://github.com/nilsolav/EchosounderNetCDF/tree/interpolate. This part should perhaps be part of the report generation package, but it can also stay here. In a sense, it is an alternative to the annotations_to_work function, where we post the schools directly and let LSSS generate the work files.
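A sketch of what posting a school to LSSS could look like, assuming the LSSS API accepts JSON over HTTP; the port and the endpoint path below are placeholders, not documented values:

```python
import requests

# Placeholder host/port and endpoint; consult the LSSS API documentation
# for the actual values.
LSSS_URL = "http://localhost:8000"

def post_school(vertices):
    """Post one school polygon (a list of {time, depth} dicts) to LSSS,
    letting LSSS generate the work files."""
    r = requests.post(LSSS_URL + "/lsss/regions", json={"vertices": vertices})
    r.raise_for_status()
    return r.json()
```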
This script first converts Marec LSSS work files into a parquet file containing the annotations, using the CRIMAC-annotationtools. These annotations are then overlaid on the grid from the sv.zarr file generated by CRIMAC-preprocessing-korona, producing a pixel-wise annotation that matches that grid.
In addition, you have to set --env shipID=842 --env parselayers=0. See details in the example below.
The output of this step is the parquet file <OUTPUT_NAME>_labels.parquet and the Zarr file <OUTPUT_NAME>_labels.zarr.
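The outputs can be inspected with standard tools, e.g. (a sketch assuming OUTPUT_NAME=S2020842 and that pandas and xarray are installed):

```python
import pandas as pd
import xarray as xr

# Region annotations as a table
labels_df = pd.read_parquet("S2020842_labels.parquet")
print(labels_df.head())

# Pixel-wise annotations on the sv.zarr grid
labels_ds = xr.open_zarr("S2020842_labels.zarr")
print(labels_ds)
```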
git clone https://github.com/CRIMAC-WP4-Machine-learning/CRIMAC-annotationtools.git
cd CRIMAC-annotationtools
docker build --build-arg=commit_sha=$(git rev-parse HEAD) --build-arg=version_number=$(git describe --tags) --no-cache --tag crimac-annotationtools .
This Docker image uses the exact same command and parameters as step 3 of the old CRIMAC preprocessor:
docker run -it \
-v /data/cruise_data/2020/S2020842_PHELMERHANSSEN_1173/ACOUSTIC/EK60/EK60_RAWDATA:/datain \
-v /data/cruise_data/2020/S2020842_PHELMERHANSSEN_1173/ACOUSTIC/LSSS/WORK:/workin \
-v /localscratch/ibrahim-echo/out:/dataout \
--security-opt label=disable \
--env OUTPUT_TYPE=labels.zarr \
--env shipID=842 \
--env parselayers=0 \
--env OUTPUT_NAME=S2020842 \
crimac-annotationtools