HaroldMills / Vesper

Open source software for acoustic monitoring of nocturnal bird migration.
MIT License

Some automatically-created annotations do not include creating job or creating processor. #211

Closed by HaroldMills 1 year ago

HaroldMills commented 1 year ago

I discovered this bug today when working on support for a new detector. The problem seems to have been introduced into the vesper.django.app.model_utils module in 423939dc8c443006d5d0d15065c2d9bfc63df755. The only Vesper releases to include the bug were 0.4.11 and 0.4.12, both of which were released on May 6, 2022. As of this writing, 0.4.12 is the current release.

HaroldMills commented 1 year ago

Fixed in f72d0e61d30ed5629803398bb638bbe13a31c33d.

HaroldMills commented 1 year ago

The problem was that in the model_utils module, the annotate_clip, unannotate_clip, tag_clip, and untag_clip functions always invoked the annotate_clips, unannotate_clips, tag_clips, and untag_clips functions, respectively, with creation_time, creating_user, creating_job, and creating_processor arguments of None, instead of passing along the arguments they had been called with.
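To make the failure mode concrete, here is a minimal sketch of the pattern described above. The function bodies and signatures are simplified, hypothetical stand-ins rather than Vesper's actual code; in particular, the assumption that the batch function substitutes the current time when creation_time is None is mine, chosen because affected rows still carry a creation time.

```python
from datetime import datetime, timezone


def annotate_clips(clips, name, value, creation_time=None,
                   creating_user=None, creating_job=None,
                   creating_processor=None):
    """Hypothetical batch function: records the creator of each annotation."""
    if creation_time is None:
        # Assumed behavior: fall back to the current time.
        creation_time = datetime.now(timezone.utc)
    return [
        {
            'clip': clip,
            'name': name,
            'value': value,
            'creation_time': creation_time,
            'creating_user': creating_user,
            'creating_job': creating_job,
            'creating_processor': creating_processor,
        }
        for clip in clips
    ]


def annotate_clip_buggy(clip, name, value, creation_time=None,
                        creating_user=None, creating_job=None,
                        creating_processor=None):
    """Single-clip wrapper that discards its creator arguments (the bug)."""
    return annotate_clips([clip], name, value, None, None, None, None)[0]


def annotate_clip_fixed(clip, name, value, creation_time=None,
                        creating_user=None, creating_job=None,
                        creating_processor=None):
    """Single-clip wrapper that passes its creator arguments along."""
    return annotate_clips(
        [clip], name, value, creation_time, creating_user,
        creating_job, creating_processor)[0]
```

With the buggy wrapper, a caller that supplies creating_job and creating_processor still gets an annotation whose creator fields are all None; the fixed wrapper preserves them.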

Inspection of Vesper's source code, especially vesper/clip-album/clip-album.js, reveals that all manual annotation and tagging use the annotate-clip-batch, unannotate-clip-batch, tag-clip-batch, and untag-clip-batch Django views. These views have always attributed annotation and tagging operations to the correct user since they invoke the annotate_clips, unannotate_clips, tag_clips, and untag_clips functions of the model_utils module instead of the single-clip versions of those functions that were affected by the bug. Thus the only annotations and tags that are missing creator information are ones that were created automatically, e.g. by the Detect, Classify, Transfer clip classifications, Tag clips, and Untag clips commands.

In the database, the bug manifests as missing data in the creating_job_id and creating_processor_id columns of the vesper_string_annotation, vesper_string_annotation_edit, vesper_tag, and vesper_tag_edit tables. The rows affected by the bug are exactly those whose creating_job_id, creating_processor_id, and creating_user_id fields are all null.
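As a rough sketch of how one might count the affected rows, the following uses Python's sqlite3 module. The table and column names are the ones listed above; the create_toy_database helper and its reduced schema are hypothetical and exist only so the example is self-contained. Pointing count_affected_rows at a real archive database assumes that the archive uses SQLite.

```python
import sqlite3

AFFECTED_TABLES = (
    'vesper_string_annotation',
    'vesper_string_annotation_edit',
    'vesper_tag',
    'vesper_tag_edit',
)


def count_affected_rows(connection):
    """Return a dict mapping table name to the number of creatorless rows."""
    counts = {}
    for table in AFFECTED_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        query = (
            f'SELECT COUNT(*) FROM {table} '
            'WHERE creating_job_id IS NULL '
            'AND creating_processor_id IS NULL '
            'AND creating_user_id IS NULL')
        counts[table] = connection.execute(query).fetchone()[0]
    return counts


def create_toy_database():
    """Create an in-memory database with a reduced, hypothetical schema."""
    connection = sqlite3.connect(':memory:')
    for table in AFFECTED_TABLES:
        connection.execute(
            f'CREATE TABLE {table} ('
            'id INTEGER PRIMARY KEY, '
            'creation_time TEXT NOT NULL, '
            'creating_job_id INTEGER, '
            'creating_processor_id INTEGER, '
            'creating_user_id INTEGER)')
    return connection
```

Rows created manually have a non-null creating_user_id, so the three-way null test excludes them.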

I think it should be possible to write a script or Vesper command that can fill in the missing data most or all of the time by deducing which job and/or processor created a row from other available data. The key is that every row, even the ones with missing data, includes a creation_time that makes it possible to find (from the vesper_job table) the jobs that were running when the row was created, and hence which job/processor combinations might have created it. In many cases there will be only one such combination, in which case we can fill in the missing information with that.

Even when there is more than one such combination, we might still be able to deduce which was the creator. For example, the BirdVoxDetect detector creates both clips and annotations when it runs. If two different versions of BirdVoxDetect were running when an annotation was created, we can figure out which version created a particular annotation by looking in the vesper_clip table for the processor that created the annotation's clip.
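A minimal sketch of that deduction might look like the following. It is not an existing Vesper command. The Job class and its processor_id, start_time, and end_time fields are assumptions about what the vesper_job table provides, and the clip_processor_id tie-breaker is only valid when the annotation was created by the same processor that created its clip, as in the BirdVoxDetect case.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Job:
    """Hypothetical in-memory view of a vesper_job row."""
    id: int
    processor_id: int
    start_time: datetime
    end_time: datetime


def find_candidate_jobs(creation_time, jobs):
    """Return the jobs that were running at the given creation time."""
    return [
        job for job in jobs
        if job.start_time <= creation_time <= job.end_time
    ]


def deduce_creator(creation_time, jobs, clip_processor_id=None):
    """Return (job id, processor id) if the creator is unambiguous, else None.

    If several jobs were running, optionally narrow them to those whose
    processor also created the annotation's clip.
    """
    candidates = find_candidate_jobs(creation_time, jobs)
    if len(candidates) > 1 and clip_processor_id is not None:
        candidates = [
            job for job in candidates
            if job.processor_id == clip_processor_id
        ]
    if len(candidates) == 1:
        job = candidates[0]
        return job.id, job.processor_id
    return None
```

Returning None when the candidates cannot be narrowed to one keeps the script from guessing; such rows could be reported for manual review instead.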