LAAC-LSCP / ChildProject

Python package for the management of day-long recordings of children.
https://childproject.readthedocs.io
MIT License

Vetting and sampling #147

Open lucasgautheron opened 3 years ago

lucasgautheron commented 3 years ago

Is your feature request related to a problem? Please describe.

We should allow vetting to be combined with our samplers. E.g., every sampler should ideally be able to sample from a subset of each recording.

Describe the solution you'd like

Still thinking about it! The best solution would be one that does not have to be reimplemented every time we create a new sampler, but that might be tricky. E.g., excluding segments from vocalization-based samplers is easy, but it is harder with fixed window-length samplers: if the windows are long, many of them may intersect vetoed segments.

At first sight, there are two classes of approaches:

  1. upstream vetting, i.e., exclude the segments, then find some way to sample from the remainder of the recording. However, I see no obvious way to reconcile this method with window-based samplers, e.g. those that arbitrarily cut the recordings into fixed-length windows. The obvious advantage, however, is that it maximizes coverage.
  2. downstream vetting, i.e., remove the samples, or the parts of the samples, that contain vetted audio. This seems easier to implement regardless of the sampling method.

The downstream way does not feel right... Besides, there is not much added value in having it in the package, as users are free to do any filtering afterwards, e.g. they could generate samples, mute the vetted regions, then proceed. I think that downstream vetting should not occur at the sampler level, but rather at the second step, when the audio is prepared for Zooniverse or for the annotators.
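For illustration, downstream vetting could look roughly like the sketch below. It is not ChildProject code: `subtract_intervals` is a hypothetical helper, and representing samples and vetted regions as `(onset, offset)` pairs in milliseconds is an assumption made for the example.

```python
def subtract_intervals(sample, excluded):
    """Return the parts of `sample` that do not overlap any excluded interval.

    `sample` is an (onset, offset) pair and `excluded` a list of such pairs.
    Both are hypothetical representations, not the package's data model.
    """
    onset, offset = sample
    kept = []
    cursor = onset  # start of the part of the sample not yet processed
    for ex_onset, ex_offset in sorted(excluded):
        if ex_offset <= cursor:
            continue  # excluded region ends before the remaining part
        if ex_onset >= offset:
            break  # excluded region starts after the sample ends
        if ex_onset > cursor:
            kept.append((cursor, ex_onset))
        cursor = max(cursor, ex_offset)
    if cursor < offset:
        kept.append((cursor, offset))
    return kept


# A 60 s sample with two vetted regions, one of which runs past its end.
parts = subtract_intervals((0, 60000), [(10000, 20000), (50000, 70000)])
print(parts)  # [(0, 10000), (20000, 50000)]
```

Since this only needs the sampler's output, users can apply it themselves after sampling, which is why it adds little at the sampler level.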

The upstream way is more complicated, but only because it is more powerful: the goal is to maximise the amount of audio that can be sampled despite vetting...
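One possible way to make upstream vetting work with fixed-length windows is to compute the non-vetted stretches first, then lay out the windows inside each stretch separately. The sketch below is only an illustration under assumptions: none of these functions exist in ChildProject, and the `(onset, offset)` pairs, the time unit, and the `seed` parameter are hypothetical.

```python
import random


def available_intervals(duration, excluded):
    """Return the stretches of [0, duration) not covered by excluded intervals."""
    available, cursor = [], 0
    for onset, offset in sorted(excluded):
        if onset > cursor:
            available.append((cursor, min(onset, duration)))
        cursor = max(cursor, offset)
        if cursor >= duration:
            break
    if cursor < duration:
        available.append((cursor, duration))
    return available


def candidate_windows(available, window_length):
    """Cut each available stretch into back-to-back windows of fixed length."""
    windows = []
    for onset, offset in available:
        start = onset
        while start + window_length <= offset:
            windows.append((start, start + window_length))
            start += window_length
    return windows


def sample_windows(duration, excluded, window_length, n, seed=None):
    """Randomly draw up to `n` fixed-length windows that avoid excluded audio."""
    rng = random.Random(seed)
    windows = candidate_windows(available_intervals(duration, excluded), window_length)
    return sorted(rng.sample(windows, min(n, len(windows))))


excluded = [(20, 30), (70, 80)]
print(available_intervals(100, excluded))  # [(0, 20), (30, 70), (80, 100)]
print(candidate_windows(available_intervals(100, excluded), 15))
# [(0, 15), (30, 45), (45, 60), (80, 95)]
```

Restarting the grid after each vetted segment means no window ever intersects vetted audio, at the cost of dropping the leftover shorter than one window at the end of each stretch. Longer windows lose more audio this way, so coverage is improved but not strictly maximal.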

I can also see problems arising when the contiguity of the samples is important...

lucasgautheron commented 3 years ago

@alecristia do you have an example of the output of a vetting process?