lucasgautheron commented 3 years ago

Description

Progress

[ ] tests
[x] documentation
[x] implementation
- [x] EnergyDetectionSampler class
- [x] energy computation ~~with an optional lowpass filter. if the lowpass filter does not have any significant impact, we might as well just get rid of it, then we need no fft...~~ opted for an optional band-pass filter.
- [x] Handling saturation : this is especially important since targeting higher energies will favor saturated regions. We could discard regions where more than 1 % of the signal is saturating. Though, what should we do when there is more than one channel ? Discard the saturating channels, and the whole segment if all channels are saturating ? Or should we discard a segment when at least one of the channels is saturating ? Marvin: let's keep it in mind, BUT no treatment for now
- [ ] parallelise energy computation. the difficulty is to keep memory usage under control
- [x] filtering strategy: threshold ? It cannot be hardcoded I think... Or sorting by highest energy and taking the N first windows ? Probably too biased... For now I suggest sampling from the x percent of windows with the highest energy. What do you think ? Discussion here.
- [x] multi-channel support: we need to decide how to do that in the first place ! I suggest we sum the energies across channels, and apply the threshold based on the quantile of the sum of energies. It does not make sense to sum amplitudes, but it does make much more sense to sum average energies over some interval of time. The only issue that I can see is that this has a bias towards overlapping sources... but so does any threshold based on the energy tbh ! Discussion here.
- [ ] recordings filter : sample from a restricted list of recordings
- [ ] combining several recorders : Let x_i and y_i be the fractions of windows having a higher energy than window i, for recorder x and recorder y respectively. There are several possibilities:
  1. Check the agreement between devices ? (i.e. veto windows that do not pass the threshold in one of the devices). This leads to the condition (1-x_i > 1-q) and (1-y_i > 1-q).
  2. Find a way to combine the energy quantiles of each (window, recorder). These can be computed independently w/o any difficulty for the recorders x and y. By definition, P(x_i < x) = x and P(y_i < y) = y (they follow U(0,1)). It follows that P(x_i > x) = 1-x and P(y_i > y) = 1-y. Obviously, x_i and y_i are expected to be strongly correlated... But if we assume they are not, we might be able to combine these p-values in some way idk...
  3. We can also average energies or something, but that sounds really wrong... unless we do some normalization first, which is cumbersome...
    - Imo the agreement based approach seems more appropriate.
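
The single-recorder steps listed above (band-limited energy per window, summing over channels, sampling from the highest-energy quantile) can be sketched as follows. This is only an illustration and not the `EnergyDetectionSampler` code itself: the function names, parameters and the FFT-mask filter are assumptions made for the example.

```python
import numpy as np


def window_energies(signal, sample_rate, window_s, low_freq=None, high_freq=None):
    """Hypothetical helper: one energy value per window, summed over channels.

    signal has shape (n_channels, n_samples). The optional band-pass is a
    plain mask on the FFT bins, so the result is an un-normalised spectral
    power, meant only as a relative measure between windows.
    """
    signal = np.atleast_2d(np.asarray(signal, dtype=float))
    window_len = int(round(window_s * sample_rate))
    n_windows = signal.shape[1] // window_len

    # Keep only the FFT bins inside [low_freq, high_freq]
    freqs = np.fft.rfftfreq(window_len, d=1.0 / sample_rate)
    mask = np.ones_like(freqs, dtype=bool)
    if low_freq is not None:
        mask &= freqs >= low_freq
    if high_freq is not None:
        mask &= freqs <= high_freq

    energies = np.zeros(n_windows)
    # One window at a time, so memory stays bounded by the window size
    for w in range(n_windows):
        chunk = signal[:, w * window_len:(w + 1) * window_len]
        power = np.abs(np.fft.rfft(chunk, axis=1)) ** 2
        energies[w] = power[:, mask].sum()  # sum over channels and kept bins
    return energies


def sample_high_energy(energies, quantile, n, seed=0):
    """Draw up to n window indices among those at or above the energy quantile."""
    threshold = np.quantile(energies, quantile)
    candidates = np.flatnonzero(energies >= threshold)
    rng = np.random.default_rng(seed)
    return rng.choice(candidates, size=min(n, candidates.size), replace=False)
```

With `quantile=0.8`, for instance, windows are drawn at random among the 20 % most energetic ones rather than always taking the very loudest.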

Associated issues

#109

MarvinLvn commented 3 years ago

About the multi-channel discussion :

your approach seems fine to me.
I think it'd be nice to be able to run the pipeline on only one of the N (N=2 or N=4) channels. In some cases (it may be the case with the BabyLogger), we can put some prior on which mic' gets the most speech.

MarvinLvn commented 3 years ago

> combining several recorders : Let x_i and y_i be the fractions of windows having a higher energy than window i, for recorder x and recorder y respectively. There are several possibilities:
>
> 1. Check the agreement between devices ? (i.e. veto windows that do not pass the threshold in one of the devices). This leads to the condition (1-x_i > 1-q) and (1-y_i > 1-q).
> 2. Find a way to combine the energy quantiles of each (window, recorder). These can be computed independently w/o any difficulty for the recorders x and y. By definition, P(x_i < x) = x and P(y_i < y) = y (they follow U(0,1)). It follows that P(x_i > x) = 1-x and P(y_i > y) = 1-y. Obviously, x_i and y_i are expected to be strongly correlated... But if we assume they are not, we might be able to combine these p-values in some way idk...
> 3. We can also average energies or something, but that sounds really wrong... unless we do some normalization first, which is cumbersome...
>
> Imo the agreement based approach seems more appropriate.

After your first quantile threshold, 3 possibilities :

1. No agreement at all amongst the 2 recorders : dead-end
2. You get agreement for some windows, but not enough of them
3. Full agreement

The situation you'll end up in will mainly depend on the quantity of data you have and the threshold you apply. In situations 1) and 2), you could always decide to lower the quantile threshold until you get enough windows for recorder a) and recorder b) : not sure this is the ideal solution.

The solution that seems easiest to implement is the one that consists in averaging energies. You can do something like :

1. Compute E_a, E_b the list of energies for recorder a) and recorder b)
2. Normalize E_a by : (E_a - mean(E_a)) / std(E_a), same for E_b
3. Go through your single-channel pipeline : apply quantile threshold, etc ...

This is just one more pre-processing step compared to solutions 1) and 2). Agreement-based solutions seem scary to me as they are too data-dependent, and you'll always find situations where you won't have enough agreement between the 2 recorders. Hence why I'd "combine the 2 recorders" as soon as possible.
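
The three steps above could look like this in NumPy. It is a rough sketch, assuming the two recorders have already been cut into time-aligned windows of equal count; `zscore` and `combine_recorders` are hypothetical names and the energy values are made up for illustration.

```python
import numpy as np


def zscore(values):
    """Step 2: (E - mean(E)) / std(E). Assumes std(E) is not zero."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def combine_recorders(energies_a, energies_b):
    """Average the normalised energies of two recorders, window by window."""
    return (zscore(energies_a) + zscore(energies_b)) / 2.0


# Step 1: made-up energies for 4 aligned windows; the scales differ on purpose
E_a = np.array([1.0, 2.0, 3.0, 10.0])
E_b = np.array([100.0, 250.0, 300.0, 900.0])

# Step 3: feed the combined score to the usual single-channel thresholding
combined = combine_recorders(E_a, E_b)
threshold = np.quantile(combined, 0.75)
selected = np.flatnonzero(combined >= threshold)
```

After normalisation both recorders contribute on the same scale, so the one with the larger raw energies does not dominate the average.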

lucasgautheron commented 3 years ago

> About the multi-channel discussion :
>
> your approach seems fine to me.
>
> I think it'd be nice to be able to run the pipeline on only one of the N (N=2 or N=4) channels. In some cases (it may be the case with the BabyLogger), we can put some prior on which mic' gets the most speech.

I suggest two different options:

- We provide an additional 'channel' option. If specified, only this channel is used to compute the energy.
- We provide a 'channel_weights' option. It must be a list whose length equals the number of channels. Each channel's energy is then weighted by the corresponding coefficient. One can turn off channels by setting their coefficient to 0.

I prefer the latter, though it is less efficient when the goal is just to turn off channels.
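
A minimal sketch of the 'channel_weights' idea, assuming per-channel energies have already been computed as an array of shape (n_channels, n_windows). `weighted_energy` is a hypothetical helper, not the package's API.

```python
import numpy as np


def weighted_energy(channel_energies, channel_weights=None):
    """Weighted sum of per-channel energies, one value per window.

    channel_energies has shape (n_channels, n_windows). With no weights
    given, every channel counts equally (plain sum across channels).
    """
    channel_energies = np.asarray(channel_energies, dtype=float)
    n_channels = channel_energies.shape[0]
    if channel_weights is None:
        channel_weights = np.ones(n_channels)
    channel_weights = np.asarray(channel_weights, dtype=float)
    if channel_weights.shape != (n_channels,):
        raise ValueError("channel_weights must have one entry per channel")
    # A weight of 0 switches the channel off entirely
    return channel_weights @ channel_energies
```

Passing `channel_weights=[1, 0]` on a stereo recording gives the same result as a 'channel' option selecting the first channel, although the energy of the second channel is still computed, which is the inefficiency mentioned above.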

lucasgautheron commented 3 years ago

> > combining several recorders : Let x_i and y_i be the fractions of windows having a higher energy than window i, for recorder x and recorder y respectively. There are several possibilities:
> >
> > 1. Check the agreement between devices ? (i.e. veto windows that do not pass the threshold in one of the devices). This leads to the condition (1-x_i > 1-q) and (1-y_i > 1-q).
> > 2. Find a way to combine the energy quantiles of each (window, recorder). These can be computed independently w/o any difficulty for the recorders x and y. By definition, P(x_i < x) = x and P(y_i < y) = y (they follow U(0,1)). It follows that P(x_i > x) = 1-x and P(y_i > y) = 1-y. Obviously, x_i and y_i are expected to be strongly correlated... But if we assume they are not, we might be able to combine these p-values in some way idk...
> > 3. We can also average energies or something, but that sounds really wrong... unless we do some normalization first, which is cumbersome...
> >
> > Imo the agreement based approach seems more appropriate.
>
> After your first quantile threshold, 3 possibilities :
>
> 1. No agreement at all amongst the 2 recorders : dead-end
> 2. You get agreement for some windows, but not enough of them
> 3. Full agreement
>
> The situation you'll end up in will mainly depend on the quantity of data you have and the threshold you apply. In situations 1) and 2), you could always decide to lower the quantile threshold until you get enough windows for recorder a) and recorder b) : not sure this is the ideal solution.
>
> The solution that seems easiest to implement is the one that consists in averaging energies. You can do something like :
>
> 1. Compute E_a, E_b the list of energies for recorder a) and recorder b)
> 2. Normalize E_a by : (E_a - mean(E_a)) / std(E_a), same for E_b
> 3. Go through your single-channel pipeline : apply quantile threshold, etc ...
>
> This is just one more pre-processing step compared to solutions 1) and 2). Agreement-based solutions seem scary to me as they are too data-dependent, and you'll always find situations where you won't have enough agreement between the 2 recorders. Hence why I'd "combine the 2 recorders" as soon as possible.

I see... Though, I am more optimistic than you. I would expect high correlations between the two recorders. Besides, the distribution of energies is very flat. Here is the distribution for 600 30s windows drawn from a 20h USB recording. The energy spans 5 orders of magnitude.

[Image: Screenshot 2021-02-17 at 11 58 17]

In the end, it is not very difficult to switch from one approach to the other once we have the core pipeline ready. So from now on, I'll implement the method you suggest, but the recorder-agreement thing will live outside of the package, because that seems way too specific - though it will use the package to compute the energies.

There is just one thing I need to think about : whether this z-score normalization really fits. I might be overthinking this...
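
For reference, the recorder-agreement check that would stay outside the package can be written in a few lines once per-window energies are available. This is a sketch only: it assumes time-aligned windows, takes q as the fraction of highest-energy windows to keep (so the condition 1-x_i > 1-q reads x_i < q), and uses hypothetical function names.

```python
import numpy as np


def fraction_higher(energies):
    """x_i: fraction of windows with strictly higher energy than window i."""
    energies = np.asarray(energies, dtype=float)
    ordered = np.sort(energies)
    # searchsorted(..., side="right") counts the values <= energies[i]
    n_higher = energies.size - np.searchsorted(ordered, energies, side="right")
    return n_higher / energies.size


def agreeing_windows(energies_x, energies_y, q):
    """Indices of windows that pass the threshold in both recorders."""
    x = fraction_higher(energies_x)
    y = fraction_higher(energies_y)
    # Veto any window that fails in at least one recorder
    return np.flatnonzero((x < q) & (y < q))
```

Because only ranks are compared, no normalisation of the raw energies is needed here, which is one way this differs from the z-score averaging.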

LAAC-LSCP / ChildProject

Energy detection based sampler #130
