Non-determinism in canetoad detector

atruskie commented 8 years ago

Reported by @proe

I had some weird behaviour on the Pi, which I also reproduced on Ubuntu. Basically, for different runs of the toad detector on the same input files, I get different numbers of events detected:

pi@raspberrypi:~/tt$ mono ~/Release/AnalysisPrograms.exe canetoad /source:2015-06-03-004248.wav /config:Towsey.Canetoad.yml /output:output
QUT Bioacoustic Analysis Program - version 16.06.3430.0 (RELEASE build, 23/06/2016 01:29)
Git branch-version: master-16a11bad5e3c2423bb92386ec83773c700eb4be0
Copyright QUT 2016
1 events found
pi@raspberrypi:~/tt$ rm -rf output/
pi@raspberrypi:~/tt$ mono ~/Release/AnalysisPrograms.exe canetoad /source:2015-06-03-004248.wav /config:Towsey.Canetoad.yml /output:output
QUT Bioacoustic Analysis Program - version 16.06.3430.0 (RELEASE build, 23/06/2016 01:29)
Git branch-version: master-16a11bad5e3c2423bb92386ec83773c700eb4be0
Copyright QUT 2016
2 events found

(The config files are the shipped ones.) Any ideas?

I repro'd this on mono on our production servers. I got a "2 event" incidence rate of 4/74 (5.4%).

I've also repro'd this in our latest version, on Windows. Incidence rate of 11/158 (7%).

I've done a cursory look into the canetoad code - I can't see any obvious sources of randomness. Michael may know better than me.

I guess any pseudo-random number generation could do this. In which case we probably need to control its seeding via a configuration option. Otherwise I'm out of ideas and it starts looking like an uninitialised variable.
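As a minimal sketch of what "control its seeding via a configuration option" could look like, the Python snippet below seeds a generator from a config value. The `RandomSeed` key and the helper names are hypothetical and are not options of AP.exe; the point is only that a fixed seed makes repeated runs produce the same pseudo-random sequence.

```python
import random


def make_rng(config):
    """Build a generator seeded from a hypothetical 'RandomSeed' config key."""
    # If the key is absent, seed is None and Python seeds from the OS,
    # so repeated runs will generally differ.
    seed = config.get("RandomSeed")
    return random.Random(seed)


def noisy_values(config, n=5):
    """Draw n pseudo-random values using the configured seed."""
    rng = make_rng(config)
    return [rng.random() for _ in range(n)]


fixed_a = noisy_values({"RandomSeed": 42})
fixed_b = noisy_values({"RandomSeed": 42})
print(fixed_a == fixed_b)  # same seed, same sequence
```

Leaving the key out falls back to an unpredictable seed, which is the situation in which results could vary from run to run.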

The only sources of randomness in .NET I can think of are:

1. unsafe code
2. System.Random
3. cryptographic randomness
4. external factors
5. external libraries

We don't use 1 or 3.

I debugged the code tonight a bit. I still couldn't see a source for randomness. I'm starting to suspect audio file conversion - the files do get resampled.

2015-06-03-004248.zip

atruskie commented 8 years ago

From @towsey:

I have found that I get different results from one analysis to the next for some indices but not others. Usually the differences are in the less significant figures, but I can imagine that it would make a difference to acoustic event detection if an event were on the edge of a minute segment. I am assuming the differences arise from differences in how files are cut into one-minute blocks. But perhaps that was back in the mp3 days. I have not investigated this issue.

atruskie commented 8 years ago

With the changes from the Bird50 branch, this bug no longer occurs. After several hours of trying to replicate the results and tweak settings, we never managed to track down the source of the nondeterminism.

Further investigation determined that our SoX transforms are NOT deterministic; the SHA256 of files output from the same operation differs. However, their sample rate, duration, and bit rate are exactly the same. The WAVE header is different and the body may be.
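One way to tell whether only the WAVE header differs or the samples differ too is to hash the decoded audio separately from the whole file. The sketch below uses Python's standard `wave` and `hashlib` modules on small in-memory files; the helper names and the extra `note` chunk are made up for illustration and say nothing about what SoX actually writes.

```python
import hashlib
import io
import struct
import wave


def file_digest(data):
    """SHA256 of the whole file, header included."""
    return hashlib.sha256(data).hexdigest()


def audio_digest(data):
    """SHA256 of the format parameters plus the raw PCM frames only."""
    with wave.open(io.BytesIO(data), "rb") as w:
        params = (w.getnchannels(), w.getsampwidth(), w.getframerate())
        frames = w.readframes(w.getnframes())
    h = hashlib.sha256(repr(params).encode())
    h.update(frames)
    return h.hexdigest()


def make_wav(frames, rate=22050):
    """Write mono 16-bit PCM frames into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()


def with_extra_chunk(wav_bytes, text=b"run2"):
    """Append an illustrative metadata chunk and patch the RIFF size field."""
    chunk = b"note" + struct.pack("<I", len(text)) + text
    body = wav_bytes + chunk
    return body[:4] + struct.pack("<I", len(body) - 8) + body[8:]


frames = struct.pack("<4h", 0, 1000, -1000, 0)
first = make_wav(frames)
second = with_extra_chunk(first)

print(file_digest(first) == file_digest(second))    # False: the files differ
print(audio_digest(first) == audio_digest(second))  # True: the samples match
```

If the audio digests of two SoX outputs also differ, the body really is changing between runs and not just the header.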

@towsey and I have made sure the detector works for the submitted file (after conversion to .wav) and consistently returns one event. However, one of the changes made recently means that the EventThreshold value in the config file now takes effect, whereas it was previously (mistakenly) ignored. The default value for the event threshold is 0.4, which is well above the score returned for the event in the file submitted with this issue (which had a score of 0.04 - a very weak event). If you consider the event a true positive, you'll need to lower your event threshold to 0.03 or below when you run recent versions of AP.exe.
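To make the threshold arithmetic concrete, here is a small Python sketch. It assumes an event is kept when its score is at or above the threshold, which is an assumption about the comparison rather than something confirmed for AP.exe; the event's start time is invented.

```python
def filter_events(events, event_threshold):
    """Keep events whose score is at or above the threshold (assumed rule)."""
    return [e for e in events if e["score"] >= event_threshold]


# The score 0.04 comes from the discussion above; start_s is made up.
events = [{"start_s": 12.0, "score": 0.04}]

kept_default = filter_events(events, 0.4)   # default threshold
kept_lowered = filter_events(events, 0.03)  # lowered threshold

print(len(kept_default))  # 0
print(len(kept_lowered))  # 1
```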

I think that's all we can reasonably do for now, unless you want further investigation @proe?

atruskie commented 8 years ago

Paul seems happy with the result. Closing for now.

atruskie commented 7 years ago

CaneToad_Gympie.zip

Adding email conversation for the record:

I believe we have located the source of indeterminacy that intermittently bothered us previously. If a recording must be resampled (for example from 44100 to 22050) prior to analysis, the resampling returns a different array of signal samples each time. The samples do not differ by much, but just enough that a recognizer will intermittently return events where the signal strength is sitting at or just near the threshold. However, if no resampling is required, then the exact same values are returned on repeated runs (determinacy prevails). Since the Groote recordings are all at 22050 (at least I assume they are), the indeterminacy problem does not affect those recordings. Anthony is looking at ways to fix this resampling problem.

OK, it looks like there is an option to fix this: -R on sox - Paul

-R Run in ‘repeatable’ mode. When this option is given, where applicable, SoX will embed a fixed time-stamp in the output file (e.g. AIFF) and will ‘seed’ pseudo random number generators (e.g. dither) with a fixed number, thus ensuring that successive SoX invocations with the same inputs and the same parameters yield the same output.

https://sourceforge.net/p/sox/bugs/258/
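For illustration, a resampling call with `-R` could be assembled as below. This is a sketch in Python, not how AP.exe invokes SoX (that invocation is not shown in this thread); the function names and file names are placeholders. `-R` is a global option, so it is placed before the input file, and `rate` is SoX's resampling effect.

```python
import shutil
import subprocess


def build_sox_command(src, dst, rate=22050):
    """Assemble a SoX resampling command that runs in repeatable mode."""
    # Global option -R first, then input, output, and the rate effect.
    return ["sox", "-R", src, dst, "rate", str(rate)]


def resample(src, dst, rate=22050):
    """Run the command; requires the sox executable to be on PATH."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox not found on PATH")
    subprocess.run(build_sox_command(src, dst, rate), check=True)


print(" ".join(build_sox_command("in.wav", "out.wav")))
# sox -R in.wav out.wav rate 22050
```

Whether `-R` fully removes the run-to-run differences would still need checking, for example by comparing checksums of the output across repeated runs.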

towsey commented 7 years ago

Indeterminacy was first picked up with the canetoad recognizer when it returned different numbers of canetoad events and of different duration for multiple runs on the same recording file. Since events are derived from an array of score values (array of double), further examination revealed that a different score array was returned with each analysis of the same recording. Following this trail, it was found that each analysis produced a slightly different decibel spectrogram. The decibel values tended to hover around an "average" value. For example the first value returned for the spectrogram (frame 1, freq bin 1) hovered around -21dB but ranged between -19dB and -22dB. Finally, it was found that multiple reads of the .wav file produced a different array of recording samples. The differences from run to run were small but sufficient to be carried through to slight differences in the score array, which translated into different events, especially when score values were near the threshold.
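The last step of that trail, where a tiny change in one score changes the number and duration of events, can be sketched in Python as follows. The event rule (contiguous runs of scores at or above the threshold) and all the numbers are assumptions made for illustration; the actual recognizer may segment scores differently.

```python
def find_events(scores, threshold):
    """Return (start, end) index pairs for runs of scores >= threshold."""
    events = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # a run begins
        elif score < threshold and start is not None:
            events.append((start, i))      # the run ends before index i
            start = None
    if start is not None:
        events.append((start, len(scores)))
    return events


threshold = 0.5
# Two "runs" whose scores differ only at index 3, by 0.02.
run_a = [0.1, 0.6, 0.7, 0.51, 0.7, 0.6, 0.1]
run_b = [0.1, 0.6, 0.7, 0.49, 0.7, 0.6, 0.1]

events_a = find_events(run_a, threshold)
events_b = find_events(run_b, threshold)

print(events_a)  # [(1, 6)]
print(events_b)  # [(1, 3), (4, 6)]
```

One score dipping just under the threshold splits a single event into two shorter ones, which is the same kind of "1 events found" versus "2 events found" difference reported at the top of this issue.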

Finally, I found that the indeterminacy could be traced to recordings that needed to be resampled to 22050 Hz prior to analysis. Recordings already sampled at 22050 Hz return identical analysis results on repeated runs of the same analysis. It is most likely that the indeterminacy is due to the filtering by SoX prior to down-sampling. SoX has an option to make its output repeatable, as described by Paul in the previous comment.

QutEcoacoustics / audio-analysis

Non-determinism in canetoad detector #91