BirdVox / birdvoxdetect

A pre-trained deep learning system for detecting bird flight calls in continuous recordings
MIT License

Keyword arguments and their defaults #1

Closed lostanlen closed 5 years ago

lostanlen commented 5 years ago
justinsalamon commented 5 years ago

I think we should focus on the full-length names for these options (on the one-letter shortcuts too, but the full names are more important I think). I think the proposed values need some tweaking, see below:

-o output directory. Default: same as input.

Full length: --output-dir, short: -o

-w export WAV audio clips of positive detections to a subfolder. Default: false. NB: this needs to be set to true in order to perform species classification downstream. Each WAV clip belongs to a subfolder named inputfile_suffix_clips and is named inputfile_timestamp_confidence.wav. Timestamps are rounded to the nearest millisecond. Confidences range between 00 and 99.

For full length I think it would make sense to use something like --extract-calls or --extract-clips, so I would go with -c (for calls or clips) over -w (especially since we might support multiple output formats in the future, not just WAV).
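For illustration, the clip naming scheme quoted above (inputfile_timestamp_confidence.wav) could be sketched as follows. The function name `clip_filename` is hypothetical, and writing the timestamp in seconds with three decimals is an assumption: the thread only says that timestamps are rounded to the nearest millisecond and that confidences range between 00 and 99.

```python
import os


def clip_filename(input_path, timestamp_s, confidence):
    """Build a clip name of the form inputfile_timestamp_confidence.wav.

    Assumptions (not specified in the thread): the timestamp is written in
    seconds with three decimals, i.e. rounded to the nearest millisecond,
    and the confidence is an integer zero-padded to two digits.
    """
    if not 0 <= confidence <= 99:
        raise ValueError("confidence must be between 0 and 99")
    # Strip the directory and the extension from the input file name.
    stem = os.path.splitext(os.path.basename(input_path))[0]
    return f"{stem}_{timestamp_s:.3f}_{confidence:02d}.wav"
```

With these assumptions, `clip_filename("recordings/night1.wav", 12.3456, 7)` gives a name starting with `night1_` and ending with `_07.wav`.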

-t: threshold between 0 and 99. Default 50. With a value like that we should be able to guarantee at least 75% recall and 65% precision in v1.0.0 (and possibly more later on). I will map these values to a nonlinear range in the event detection function domain so that values between 20 and 80 are reasonable. It would be best to warn users if they try to go above 90, because that would cause precision to fall under 30%. These values are indicative of the average performance of BirdVoxDetect on the leave-one-sensor-out test set of BirdVox-full-night. It would be good to put a PR curve in the README that gives recommended values of -t for the intended values of precision and recall.

--threshold and -t seem fine to me.
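A minimal sketch of the range check and the warning proposed for the threshold, assuming a hypothetical helper named `check_threshold`. The nonlinear mapping to the event detection function domain is not described in the thread, so it is left out here.

```python
import warnings


def check_threshold(threshold):
    """Validate a user-supplied threshold on the proposed 0-99 scale.

    Hypothetical helper: rejects values outside [0, 99] and warns above 90,
    the level at which the thread suggests warning users.
    """
    if not 0 <= threshold <= 99:
        raise ValueError("threshold must be between 0 and 99")
    if threshold > 90:
        warnings.warn(
            "thresholds above 90 are outside the recommended range (20 to 80)"
        )
    return threshold
```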

-h: export detection curve in HDF5 format.

-h is reserved for help in CLIs. We could use --export-detection-curve and -e

-x suffix. Like in Open-L3.

--suffix and I would use -s over -x (in openl3 it's -x only because -s is already taken)

-t hop size in milliseconds. Default: 50. Values should be between 0 and 75 (otherwise we might miss portions of the audio input)

-t is already taken up by the threshold ;) --hop-size and -p ?

-d duration of the exported WAV audio clip in milliseconds. Default: 500.

--clip-duration (assuming we go with --export-clips) and -d

-q quiet. Default: false in command line but true in Python library

--quiet and -q

lostanlen commented 5 years ago

Sounds good, thanks for reviewing this. Open-L3 uses -t for the hop size, so perhaps the abbreviation -t for --threshold is worth reconsidering? On the other hand, I would guess that users will change the threshold more often than the hop size ...

justinsalamon commented 5 years ago

I would stick with -t for threshold: it's the same first letter, as you note it's the more common use case, and -t for hop size doesn't make any particular sense anyway, so I see no strong reason to keep it.

I'd stick with -t for threshold, and whatever else we can come up with for the hop size. We could for example, instead of --hop-size, use --resolution or --temporal-resolution and -r (for temporal Resolution of the analysis)

lostanlen commented 5 years ago

Maybe -r for --frame-rate? In that case it would be expressed in frames per second (default: 20). Other options: -f and --frame-rate, or -r and --rate?

The word --resolution sounds a little vague to me. Also, the physical unit of a resolution is less clear than the unit of a rate.
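For reference, the proposed default of 20 frames per second corresponds to the earlier default hop size of 50 milliseconds, and the earlier upper bound of 75 milliseconds corresponds to a lower bound of roughly 13.3 frames per second. A small sketch of the conversion, with hypothetical function names:

```python
def hop_ms_to_rate(hop_ms):
    """Convert a hop size in milliseconds to a rate in frames per second."""
    if hop_ms <= 0:
        raise ValueError("hop size must be positive")
    return 1000.0 / hop_ms


def rate_to_hop_ms(rate_fps):
    """Convert a rate in frames per second to a hop size in milliseconds."""
    if rate_fps <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / rate_fps
```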

justinsalamon commented 5 years ago

sure, sgtm

lostanlen commented 5 years ago

OK, thanks. For future reference, I'm expressing -d in seconds and defaulting it to 1 though.
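Putting the thread together, the options could be declared with Python's argparse roughly as follows. This is a sketch under assumptions, not the actual birdvoxdetect command-line interface: the thread does not settle between --extract-clips and --export-clips, nor between --rate and --frame-rate, so `--export-clips` and `-r`/`--rate` are picked here for illustration, and the positional `inputs` argument is hypothetical. argparse reserves -h for help by default, which is why the detection curve export uses -e.

```python
import argparse


def build_parser():
    """Sketch of a parser for the options discussed in this thread."""
    parser = argparse.ArgumentParser(
        prog="birdvoxdetect",
        description="Detect bird flight calls in continuous recordings.",
    )
    # Hypothetical positional argument: the thread does not discuss inputs.
    parser.add_argument("inputs", nargs="+", help="audio files to process")
    parser.add_argument("-o", "--output-dir", default=None,
                        help="output directory (default: same as input)")
    # Flag name is an assumption; --extract-clips was also suggested.
    parser.add_argument("-c", "--export-clips", action="store_true",
                        help="export audio clips of positive detections")
    parser.add_argument("-t", "--threshold", type=float, default=50,
                        help="detection threshold between 0 and 99")
    parser.add_argument("-e", "--export-detection-curve", action="store_true",
                        help="export detection curve in HDF5 format")
    parser.add_argument("-s", "--suffix", default="",
                        help="suffix added to output names")
    # Flag name is an assumption; --frame-rate was also suggested.
    parser.add_argument("-r", "--rate", type=float, default=20,
                        help="analysis rate in frames per second")
    parser.add_argument("-d", "--clip-duration", type=float, default=1.0,
                        help="duration of exported clips in seconds")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="suppress console output")
    return parser
```

For example, `build_parser().parse_args(["night1.wav", "-t", "70", "-c"])` sets `threshold` to 70.0 and `export_clips` to True while leaving `rate` at 20 and `clip_duration` at 1.0.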