Closed lostanlen closed 5 years ago
I think we should focus on the full-length names for these options (the one-letter shortcuts matter too, but the full names are more important). The proposed values need some tweaking; see below:
-o output directory. Default: same as input.
Full length: --output-dir, short: -o
-w export WAV audio clips of positive detections to a subfolder. Default: false. NB: this needs to be set to true in order to perform species classification downstream. Each WAV clip belongs to a subfolder named inputfile_suffix_clips and is named inputfile_timestamp_confidence.wav. Timestamps are rounded to the nearest millisecond. Confidences range between 00 and 99.
For the full-length name I think it would make sense to use something like --extract-calls or --extract-clips, so I would go with -c (for calls or clips) over -w (especially since we might support multiple output formats in the future, not just WAV).
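To make the clip naming scheme quoted above concrete, here is a minimal Python sketch. The helper name clip_path, the zero-padded millisecond timestamp format, and the choice to pass the confidence as a number already on the 0 to 99 scale are all assumptions for illustration; the thread only fixes the inputfile_suffix_clips folder and the inputfile_timestamp_confidence.wav pattern.

```python
import os


def clip_path(input_path, suffix, timestamp_s, confidence):
    """Build the path of an exported clip (hypothetical helper, not the real API)."""
    stem = os.path.splitext(os.path.basename(input_path))[0]
    # Subfolder named inputfile_suffix_clips; dropping an empty suffix is an assumption.
    folder = f"{stem}_{suffix}_clips" if suffix else f"{stem}_clips"
    # Timestamp rounded to the nearest millisecond; the 9-digit padding is an assumption.
    timestamp_ms = int(round(timestamp_s * 1000))
    # Confidence written as a two-digit integer clamped to 00..99.
    conf = min(99, max(0, int(round(confidence))))
    name = f"{stem}_{timestamp_ms:09d}_{conf:02d}.wav"
    # Default output location: same directory as the input file.
    return os.path.join(os.path.dirname(input_path), folder, name)
```

Encoding the timestamp as an integer number of milliseconds keeps the file names sortable and avoids a second dot in the name, but any unambiguous format would do.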
-t: threshold between 0 and 99. Default 50. With a value like that we should be able to guarantee at least 75% recall and 65% precision in v1.0.0 (and possibly more later on). I will map these values to a nonlinear range in the event detection function domain so that values between 20 and 80 are reasonable. It would be best to warn users if they try to go above 90, because that would result in a precision under 30%. These values are indicative of the average performance of BirdVoxDetect on the leave-one-sensor-out test set of BirdVox-full-night. It would be good to put a PR curve in the README that gives recommended values of -t for the intended values of precision and recall.
--threshold and -t seem fine to me.
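The range check and the warning above 90 could be handled at argument-parsing time. The following is only a sketch using the standard argparse and warnings modules; the function name threshold_type and the wording of the messages are assumptions, and the nonlinear mapping to the detection function domain is deliberately left out.

```python
import argparse
import warnings


def threshold_type(text):
    """argparse type for a threshold in [0, 99] (hypothetical helper)."""
    value = float(text)
    if not 0 <= value <= 99:
        raise argparse.ArgumentTypeError(
            f"threshold must be between 0 and 99, got {value}")
    if value > 90:
        # The thread suggests warning users above 90; the message is an assumption.
        warnings.warn("thresholds above 90 are outside the recommended range")
    return value


parser = argparse.ArgumentParser(prog="birdvoxdetect")
parser.add_argument("--threshold", "-t", type=threshold_type, default=50.0,
                    help="detection threshold between 0 and 99 (default: 50)")
```

Doing the validation in a type function means argparse reports an out-of-range value as a regular usage error instead of a traceback.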
-h: export detection curve in HDF5 format.
-h is reserved for help in CLIs. We could use --export-detection-curve and -e
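For what it's worth, if the CLI is built with Python's argparse (an assumption, the thread does not say), the clash is not just a convention: the parser registers -h/--help by default and refuses a second -h. A small sketch:

```python
import argparse

# By default argparse adds -h/--help to every parser.
parser = argparse.ArgumentParser(prog="birdvoxdetect")

try:
    parser.add_argument("-h", action="store_true",
                        help="export detection curve in HDF5 format")
    conflict = None
except argparse.ArgumentError as err:
    # argparse rejects the duplicate option string.
    conflict = str(err)

# The alternative proposed in this comment registers without any conflict.
parser.add_argument("--export-detection-curve", "-e", action="store_true",
                    help="export detection curve in HDF5 format")
```

One could work around this with add_help=False, but then users lose the -h shortcut they expect from most command-line tools.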
-x suffix. Like in Open-L3.
--suffix and I would use -s over -x (in openl3 it's -x only because -s is already taken)
-t hop size in milliseconds. Default: 50. Values should be between 0 and 75 (otherwise we might miss portions of the audio input)
-t is already taken up by the threshold ;) --hop-size and -p?
-d duration of the exported WAV audio clip in milliseconds. Default: 500.
--clip-duration (assuming we go with --export-clips) and -d
-q quiet. Default: false in command line but true in Python library
--quiet and -q
sounds good. thanks for reviewing this.
Open L3 uses -t for the hop size. So perhaps the abbreviation -t for --threshold is worth reconsidering?
On the other hand, I would guess that users will change threshold more often than hop size ...
I would stick with -t for threshold: it's the same first letter, as you note it's a more common use case, and anyway -t for hop size doesn't make special sense and I see no strong reason to keep it.
I'd stick with -t for threshold, and whatever else we can come up with for the hop size. We could, for example, instead of --hop-size, use --resolution or --temporal-resolution and -r (for temporal Resolution of the analysis)
Maybe -r for --frame-rate? In which case this would be expressed in frames per second (default is 20)
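The two parameterizations are equivalent: a hop size of 50 ms corresponds to 1000 / 50 = 20 frames per second. A minimal sketch of the conversion, with hypothetical function names; the 75 ms upper bound comes from the original proposal, while excluding 0 is an assumption made to avoid a division by zero.

```python
def rate_to_hop_ms(frame_rate_hz):
    """Convert a frame rate in frames per second to a hop size in milliseconds."""
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return 1000.0 / frame_rate_hz


def hop_ms_to_rate(hop_ms):
    """Convert a hop size in milliseconds to a frame rate in frames per second."""
    # The proposal asks for hop sizes of at most 75 ms.
    if not 0 < hop_ms <= 75:
        raise ValueError("hop size should be greater than 0 and at most 75 ms")
    return 1000.0 / hop_ms
```

Under this reading, the 75 ms limit on the hop size becomes a lower limit of about 13.3 frames per second on the rate.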
-f and --frame-rate?
-r and --rate?
The word --resolution sounds a little bit vague to me. Also, the physical unit of a resolution is less clear than the unit of a rate.
sure, sgtm
OK, thanks. For future reference, I'm expressing -d in seconds and defaulting it to 1 though.
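Pulling the thread together, the option set could look roughly like the argparse sketch below. This is an illustration, not the actual BirdVoxDetect CLI: the use of argparse, the function name build_parser, the choice of --export-clips among the names floated above, and --rate/-r for the frame rate are all assumptions. Positional input files and the different default for quiet in the Python library are left out.

```python
import argparse


def build_parser():
    """Assemble the options discussed in this thread (illustrative sketch only)."""
    parser = argparse.ArgumentParser(prog="birdvoxdetect")
    parser.add_argument("--output-dir", "-o", default=None,
                        help="output directory (default: same as input)")
    # The thread also floats --extract-calls and --extract-clips for this flag.
    parser.add_argument("--export-clips", "-c", action="store_true",
                        help="export audio clips of positive detections")
    parser.add_argument("--threshold", "-t", type=float, default=50.0,
                        help="detection threshold between 0 and 99")
    parser.add_argument("--export-detection-curve", "-e", action="store_true",
                        help="export detection curve in HDF5 format")
    parser.add_argument("--suffix", "-s", default="",
                        help="suffix added to output file names")
    # Frame rate in frames per second; replaces the hop size in milliseconds.
    parser.add_argument("--rate", "-r", type=float, default=20.0,
                        help="temporal resolution in frames per second")
    parser.add_argument("--clip-duration", "-d", type=float, default=1.0,
                        help="duration of exported clips in seconds")
    parser.add_argument("--quiet", "-q", action="store_true",
                        help="suppress console output")
    return parser
```

For example, build_parser().parse_args(["-c", "-t", "30", "-d", "0.5"]) would enable clip export with a threshold of 30 and half-second clips, leaving every other option at its default.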