Open MariusAlbrecht opened 10 months ago
In short, this happened because the volume of the video was too low. In the code below, segments quieter than -15dB for more than 30 seconds are treated as silent sections that can be skipped, while the lecture audio in this class sits at around -25dB, so the whole lecture falls below the threshold.
// worker/worker/silence.go
cmd := exec.Command("nice", "ffmpeg", "-nostats", "-i", s.Input, "-af", "silencedetect=n=-15dB:d=30", "-f", "null", "-")
output, err := cmd.CombinedOutput()
...
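To make the mechanism concrete, here is a minimal sketch of how the threshold could be passed in as a parameter and how the filter output might be turned into skippable intervals. It assumes `silencedetect` reports `silence_start: <seconds>` and `silence_end: <seconds>` in its log output; `silenceFilter`, `parseSilences` and the `Silence` type are hypothetical names that are not taken from the repository, and the sample log lines are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Silence is one silent interval in seconds (hypothetical type).
type Silence struct {
	Start, End float64
}

// silenceFilter builds the value for ffmpeg's -af flag from a noise
// threshold in dB and a minimum silence duration in seconds.
func silenceFilter(thresholdDB float64, minSeconds int) string {
	return fmt.Sprintf("silencedetect=n=%gdB:d=%d", thresholdDB, minSeconds)
}

var (
	startRe = regexp.MustCompile(`silence_start: (\S+)`)
	endRe   = regexp.MustCompile(`silence_end: (\S+)`)
)

// parseSilences pairs the i-th silence_start with the i-th silence_end.
// Pairs whose numbers cannot be parsed are skipped.
func parseSilences(output string) []Silence {
	starts := startRe.FindAllStringSubmatch(output, -1)
	ends := endRe.FindAllStringSubmatch(output, -1)
	var result []Silence
	for i := 0; i < len(starts) && i < len(ends); i++ {
		s, errStart := strconv.ParseFloat(starts[i][1], 64)
		e, errEnd := strconv.ParseFloat(ends[i][1], 64)
		if errStart != nil || errEnd != nil {
			continue
		}
		result = append(result, Silence{Start: s, End: e})
	}
	return result
}

func main() {
	fmt.Println(silenceFilter(-40, 30))

	// Illustrative sample only, not output from a real recording.
	sample := "[silencedetect @ 0x1] silence_start: 2700.5\n" +
		"[silencedetect @ 0x1] silence_end: 3300.25 | silence_duration: 599.75\n"
	for _, s := range parseSilences(sample) {
		fmt.Printf("silence from %.2fs to %.2fs\n", s.Start, s.End)
	}
}
```

With a threshold of -15dB, a lecture recorded at around -25dB would be reported as a single interval covering the whole stream, which matches the behaviour described in this issue.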
Do you think we can find a sensible value that doesn't consider background noise in some lectures as speech? If not we might need to do something fancy like determine average loudness of different parts of the lecture to calculate a threshold for silence.
I randomly selected four replay videos, including the one mentioned in the issue, and analyzed their volumes. The graph below shows the volume changes over time, where the red dotted line indicates the -15dB threshold defined in the current code.
In all four cases there is a distinct difference in volume between the teaching and break periods. However, the volume range differs from video to video, and for the 1st and 3rd videos a threshold of -15dB is not suitable for deciding what to skip.
> Do you think we can find a sensible value that doesn't consider background noise in some lectures as speech? If not we might need to do something fancy like determine average loudness of different parts of the lecture to calculate a threshold for silence.
Based on these four videos, -40 dB might be a more appropriate absolute threshold. I think a relative threshold is also feasible, such as the midpoint (in decibels) between the maximum and minimum volumes. However, it's important to note that the relationship between decibel values and perceived loudness is not linear. More videos probably need to be analyzed to settle on an appropriate absolute threshold, or on a calculation method for a relative one.
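As a rough sketch of the relative-threshold idea, the midpoint could be computed from per-window volume measurements like this. The function name `midpointThresholdDB`, the `floorDB` parameter and the sample levels are assumptions for illustration only; how the per-window levels are measured is left open here.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// midpointThresholdDB returns the midpoint, in dB, between the loudest and
// the quietest measured level. Levels below floorDB are clamped to floorDB
// so that digital silence (minus infinity dB) does not dominate the result.
func midpointThresholdDB(levelsDB []float64, floorDB float64) (float64, error) {
	if len(levelsDB) == 0 {
		return 0, errors.New("no volume measurements")
	}
	lo, hi := math.Inf(1), math.Inf(-1)
	for _, v := range levelsDB {
		v = math.Max(v, floorDB) // clamp very quiet windows
		lo = math.Min(lo, v)
		hi = math.Max(hi, v)
	}
	return (lo + hi) / 2, nil
}

func main() {
	// Made-up per-window levels: lecture around -25dB, break around -60dB.
	levels := []float64{-25, -27, -60, -58, -24}
	threshold, err := midpointThresholdDB(levels, -90)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("relative threshold: %gdB\n", threshold)
}
```

Because decibels are logarithmic, the midpoint in dB corresponds to the geometric mean of the two linear amplitudes rather than their arithmetic mean, which is one reason a midpoint rule would need checking against more recordings.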
Another idea: is it necessary and reasonable to normalize the volume of all recordings?
Third idea: simply pick a good absolute threshold (-15dB for instance). If it doesn't work for some videos, just let it be, because manually skipping breaks in lectures is pretty easy; moreover, in most cases we know when the break usually takes place in a specific course.
Wow, this is some great research. Thanks a ton @meandaD
> Another idea: is it necessary and reasonable to normalize the volume of all recordings?
This could also be investigated, but it might be out of scope for this issue. If you feel this is reasonable, please open another issue :)
Describe the bug: The entire stream is recognized as a pause, and the "Skip pause" button skips over the entire stream, even though it includes a perfectly audible lecture.