TUM-Dev / gocast

TUM's lecture streaming service.
https://live.rbg.tum.de/
MIT License
180 stars, 42 forks

Entire stream recognized as pause #1186

Open MariusAlbrecht opened 10 months ago

MariusAlbrecht commented 10 months ago

Describe the bug
The entire stream is recognized as a pause, and the "Skip pause" button skips over the whole stream, even though it contains a perfectly audible lecture.

To Reproduce
Steps to reproduce the behaviour:

  1. Go to https://live.rbg.tum.de/w/sturepfuprover/35614
  2. Click "Skip pause"
  3. The player jumps to the end of the stream

YiranDuan721 commented 9 months ago

In simple terms, this happened because the volume of the recording was too low. In the code below, any section whose volume stays below -15dB for more than 30 seconds is treated as silence that can be skipped, while the lecture audio in this recording sits at around -25dB.

// worker/worker/silence.go
cmd := exec.Command("nice", "ffmpeg", "-nostats", "-i", s.Input, "-af", "silencedetect=n=-15dB:d=30", "-f", "null", "-")
output, err := cmd.CombinedOutput()
...
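
For reference, ffmpeg's silencedetect filter reports each detected interval in its log as `silence_start: <seconds>` and `silence_end: <seconds> | silence_duration: <seconds>`. The following is a minimal sketch of how such output could be turned into skippable intervals. The `silence` type and the `parseSilences` helper are hypothetical names for illustration and are not taken from the gocast code base.

```go
package main

import (
	"regexp"
	"strconv"
	"strings"
)

// silence is one interval reported by ffmpeg's silencedetect filter, in seconds.
// Hypothetical type for illustration, not the one used in gocast.
type silence struct {
	Start, End float64
}

var (
	startRe = regexp.MustCompile(`silence_start: (-?\d+(?:\.\d+)?)`)
	endRe   = regexp.MustCompile(`silence_end: (-?\d+(?:\.\d+)?)`)
)

// parseSilences pairs every silence_start line with the next silence_end line.
// A silence_start without a matching silence_end is dropped in this sketch.
func parseSilences(output string) []silence {
	var result []silence
	var start float64
	open := false
	for _, line := range strings.Split(output, "\n") {
		if m := startRe.FindStringSubmatch(line); m != nil {
			// The regexp only matches valid decimal numbers, so the error is ignored.
			start, _ = strconv.ParseFloat(m[1], 64)
			open = true
		} else if m := endRe.FindStringSubmatch(line); m != nil && open {
			end, _ := strconv.ParseFloat(m[1], 64)
			result = append(result, silence{Start: start, End: end})
			open = false
		}
	}
	return result
}
```

With a threshold of -15dB and a lecture recorded at around -25dB, this kind of parser would see a single interval that spans the whole recording, which matches the behaviour described in the issue.
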
joschahenningsen commented 9 months ago

Do you think we can find a sensible value that doesn't consider background noise in some lectures as speech? If not we might need to do something fancy like determine average loudness of different parts of the lecture to calculate a threshold for silence.

YiranDuan721 commented 9 months ago

I randomly selected four replay videos, including the one mentioned in the issue, and analyzed their volumes. The graph below shows the volume changes over time, where the red dotted line indicates the -15dB threshold defined in the current code.

[Figure: volume over time for the four recordings]

Details about this figure: The scatter points in the graph represent the maximum volume of each HLS segment, measured with the ffmpeg volumedetect filter. Segment durations in the four videos range from 5 to 8 seconds.
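
A measurement like this could be reproduced roughly as follows. The volumedetect filter prints lines such as `max_volume: -25.3 dB` to the ffmpeg log, so one way to collect the data is to run something like `ffmpeg -i <segment> -af volumedetect -f null -` per segment and parse the log. The exact invocation and the `parseMaxVolume` helper below are assumptions for illustration, not the script used to produce the graph.

```go
package main

import (
	"errors"
	"regexp"
	"strconv"
)

// Matches the max_volume line that ffmpeg's volumedetect filter writes to its log.
var maxVolumeRe = regexp.MustCompile(`max_volume: (-?\d+(?:\.\d+)?) dB`)

// parseMaxVolume extracts the max_volume value (in dB) from ffmpeg log output.
// Hypothetical helper for illustration.
func parseMaxVolume(ffmpegOutput string) (float64, error) {
	m := maxVolumeRe.FindStringSubmatch(ffmpegOutput)
	if m == nil {
		return 0, errors.New("no max_volume line found in ffmpeg output")
	}
	return strconv.ParseFloat(m[1], 64)
}
```

Calling this once per HLS segment would give one data point per 5 to 8 seconds of lecture, which is the granularity shown in the figure.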

The volume differs distinctly between teaching and break periods in all four cases. However, the overall volume range varies from video to video, and for the 1st and 3rd videos a threshold of -15dB is not suitable for deciding what to skip.

YiranDuan721 commented 9 months ago

> Do you think we can find a sensible value that doesn't consider background noise in some lectures as speech? If not we might need to do something fancy like determine average loudness of different parts of the lecture to calculate a threshold for silence.

Based on these four videos, -40 dB might be a more appropriate choice as an absolute threshold. I think adopting a relative threshold is also feasible, such as the midpoint (in decibels) between the maximum and minimum volumes. However, it's important to note that the relationship between decibel values and perceived loudness is not linear. More videos probably need to be analyzed to determine an appropriate absolute threshold or a calculation method for a relative threshold.
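
To make the relative-threshold idea concrete, here is a minimal sketch that takes per-segment peak volumes, uses the midpoint between the loudest and the quietest segment as the threshold, and merges consecutive quiet segments into pauses of at least a given length. All names (`segment`, `pause`, `midpointThreshold`, `findPauses`) are hypothetical, and the midpoint rule is only one possible choice. Note that a recording without any break would still get a threshold somewhere inside its normal speech range.

```go
package main

// segment holds the measured peak volume of one HLS segment (hypothetical type).
type segment struct {
	Start, Duration float64 // seconds
	MaxVolumeDB     float64 // peak volume in dB
}

// pause is a skippable interval in seconds (hypothetical type).
type pause struct{ Start, End float64 }

// midpointThreshold returns the midpoint in dB between the loudest and the
// quietest segment. It returns 0 for an empty input.
func midpointThreshold(segs []segment) float64 {
	if len(segs) == 0 {
		return 0
	}
	lo, hi := segs[0].MaxVolumeDB, segs[0].MaxVolumeDB
	for _, s := range segs[1:] {
		if s.MaxVolumeDB < lo {
			lo = s.MaxVolumeDB
		}
		if s.MaxVolumeDB > hi {
			hi = s.MaxVolumeDB
		}
	}
	return (lo + hi) / 2
}

// findPauses merges consecutive segments below thresholdDB and keeps only
// runs that last at least minDuration seconds.
func findPauses(segs []segment, thresholdDB, minDuration float64) []pause {
	var pauses []pause
	var runStart, runEnd float64
	inRun := false
	flush := func() {
		if inRun && runEnd-runStart >= minDuration {
			pauses = append(pauses, pause{runStart, runEnd})
		}
		inRun = false
	}
	for _, s := range segs {
		if s.MaxVolumeDB < thresholdDB {
			if !inRun {
				runStart = s.Start
				inRun = true
			}
			runEnd = s.Start + s.Duration
		} else {
			flush()
		}
	}
	flush() // close a run that reaches the end of the recording
	return pauses
}
```

For a quiet recording with speech at around -25dB and breaks at around -60dB, `findPauses(segs, midpointThreshold(segs), 30)` would only report the break, whereas a fixed threshold of -15 would mark the whole recording as one pause.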

YiranDuan721 commented 9 months ago

Another idea: is it necessary and reasonable to normalize the volume of all recordings?

Third idea: Simply pick a good absolute threshold (-15dB, for instance). If it doesn't work for some videos, just let it be, because manually skipping breaks in lectures is pretty easy; moreover, in most cases we know when the break usually takes place in a specific course.

joschahenningsen commented 9 months ago

Wow, this is some great research. Thanks a ton @meandaD

> Another idea: is it necessary and reasonable to normalize the volume of all recordings?

This could also be investigated, but it might be out of scope for this issue. If you feel this is reasonable, please open another issue :)