Shadows97 / PySubtitle

A project for generating subtitle files for videos.
MIT License

Issue with Audio Segmentation When Video Contains Long Periods of Silence #4

Open Shadows97 opened 5 months ago

Shadows97 commented 5 months ago

There is an issue with the current audio segmentation approach in PySubtitle when processing videos that contain long periods of silence. The split_on_silence function from pydub may not handle these long silent segments effectively, leading to incomplete or inaccurate transcription and subtitle generation.

Steps to Reproduce:

  1. Use a video file that contains long periods of silence (e.g., 5-10 seconds or more).
  2. Run the audio_to_text function to convert the audio to text.
  3. Observe the generated VTT file and note that the transcription may stop prematurely or miss segments of the video.
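For anyone trying to reproduce this without a suitable video to hand, here is a minimal sketch that writes a WAV file with a tone, a long silence, and a second tone, using only the standard library. The helper name `write_tone_silence_tone` and its defaults are made up for illustration and are not part of PySubtitle; the resulting file would still have to be fed to the audio stage or muxed into a video by whatever means the project supports.

```python
import math
import struct
import wave


def write_tone_silence_tone(path, tone_s=2.0, silence_s=8.0, rate=16000, freq=440.0):
    """Write a mono 16-bit WAV: tone, silence, tone. Returns the duration in seconds.

    Hypothetical test-fixture helper, not part of PySubtitle.
    """
    def tone(n_frames):
        # Sine wave at half of full scale so the tone is clearly above any silence threshold.
        return [int(0.5 * 32767 * math.sin(2 * math.pi * freq * i / rate))
                for i in range(n_frames)]

    n_tone = int(tone_s * rate)
    n_silence = int(silence_s * rate)
    samples = tone(n_tone) + [0] * n_silence + tone(n_tone)

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))

    return len(samples) / rate
```

With the defaults this gives a 12-second file whose middle 8 seconds are digital silence, which falls in the 5-10 second range mentioned in step 1.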

Expected Behavior:

The audio segmentation should handle long periods of silence more effectively, ensuring that the entire video is processed and transcribed accurately.

Actual Behavior:

The transcription process may stop prematurely or miss segments of the video when long periods of silence are encountered.

Possible Solution:

This issue affects the accuracy and completeness of the generated subtitles, especially for videos with significant silent segments. Improving the segmentation approach will enhance the overall reliability of PySubtitle.

hcm444 commented 5 months ago

We can implement a custom segmentation method that detects and splits the audio based on a threshold duration of silence, rather than relying solely on split_on_silence from pydub. Something like:

def custom_split_on_silence(sound, min_silence_len=500, silence_thresh=-40): ...
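To make the idea concrete, here is a rough sketch of the segmentation logic only, written in pure Python so it can be tried without any audio file. It is not the PySubtitle implementation and not pydub's algorithm. The function name `split_keeping_offsets` and the `frame_ms` parameter are assumptions for illustration. It takes a list of per-frame loudness values in dBFS (with pydub these could probably be obtained from the `dBFS` property of short slices of an `AudioSegment`) and returns the start and end time of every non-silent stretch, so the position of each segment in the original audio is kept however long the silence between them is.

```python
def split_keeping_offsets(levels_db, frame_ms=10, min_silence_len=500, silence_thresh=-40):
    """Return (start_ms, end_ms) ranges of non-silent audio.

    levels_db: loudness of each fixed-size frame, in dBFS (hypothetical input format).
    A silent run only ends a segment if it lasts at least min_silence_len ms;
    shorter pauses stay inside the current segment.
    """
    min_silent_frames = max(1, min_silence_len // frame_ms)
    segments = []
    seg_start = None  # frame index where the current segment began
    last_loud = None  # frame index of the most recent non-silent frame

    for i, level in enumerate(levels_db):
        if level > silence_thresh:
            if seg_start is None:
                seg_start = i
            last_loud = i
        elif seg_start is not None and i - last_loud >= min_silent_frames:
            # Silence has lasted long enough: close the segment at the last loud frame.
            segments.append((seg_start * frame_ms, (last_loud + 1) * frame_ms))
            seg_start = None

    if seg_start is not None:
        # Audio ended while a segment was still open.
        segments.append((seg_start * frame_ms, (last_loud + 1) * frame_ms))
    return segments


# 1 s of speech-level audio, 8 s of silence, 0.5 s of speech-level audio (10 ms frames).
levels = [-20.0] * 100 + [-60.0] * 800 + [-20.0] * 50
print(split_keeping_offsets(levels))  # [(0, 1000), (9000, 9500)]
```

Because each range carries its own offsets, the second segment is still reported at 9000 ms even though 8 seconds of silence were skipped. pydub's `pydub.silence` module also has a `detect_nonsilent` function that returns ranges in a similar shape, which may be enough on its own.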

Shadows97 commented 5 months ago

possible