Bklieger / ScribeWizard

ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3
https://wizard.benjamin.sh
MIT License
396 stars, 88 forks

Audio data is not removed after download in edge case #4

Open Bklieger opened 2 months ago

Bklieger commented 2 months ago

If the user closes the Streamlit app window after the YouTube video is downloaded but before the Whisper transcription is complete, the audio data is not deleted from the downloads folder. Orphaned files therefore accumulate, and storage use grows unnecessarily over time. To patch, a PR should be made with a fix that deletes any downloaded files after a user closes the window.

In main.py:

Line 398:

""" Downloads audio files """

if input_method == "YouTube link":
    display_status("Downloading audio from YouTube link ....")
    audio_file_path = download_video_audio(youtube_link, display_download_status)
[...]

Line 417:

""" Transcribes audio using Whisper, which may take 3-10 seconds on average.
During this time the user could close the window and stop the program, so the
removal call below is never reached. """

display_status("Transcribing audio in background....")
transcription_text = transcribe_audio(audio_file)
display_statistics()

Line 421:

""" Removes downloaded audio files from download folder """

delete_download(audio_file_path)
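
One possible way to narrow this window is to tie the deletion to a `finally` clause, so the file is removed even if the transcription step is interrupted by an exception. The sketch below is only an illustration, not the project's code: `temporary_download` is a hypothetical helper, and whether Streamlit actually unwinds the script (and so runs the `finally` block) when the browser window is closed is an assumption that would need to be tested.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_download(path):
    """Yield `path`, then delete the file when the block exits, even on error.

    Hypothetical helper for illustration; it is not part of main.py.
    """
    try:
        yield path
    finally:
        # Runs on normal exit and when an exception propagates out of the block.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # Already removed elsewhere; nothing left to clean up.
```

Under these assumptions, the transcription call could be placed inside `with temporary_download(audio_file_path):`, which would make the separate `delete_download(audio_file_path)` call redundant on that path. This does not help if the process is killed outright, since no Python cleanup code runs in that case.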

Bklieger commented 2 months ago

This bug has been partially mitigated (https://github.com/Bklieger/groqnotes/commit/78feeb745c61cd4d818527cfcaf8c45c53e14b14) by deleting the audio files immediately after the audio data is read. As a result, if the user closes the window after the download has completed and the audio has been processed, no audio files remain in the downloads folder.

However, the fix is incomplete: if a user closes the app window before the download and processing have finished, data is left in the downloads folder and is not deleted automatically.

To patch, a PR should still be made with a fix that will delete any downloaded files after a user closes the window.
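
A fix that does not depend on detecting the window close at all might be a periodic sweep of the downloads folder, for example at app startup. The following is a minimal sketch under stated assumptions, not a tested patch for this repository: `purge_stale_downloads`, its parameters, and the one-hour default are all hypothetical, and the folder path would have to match whatever `download_video_audio` actually writes to.

```python
import os
import time


def purge_stale_downloads(folder, max_age_seconds=3600, now=None):
    """Delete regular files in `folder` last modified more than `max_age_seconds` ago.

    Returns the list of removed paths. Hypothetical helper, not part of main.py.
    """
    now = time.time() if now is None else now
    removed = []
    if not os.path.isdir(folder):
        return removed  # Nothing to clean if the folder does not exist.
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # Skip subdirectories and other non-file entries.
        try:
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed.append(path)
        except FileNotFoundError:
            continue  # Another session removed the file first.
    return removed
```

The age threshold is a trade-off: if it is set too low, a sweep triggered by one session could delete a file that another session is still downloading or transcribing, so it should comfortably exceed the longest expected processing time.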