JuergenFleiss / aTrain

A GUI tool for offline transcription of speech recordings, including speaker diarization, utilizing state-of-the-art machine learning models.

aTrain crashes when filenames contain special characters #27

Open JuergenFleiss opened 2 months ago

JuergenFleiss commented 2 months ago

OK, this is a known bug. The workaround, for now, is to avoid special characters such as # in your filenames.

Thanks to @wenyuan-wu and @hirowa for figuring this out and confirming it in https://github.com/JuergenFleiss/aTrain/issues/20.

I will try to fix this in a future version and am happy to receive ideas or pull requests. In fact, the file-sanitizing part of https://github.com/JuergenFleiss/aTrain/pull/21 by @SjDayg, which uses secure_filename, may already be the way to go. I couldn't quickly find documentation covering everything that function does.
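For readers wondering what this kind of sanitizing involves, here is a minimal stdlib-only sketch. It is an approximation written for illustration, not Werkzeug's actual secure_filename code and not the code from the pull request; the helper name sanitize_filename is hypothetical. It folds the name to ASCII, neutralizes path separators, replaces whitespace with underscores, and drops every character outside a conservative allowed set.

```python
import re
import unicodedata

# Characters kept in the sanitized name: ASCII letters, digits, "_", "." and "-".
_DISALLOWED = re.compile(r"[^A-Za-z0-9_.-]")


def sanitize_filename(filename: str) -> str:
    """Hypothetical helper: reduce a filename to a conservative ASCII subset."""
    # Fold accented characters to their ASCII base where possible, drop the rest.
    ascii_name = (
        unicodedata.normalize("NFKD", filename)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Treat path separators as spaces so no directory components survive.
    for sep in ("/", "\\"):
        ascii_name = ascii_name.replace(sep, " ")
    # Join whitespace-separated parts with underscores, then drop disallowed characters.
    collapsed = "_".join(ascii_name.split())
    # Strip leading/trailing dots and underscores (avoids hidden files and "..").
    return _DISALLOWED.sub("", collapsed).strip("._")


if __name__ == "__main__":
    for name in ("a#b.mp4", "b#c.mp4", "my interview.opus"):
        print(name, "->", sanitize_filename(name))
```

Under these assumptions, a name like a#b.mp4 simply loses the # character. Note that a name made up entirely of disallowed characters would come back empty, so a caller would likely want a fallback name for that case.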

SjDayg commented 2 months ago

Hi, yes, https://github.com/JuergenFleiss/aTrain/pull/21/commits/86911b6ddafbe4e64734e85d777e5ea453be4cb8 should fix it.

I just tested the fix to be sure, and it worked on Debian 11 (Linux). Uploading the files a#b.mp4 and b#c.mp4 produced this:

root@0f968cdb036e:~/Documents/aTrain/transcriptions# find -type f
./2024-07-09 19-35-13 ab.mp4/metadata.txt
./2024-07-09 19-35-13 ab.mp4/transcription.json
./2024-07-09 19-35-13 ab.mp4/transcription.txt
./2024-07-09 19-35-13 ab.mp4/transcription_timestamps.txt
./2024-07-09 19-35-13 ab.mp4/transcription_maxqda.txt
./2024-07-09 19-35-13 ab.mp4/transcription.srt
./2024-07-09 19-39-53 bc.mp4/metadata.txt
./2024-07-09 19-39-53 bc.mp4/transcription.json
./2024-07-09 19-39-53 bc.mp4/transcription.txt
./2024-07-09 19-39-53 bc.mp4/transcription_timestamps.txt
./2024-07-09 19-39-53 bc.mp4/transcription_maxqda.txt
./2024-07-09 19-39-53 bc.mp4/transcription.srt

By the way, the fix also allows .opus files. Opus is a common audio format that ffmpeg supports (I tested this at some point in the past).

JuergenFleiss commented 2 months ago

Great! Could you make a pull request with that for the new backend? https://github.com/JuergenFleiss/atrain_core

That way your fix and the .opus extension will be included in the next version; otherwise, I think you would not get credit for them.

SjDayg commented 2 months ago

Hi, the issue doesn't currently occur with the CLI, so no fix is required here.