JarodMica / audiosplitter_whisper

MIT License
91 stars 35 forks

Fix illegal characters and first letter missing #5

Closed aotraz closed 1 year ago

aotraz commented 1 year ago

I'm using this for training my models, very nice work! I did run into an issue, though: with diarize enabled, speakers are sometimes named after what they say. In one case it tried to create a file with a ? in its name and then failed to find it, since Windows doesn't allow that character in file names. Also, whenever a speaker was named this way, the first character of the name was missing, because the code assumed that [ would always be the first character.

This edit should prevent that by simply removing any of the forbidden characters.
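For illustration, the idea can be sketched roughly as below. This is only a minimal sketch, not the actual diff in this PR: the helper name `sanitize_speaker_name` and the `fallback` parameter are hypothetical, and it assumes the speaker label arrives as a plain string that may or may not be wrapped in `[` and `]`.

```python
import re

# Characters Windows does not allow in file or folder names,
# plus the ASCII control characters (0-31).
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_speaker_name(raw: str, fallback: str = "unknown_speaker") -> str:
    """Hypothetical helper: turn a speaker label into a safe folder/file name."""
    name = raw.strip()

    # Only drop the brackets when they are really there, instead of
    # assuming the first character is always "[".
    if name.startswith("["):
        name = name[1:]
    if name.endswith("]"):
        name = name[:-1]

    # Remove every forbidden character.
    name = _FORBIDDEN.sub("", name)

    # Windows also has trouble with names that end in a dot or a space.
    name = name.strip().rstrip(". ")

    # If nothing is left (e.g. the label was just "???"), use the fallback.
    return name or fallback


if __name__ == "__main__":
    for label in ["[SPEAKER_00]", "What is that?", "Hello", "???"]:
        print(repr(label), "->", repr(sanitize_speaker_name(label)))
```

Checking for the bracket before slicing is what keeps the first letter intact when the label is not in the `[SPEAKER_00]` form.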

JarodMica commented 1 year ago

Awesome, didn't even consider that. Do you have an example of where it names the speaker based on what they say? I just wanna see one, as I haven't been making too many datasets recently.

I'll take a look at this and merge on my PC when I get the chance.

aotraz commented 1 year ago

Using this video as an example: https://www.youtube.com/watch?v=XTrnSJLXGBg

I get these folders: [image]

Here is also the srt if needed: https://pastebin.com/Jsb1BrD2