Open mxzgithub opened 1 year ago
Because of my specific requirements, I need speaker identification. Do you need that?
This is the main reason I use your transcriber. Thank you for making the pipeline public. I tested various files with 4 to 7 speakers. In one sample I get about 1000 numbered speaker lines and about 200 marked with a question mark, so about 15 to 20% of the speaker lines are not numbered. Do you have any tips on how this can be improved?
Further question: Do you change the generic speaker numbers to names later in the process? I considered writing a module to rename the numbered speakers in a file and then rename all the "Speaker?" lines one by one. There are just too many of them to go through at the moment.
Okay, the main reason you get <Speaker?> is overlapping conversation: the program is not able to assign a single speaker to that chunk. The second reason is that when one utterance follows another too closely, without a long enough silence interval, the audio cannot be split into two chunks, and it then becomes a case of reason number one.
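To illustrate the first reason, here is a minimal sketch of how a chunk could end up as "Speaker?" when two diarization turns overlap it. This is not the pipeline's actual code: the function name `assign_speaker`, the `min_share` cutoff, and the "Speaker 1" label format are all assumptions made for the example.

```python
def assign_speaker(chunk, turns, min_share=0.5):
    """Pick a speaker for a chunk, or "Speaker?" if the choice is ambiguous.

    chunk: (start, end) in seconds.
    turns: list of (start, end, speaker) diarization turns (hypothetical format).
    min_share: assumed cutoff; a second speaker with at least this fraction
               of the top speaker's overlap makes the chunk ambiguous.
    """
    start, end = chunk
    overlap = {}
    for t_start, t_end, speaker in turns:
        shared = min(end, t_end) - max(start, t_start)
        if shared > 0:
            overlap[speaker] = overlap.get(speaker, 0.0) + shared

    if not overlap:
        return "Speaker?"  # no diarization turn covers this chunk

    ranked = sorted(overlap.values(), reverse=True)
    if len(ranked) > 1 and ranked[1] >= min_share * ranked[0]:
        return "Speaker?"  # two speakers share the chunk: overlapping talk

    return max(overlap, key=overlap.get)


turns = [(0, 5, "Speaker 1"), (4, 9, "Speaker 2")]
print(assign_speaker((0, 3), turns))  # only Speaker 1 covers this chunk
print(assign_speaker((3, 6), turns))  # both speakers cover it equally
```

Under this model, shorter chunks are less likely to straddle two speakers, which is why splitting the audio more finely can reduce the number of unassigned lines.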
My specific requirement is to document even filler words, so I cannot ignore non-speech sounds. Perhaps you can add another layer of SAD (Speech Activity Detection) for further filtering.
For my use case I do not need to change the speaker labels to actual names, but you can definitely write a module to change the speakers to proper names.
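A renaming module along those lines could be sketched as follows. The label format ("Speaker 1", "Speaker?") and the helper names `rename_speakers` and `unknown_share` are assumptions for illustration; adjust the regular expression to whatever the final file actually contains.

```python
import re

# Assumed label format: "Speaker 1", "Speaker 2", ... and "Speaker?" when unassigned.
LABEL_RE = re.compile(r"Speaker ?(\d+|\?)")


def rename_speakers(text, names):
    """Replace numbered labels using `names` (e.g. {"1": "Alice"}).

    Labels without an entry in `names`, including "Speaker?", are left as-is
    so they can still be reviewed by hand.
    """
    def swap(match):
        return names.get(match.group(1), match.group(0))

    return LABEL_RE.sub(swap, text)


def unknown_share(lines):
    """Fraction of labelled lines that carry the unassigned "Speaker?" label."""
    labelled = [ln for ln in lines if LABEL_RE.search(ln)]
    if not labelled:
        return 0.0
    return sum("Speaker?" in ln for ln in labelled) / len(labelled)


transcript = "Speaker 1: hello\nSpeaker?: hm\nSpeaker 2: hi"
print(rename_speakers(transcript, {"1": "Alice", "2": "Bob"}))
print(unknown_share(transcript.splitlines()))
```

`unknown_share` gives a quick way to check whether a change to the pipeline settings actually reduces the proportion of unassigned lines.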
st = get_speech_timestamps(wav, smodel,
                           threshold=0.65,  # 0.5
                           sampling_rate=16000,
                           min_speech_duration_ms=5,  # 250
                           min_silence_duration_ms=100,  # 100
                           window_size_samples=1536,  # this is fixed
                           speech_pad_ms=10,  # 30
                           return_seconds=False,
                           visualize_probs=False
                           )
You can try tuning these parameters: lower threshold, min_speech_duration_ms, and min_silence_duration_ms.
You can use the software Subtitle Edit to further process the final file.
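To see why min_silence_duration_ms matters, here is a simplified model of the splitting rule. It is not Silero VAD's actual implementation, and `merge_close_segments` is a hypothetical helper: it just shows that two stretches of speech separated by a pause shorter than the minimum silence end up as a single chunk.

```python
def merge_close_segments(segments, min_silence_ms):
    """Merge speech segments whose gap is shorter than min_silence_ms.

    segments: list of (start_ms, end_ms) tuples sorted by start time.
    """
    merged = []
    for start, end in segments:
        if merged and start - merged[-1][1] < min_silence_ms:
            # Pause too short to count as silence: extend the previous chunk.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


segments = [(0, 1000), (1060, 2000), (2300, 3000)]  # gaps of 60 ms and 300 ms
print(merge_close_segments(segments, min_silence_ms=100))
print(merge_close_segments(segments, min_silence_ms=50))
```

With a 100 ms minimum the first two segments are joined across the 60 ms pause; with 50 ms they stay separate, so each can be assigned its own speaker.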
I get a lot of "Speaker?" lines in the final file and I do not know how to improve this. Maybe you can give a few tips on how to work with the pipeline.