alumae / kaldi-offline-transcriber

Offline transcription system for Estonian using Kaldi

Multi-core, multi-threading - possible? #16

Open lkraav opened 6 years ago

lkraav commented 6 years ago

An 8-core machine could plow through diarization faster if it were parallelized. What's the biggest obstacle stopping us from having that?

alumae commented 6 years ago

By far the most time-consuming part of speaker diarization is the last step -- NCLR clustering. I don't know if this algorithm is easily parallelizable or not.

However, current speaker recognition models do not depend heavily on perfectly correct speaker diarization, so you could omit NCLR clustering (and gender identification) and use show.spl.seg instead of show.seg as the diarization result. This would save you about 80% of the time.
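As a rough illustration of the substitution described above, here is a minimal shell sketch. The helper name `use_split_segmentation` and the idea that both files sit in one diarization directory are assumptions made for the example, not something taken from the project's Makefile or scripts.

```shell
# Hypothetical helper (not part of kaldi-offline-transcriber):
# use the pre-clustering segmentation as the final diarization result.
# Assumes show.spl.seg and show.seg live in the same directory.
use_split_segmentation() {
    diar_dir=$1
    if [ ! -f "$diar_dir/show.spl.seg" ]; then
        echo "missing $diar_dir/show.spl.seg" >&2
        return 1
    fi
    # Overwrites any existing show.seg in that directory.
    cp "$diar_dir/show.spl.seg" "$diar_dir/show.seg"
}
```

Copying the file only swaps the result. To get the time saving, the diarization script would also have to stop before the NCLR clustering step, which this sketch does not attempt.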

lkraav commented 6 years ago

@alumae thanks. Related to multi-threading ability, I'm also seeing crashes at a later stage:

[982571.380092] nnet3-latgen-fa[9579]: segfault at 0 ip 00007f3469bba1ab sp 00007f34367fbb60 error 6 in libopenblas_openmp_haswellp-r0.2.20.so[7f346996a000+3f5000]

Googling suggests that OpenBLAS may have trouble with multithreading (at least with OpenMP enabled, which I have). Have you seen segfaults like this in the process? I'm testing speech2text.sh with OMP_NUM_THREADS=1, but I'm not very hopeful that it will help. I should probably test with a small sample audio file, too.
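For reference, one way to limit OpenMP to a single thread for one run without changing the rest of the shell session is sketched below. The wrapper name `run_single_threaded` is made up for illustration; only the OMP_NUM_THREADS variable and speech2text.sh come from the discussion.

```shell
# Hypothetical wrapper: run any command with OpenMP restricted to one thread.
# The variable is set only for the child process, not for the calling shell.
run_single_threaded() {
    env OMP_NUM_THREADS=1 "$@"
}

# Example call (arguments are whatever you normally pass):
#   run_single_threaded ./speech2text.sh <your usual arguments>
```

Because the setting is scoped to the one command, it is easy to repeat the same run with and without it when trying to narrow down the cause of the crash.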

alumae commented 6 years ago

No, I haven't seen this. I usually use Intel's MKL rather than OpenBLAS, but of course that might not be an option for you.

Note that you can use parallel decoding (instead of multithreaded decoding) by setting e.g. njobs=4 in Makefile.options, but I think you could then run into problems if your audio file has fewer than njobs speakers.

alumae commented 6 years ago

Actually, it's probably OK to use parallel decoding even with fewer than njobs speakers. But if you have fewer than njobs utterances (segments), it could fail.
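A possible safeguard against the failure case above is to cap njobs at the number of segments before decoding. The sketch below assumes the segmentation file has one segment per line, with lines starting with `;;` being comments. The function names `count_segments` and `capped_njobs` are hypothetical, and the segment count in this file may not match exactly what the decoder treats as utterances.

```shell
# Hypothetical helpers (not part of kaldi-offline-transcriber).

# Count segment lines, skipping ";;" comment lines and blank lines.
count_segments() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -vc -e '^;;' -e '^[[:space:]]*$' "$1" || true
}

# Print the requested job count, capped at the segment count (minimum 1).
capped_njobs() {
    cj_requested=$1
    cj_segments=$(count_segments "$2")
    if [ "$cj_segments" -lt 1 ]; then
        echo 1
    elif [ "$cj_segments" -lt "$cj_requested" ]; then
        echo "$cj_segments"
    else
        echo "$cj_requested"
    fi
}

# Example call:
#   capped_njobs 4 path/to/show.seg
```

The printed value could then be used as the njobs setting in Makefile.options for that file.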

lkraav commented 6 years ago

It seems that removing --nthreads and using OMP_NUM_THREADS=1 worked. I will now transcribe another file, changing only a single variable. Perhaps it was --nthreads 8 all along.