aalto-speech / speaker-diarization

Speaker diarization scripts, based on AaltoASR

UnboundLocalError: local variable 'feas' referenced before assignment #18

Open imKarthikeyanK opened 5 years ago

imKarthikeyanK commented 5 years ago

While trying to execute the command below:

python spk-diarization2.py /mnt/c/users/karthikeyan/Downloads/proper.wav

I am getting:

```
Reading file: /mnt/c/users/karthikeyan/Downloads/proper.wav
Writing output to: stdout
Using feacat from: /home/userk/speaker-diarization/feacat
Writing temporal files in: /tmp
Writing lna files in: /home/userk/speaker-diarization/lna
Writing exp files in: /home/userk/speaker-diarization/exp
Writing features in: /home/userk/speaker-diarization/fea
Performing exp generation and feacat concurrently
Traceback (most recent call last):
  File "./generate_exp.py", line 37, in <module>
    from docopt import docopt
ImportError: No module named docopt
Calling voice-detection2.py
Reading recipe from: /tmp/initrypiaG.recipe
Reading .exp files from: /home/userk/speaker-diarization/exp
Writing output to: /tmp/vadHJVgzE.recipe
Sample rate set to: 125
Minimum speech turn duration: 0.5 seconds
Minimum nonspeech between-turns duration: 1.5 seconds
Segment before expansion set to: 0.0 seconds
Segment end expansion set to: 0.0 seconds
Error, /home/userk/speaker-diarization/exp/proper.exp does not exist
Waiting for feacat to end.
Calling spk-change-detection.py
Reading recipe from: /tmp/vadHJVgzE.recipe
Reading feature files from: /home/userk/speaker-diarization/fea
Feature files extension: .fea
Writing output to: /tmp/spkcM3EdlF.recipe
Conversion rate set to frame rate: 125.0
Using a growing window
Deltaws set to: 0.096 seconds
Using BIC as distance measure, lambda = 1.0
Window size set to: 1.0 seconds
Window step set to: 3.0 seconds
Threshold distance: 0.0
Useful metrics for determining the right threshold:
Maximum between windows distance: 0
Total windows: 0
Total segments: 0
Maximum between detected segments distance: 0
Total detected speaker changes: 0
Calling spk-clustering.py
('===', '/tmp/spkcM3EdlF.recipe')
Reading recipe from: /tmp/spkcM3EdlF.recipe
Reading feature files from: /home/userk/speaker-diarization/fea
Feature files extension: .fea
Writing output to: stdout
Conversion rate set to frame rate: 125.0
Using hierarchical clustering
Using BIC as distance measure, lambda = 1.3
Threshold distance: 0.0
Maximum speakers: 0
('::::::::::::::::::::::::::::::::::', 0)
Initial cluster with: 0 speakers
Traceback (most recent call last):
  File "./spk-clustering.py", line 432, in <module>
    process_recipe(parsed_recipe, speakers, outf)
  File "./spk-clustering.py", line 293, in process_recipe
    spk_cluster_m(feas[1], recipe, speakers, outf, dist, segf)
UnboundLocalError: local variable 'feas' referenced before assignment
```

I tried looking into spk-clustering.py: len(recipe) is 0, so feas never gets a value. Thank you.
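My guess at what is happening (a simplified sketch, not the actual project code): with an empty recipe, the loop that would assign feas never runs, which produces exactly this exception:

```python
def load_features(segment):
    # stand-in for reading a .fea file (hypothetical helper)
    return segment

def process_recipe(recipe):
    # 'feas' is only bound inside the loop body; with an empty recipe
    # the loop never executes, so the name is never assigned.
    for segment in recipe:
        feas = load_features(segment)
    return feas  # UnboundLocalError when recipe is empty

process_recipe([])  # UnboundLocalError: local variable 'feas' referenced before assignment
```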

antoniomo commented 5 years ago

Do you have the docopt dependency, and are you using python2?
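Something like this, run with the same interpreter you use for the scripts, should tell you (just a quick check, not part of the toolkit):

```python
# Save as check_docopt.py and run: python check_docopt.py
import sys

print("Python version: %s" % sys.version.split()[0])
try:
    import docopt  # generate_exp.py imports this
    print("docopt found at: %s" % docopt.__file__)
except ImportError:
    print("docopt is missing for this interpreter; install it, e.g. with pip install docopt")
```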

imKarthikeyanK commented 5 years ago

Yeah, thank you. I was missing the docopt dependency. Now I'm getting this result:

```
userk@PSSHSRDT034:~/speaker-diarization$ python spk-diarization2.py /mnt/c/users/karthikeyan/Downloads/proper.wav
Reading file: /mnt/c/users/karthikeyan/Downloads/proper.wav
Writing output to: stdout
Using feacat from: /home/userk/speaker-diarization/feacat
Writing temporal files in: /tmp
Writing lna files in: /home/userk/speaker-diarization/lna
Writing exp files in: /home/userk/speaker-diarization/exp
Writing features in: /home/userk/speaker-diarization/fea
Performing exp generation and feacat concurrently
tokenpass: ./VAD/tokenpass/test_token_pass
Reading recipe: /tmp/initzDxEk1.recipe
Using model: ./hmms/mfcc_16g_11.10.2007_10
Writing .lna files in: /home/userk/speaker-diarization/lna
Writing .exp files in: /home/userk/speaker-diarization/exp
Processing file 1/1
Input: /mnt/c/users/karthikeyan/Downloads/proper.wav
Output: /home/userk/speaker-diarization/lna/proper.lna
FAN OUT: 0 nodes, 0 arcs
FAN IN: 0 nodes, 0 arcs
Prefix tree: 3 nodes, 6 arcs
WARNING: No tokens in final nodes. The result will be incomplete. Try increasing beam.
Calling voice-detection2.py
Reading recipe from: /tmp/initzDxEk1.recipe
Reading .exp files from: /home/userk/speaker-diarization/exp
Writing output to: /tmp/vadTalccO.recipe
Sample rate set to: 125
Minimum speech turn duration: 0.5 seconds
Minimum nonspeech between-turns duration: 1.5 seconds
Segment before expansion set to: 0.0 seconds
Segment end expansion set to: 0.0 seconds
Waiting for feacat to end.
Calling spk-change-detection.py
Reading recipe from: /tmp/vadTalccO.recipe
Reading feature files from: /home/userk/speaker-diarization/fea
Feature files extension: .fea
Writing output to: /tmp/spkcxxYN9G.recipe
Conversion rate set to frame rate: 125.0
Using a growing window
Deltaws set to: 0.096 seconds
Using BIC as distance measure, lambda = 1.0
Window size set to: 1.0 seconds
Window step set to: 3.0 seconds
Threshold distance: 0.0
Useful metrics for determining the right threshold:
Average between windows distance: -789.417532303
Maximum between windows distance: 35.230502772707496
Minimum between windows distance: -1378.4592347022503
Total windows: 23
Total segments: 2
Average between detected segments distance: 56.7217946043
Maximum between detected segments distance: 56.72179460426196
Minimum between detected segments distance: 56.72179460426196
Total detected speaker changes: 1
Calling spk-clustering.py
('===', '/tmp/spkcxxYN9G.recipe')
Reading recipe from: /tmp/spkcxxYN9G.recipe
Reading feature files from: /home/userk/speaker-diarization/fea
Feature files extension: .fea
Writing output to: stdout
Conversion rate set to frame rate: 125.0
Using hierarchical clustering
Using BIC as distance measure, lambda = 1.3
Threshold distance: 0.0
Maximum speakers: 0
Initial cluster with: 2 speakers
Merging: 1 and 2 distance: -2548.5851870160886
Final speakers: 1
Useful metrics for determining the right threshold:
Maximum between segments distance: 0
Minimum between segments distance: -2548.5851870160886
Total segments: 2
Total detected speakers: 1
```

From this, how can I get the number of audio segments per speaker, e.g. speaker 1 has around 5 audio segments, together with their durations (from where to where I should crop the audio)? Also, the wav file has two speakers, but it shows total detected speakers: 1.
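Looking at the log, it seems the two initial clusters were merged because their BIC distance (-2548.58...) is below the threshold distance of 0.0, so perhaps the clustering threshold needs tuning. For the segments, this is what I had in mind (a rough sketch, assuming the final recipe written to stdout uses space-separated key=value fields like audio, start-time, end-time and speaker, as in the other recipe files; out.recipe is a hypothetical file I saved that output to):

```python
# Group the diarization output by speaker and print segment boundaries.
from collections import defaultdict

def segments_per_speaker(recipe_path):
    spk = defaultdict(list)
    with open(recipe_path) as f:
        for line in f:
            # parse fields of the form key=value
            fields = dict(kv.split('=', 1) for kv in line.split() if '=' in kv)
            if 'speaker' in fields:
                start = float(fields.get('start-time', 0))
                end = float(fields.get('end-time', 0))
                spk[fields['speaker']].append((start, end))
    return spk

for speaker, segs in segments_per_speaker('out.recipe').items():
    print('%s: %d segments' % (speaker, len(segs)))
    for start, end in segs:
        print('  %.2f -> %.2f (%.2f s)' % (start, end, end - start))
```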