imKarthikeyanK opened 5 years ago
Do you have the docopt
dependency, and are you using python2?
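A quick way to confirm whether a module such as docopt is importable in the interpreter you are running (a generic check, not part of this repo):

```python
import importlib

def has_module(name):
    """Return True if `name` can be imported in the current interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False

# Run this with the same interpreter you use for spk-diarization2.py
# (python2 for this project); install the dependency with `pip install docopt`.
print("docopt available:", has_module("docopt"))
```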
Yeah, thank you. I was missing the docopt dependency. Now I'm getting this result:
```
userk@PSSHSRDT034:~/speaker-diarization$ python spk-diarization2.py /mnt/c/users/karthikeyan/Downloads/proper.wav
Reading file: /mnt/c/users/karthikeyan/Downloads/proper.wav
Writing output to: stdout
Using feacat from: /home/userk/speaker-diarization/feacat
Writing temporal files in: /tmp
Writing lna files in: /home/userk/speaker-diarization/lna
Writing exp files in: /home/userk/speaker-diarization/exp
Writing features in: /home/userk/speaker-diarization/fea
Performing exp generation and feacat concurrently
tokenpass: ./VAD/tokenpass/test_token_pass
Reading recipe: /tmp/initzDxEk1.recipe
Using model: ./hmms/mfcc_16g_11.10.2007_10
Writing .lna files in: /home/userk/speaker-diarization/lna
Writing .exp files in: /home/userk/speaker-diarization/exp
Processing file 1/1
Input: /mnt/c/users/karthikeyan/Downloads/proper.wav
Output: /home/userk/speaker-diarization/lna/proper.lna
FAN OUT: 0 nodes, 0 arcs
FAN IN: 0 nodes, 0 arcs
Prefix tree: 3 nodes, 6 arcs
WARNING: No tokens in final nodes. The result will be incomplete. Try increasing beam.
Calling voice-detection2.py
Reading recipe from: /tmp/initzDxEk1.recipe
Reading .exp files from: /home/userk/speaker-diarization/exp
Writing output to: /tmp/vadTalccO.recipe
Sample rate set to: 125
Minimum speech turn duration: 0.5 seconds
Minimum nonspeech between-turns duration: 1.5 seconds
Segment before expansion set to: 0.0 seconds
Segment end expansion set to: 0.0 seconds
Waiting for feacat to end.
Calling spk-change-detection.py
Reading recipe from: /tmp/vadTalccO.recipe
Reading feature files from: /home/userk/speaker-diarization/fea
Feature files extension: .fea
Writing output to: /tmp/spkcxxYN9G.recipe
Conversion rate set to frame rate: 125.0
Using a growing window
Deltaws set to: 0.096 seconds
Using BIC as distance measure, lambda = 1.0
Window size set to: 1.0 seconds
Window step set to: 3.0 seconds
Threshold distance: 0.0
Useful metrics for determining the right threshold:
Maximum between segments distance: 0
Minimum between segments distance: -2548.5851870160886
Total segments: 2
Total detected speakers: 1
```
From this output, how can I get the number of audio segments generated for each speaker? For example, speaker 1 has around 5 audio segments, along with their durations (from where to where I should crop the audio). Also, the wav file has two speakers, but it reports "Total detected speakers: 1".
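If the diarization output can be read as one segment per line with a start time, an end time, and a speaker label (that line format is an assumption for illustration, not the actual recipe format this tool writes), per-speaker segment counts and durations can be tallied like this:

```python
from collections import defaultdict

def segments_per_speaker(lines):
    """Group (start, end) spans by speaker label.

    Each line is assumed to be "start end speaker" -- a hypothetical
    format, not necessarily what spk-diarization2.py emits.
    """
    segs = defaultdict(list)
    for line in lines:
        start, end, spk = line.split()
        segs[spk].append((float(start), float(end)))
    return segs

# Invented sample data, not real tool output.
output = ["0.0 4.2 speaker_1", "4.2 9.7 speaker_2", "9.7 12.0 speaker_1"]
for spk, spans in sorted(segments_per_speaker(output).items()):
    total = sum(end - start for start, end in spans)
    print("%s: %d segments, %.1f s total" % (spk, len(spans), total))
```

Given start/end times per segment, the audio can then be cropped with a standard tool, e.g. `ffmpeg -i proper.wav -ss 0.0 -to 4.2 seg1.wav`.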
While trying to execute the command below:

```
python spk-diarization2.py /mnt/c/users/karthikeyan/Downloads/proper.wav
```

I am getting:
```
Reading file: /mnt/c/users/karthikeyan/Downloads/proper.wav
Writing output to: stdout
Using feacat from: /home/userk/speaker-diarization/feacat
Writing temporal files in: /tmp
Writing lna files in: /home/userk/speaker-diarization/lna
Writing exp files in: /home/userk/speaker-diarization/exp
Writing features in: /home/userk/speaker-diarization/fea
Performing exp generation and feacat concurrently
Traceback (most recent call last):
  File "./generate_exp.py", line 37, in <module>
    from docopt import docopt
ImportError: No module named docopt
Calling voice-detection2.py
Reading recipe from: /tmp/initrypiaG.recipe
Reading .exp files from: /home/userk/speaker-diarization/exp
Writing output to: /tmp/vadHJVgzE.recipe
Sample rate set to: 125
Minimum speech turn duration: 0.5 seconds
Minimum nonspeech between-turns duration: 1.5 seconds
Segment before expansion set to: 0.0 seconds
Segment end expansion set to: 0.0 seconds
Error, /home/userk/speaker-diarization/exp/proper.exp does not exist
Waiting for feacat to end.
Calling spk-change-detection.py
Reading recipe from: /tmp/vadHJVgzE.recipe
Reading feature files from: /home/userk/speaker-diarization/fea
Feature files extension: .fea
Writing output to: /tmp/spkcM3EdlF.recipe
Conversion rate set to frame rate: 125.0
Using a growing window
Deltaws set to: 0.096 seconds
Using BIC as distance measure, lambda = 1.0
Window size set to: 1.0 seconds
Window step set to: 3.0 seconds
Threshold distance: 0.0
Useful metrics for determining the right threshold:
Maximum between windows distance: 0
Total windows: 0
Total segments: 0
Maximum between detected segments distance: 0
Total detected speaker changes: 0
Calling spk-clustering.py
('===', '/tmp/spkcM3EdlF.recipe')
Reading recipe from: /tmp/spkcM3EdlF.recipe
Reading feature files from: /home/userk/speaker-diarization/fea
Feature files extension: .fea
Writing output to: stdout
Conversion rate set to frame rate: 125.0
Using hierarchical clustering
Using BIC as distance measure, lambda = 1.3
Threshold distance: 0.0
Maximum speakers: 0
('::::::::::::::::::::::::::::::::::', 0)
Initial cluster with: 0 speakers
Traceback (most recent call last):
  File "./spk-clustering.py", line 432, in <module>
    process_recipe(parsed_recipe, speakers, outf)
  File "./spk-clustering.py", line 293, in process_recipe
    spk_cluster_m(feas[1], recipe, speakers, outf, dist, segf)
UnboundLocalError: local variable 'feas' referenced before assignment
```
I tried looking into `spk-clustering.py`. The `len(recipe)` and `feas` values are 0... Thank you.
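That `UnboundLocalError` follows directly from the empty recipe: a variable assigned only inside a loop is never bound when the loop body never runs. A minimal sketch of the pattern (a hypothetical simplification, not the actual `spk-clustering.py` code):

```python
def process_recipe(recipe):
    # `feas` is assigned only inside the loop over recipe entries, so an
    # empty recipe (here, the downstream effect of the earlier ImportError
    # producing no .exp/.fea files) leaves it undefined.
    for entry in recipe:
        feas = entry
    return feas  # raises UnboundLocalError when `recipe` is empty

try:
    process_recipe([])
except UnboundLocalError as err:
    print("reproduced:", err)
```

So the traceback is a symptom: once the docopt import error is fixed and the `.exp`/`.fea` files are actually generated, the recipe is non-empty and this code path binds `feas` normally.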