Open shashankack opened 5 months ago
python3 speechEmotionRecognition.py
Using TensorFlow backend.
/usr/local/lib/python3.6/dist-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
All-Files: ['3.wav']
Processing File: 3.wav
Filtering: input/Marathi/3.wav
Filtering Complete
N-SPeakers: 3
========= 0 =========
0:01.408 ==> 0:12.728
0:58.344 ==> 1:05.424
1:10.144 ==> 1:25.32
========= 1 =========
0:12.728 ==> 0:58.344
1:05.424 ==> 1:07.384
========= 2 =========
1:07.384 ==> 1:10.144
Processing File Complete: 3.wav
Diarized Marathi
All-Files: ['1.wav', 'Rishi.wav']
Processing File: 1.wav
Filtering: input/English/1.wav
Filtering Complete
N-SPeakers: 1
========= 0 =========
0:00.96 ==> 0:18.696
Processing File Complete: 1.wav
Processing File: Rishi.wav
Filtering: input/English/Rishi.wav
Filtering Complete
Traceback (most recent call last):
  File "speechEmotionRecognition.py", line 62, in <module>
    bk.diarizeFromFolder(f'{INPUT_FOLDER_PATH}{subdir}{"/"}',(f'{OUTPUT_FOLDER_PATH}{subdir}{"/"}'))
  File "/MevonAI-Speech-Emotion-Recognition/src/bulkDiarize.py", line 29, in diarizeFromFolder
    diarizeAudio(TOTAL_PATH,TOTAL_OUTPUT_PATH,expectedSpeakers=2)
  File "/MevonAI-Speech-Emotion-Recognition/src/speakerDiarization.py", line 242, in diarizeAudio
    main("filterTemp.wav", embedding_per_second=0.6, overlap_rate=0.4,exportFile=exportFile,expectedSpeakers=expectedSpeakers)
  File "/MevonAI-Speech-Emotion-Recognition/src/speakerDiarization.py", line 170, in main
    feats = np.array(feats)[:,0,:].astype(float) # [splits, embedding dim]
IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
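
The failing line indexes `np.array(feats)` with three indices, so it only works when `feats` stacks into a 3-dimensional array of shape [splits, 1, embedding dim]. One possible cause, which is an assumption and not confirmed by the log, is that `feats` was an empty list for Rishi.wav: `np.array([])` is 1-dimensional, so `[:,0,:]` raises an `IndexError`. The sketch below reproduces that case and shows a guard that reports the actual shape. `stack_embeddings` is a hypothetical helper, not a function from the repository.

```python
import numpy as np


def stack_embeddings(feats):
    """Stack per-window embeddings into a [splits, embedding dim] array.

    Hypothetical helper (not part of the repository). It assumes each
    entry of `feats` has shape (1, embedding dim).
    """
    arr = np.asarray(feats)
    if arr.ndim != 3:
        # Covers the empty-list case, where arr has shape (0,)
        raise ValueError(
            f"expected embeddings of shape [splits, 1, dim], got {arr.shape}"
        )
    return arr[:, 0, :].astype(float)


# An empty list becomes a 1-dimensional array, so three indices fail
try:
    np.array([])[:, 0, :]
except IndexError as exc:
    print("IndexError:", exc)

# Two windows with a 4-dimensional embedding stack into shape (2, 4)
good = [np.zeros((1, 4)), np.ones((1, 4))]
print(stack_embeddings(good).shape)  # (2, 4)
```

If the guard fires with shape `(0,)`, the next thing to check would be why no embeddings were produced for that file, for example whether the filtered audio is shorter than one embedding window.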