aclew / DiViMe

ACLEW Diarization Virtual Machine - ARCHIVE -- visit github.com/srvk/DiViMe/releases for the latest version
Apache License 2.0

Cannot find my wav file #7

Closed yaklamtse closed 6 years ago

yaklamtse commented 6 years ago

Hi, I was trying to use my own wav files to detect (non)speech. However, it gives me this:

$ vagrant ssh -c "tools/noisemes_sad.sh data/"
folder is not empty! Tests
finished extracting features for speech activity detection
detecting speech and non speech segments
Loading neural network...
finished detecting speech and non speech segments
ls: cannot access /vagrant/data//hyp_sum/*.lab: No such file or directory
Connection to 127.0.0.1 closed.

However, when I put the test2 wav file in the data folder (alongside my own wav) it did pick up test2 and produced an rttm output.

riebling commented 6 years ago

Some of the scripts are picky about folders. If you've vagrant ssh'd into the VM, files in /vagrant are mapped from the host computer's working directory (where you ran vagrant up). Some of the processing scripts assume your data is in /vagrant/..., and the pathname argument you give is just the '...' part: the name of a folder such as data/.

We should probably be more explicit about assumptions about data folder names!

yaklamtse commented 6 years ago

I am not really sure what you mean. Do you propose including the name of my wav file in the command? Because that did not work either.

riebling commented 6 years ago

First, I don't know what command you ran :) Knowing that will go a long way toward understanding the messages. There are commands to process an entire folder of WAVs; if you run one of those but only give it a single WAV, it won't work. I was talking about one such command (still not sure which you ran), and how it assumes that the folder name you give it as an argument resides in /vagrant. Typically people put WAV files in /vagrant/data, so the argument they give the command is simply "data", and the pathname concatenation is done by the script.

You could always look at the script to see what it's doing :)

yaklamtse commented 6 years ago

Hi! I found out why the software could not find my files: they had .WAV extensions instead of .wav. But I do have some other questions haha :)
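
In case anyone else hits this, something like the following should lowercase the extensions (an untested Python sketch; "data" stands in for whatever folder holds your wavs):

from pathlib import Path

# Rename *.WAV to *.wav so the scripts' *.wav globs can find the files
for f in Path("data").glob("*.WAV"):
    f.rename(f.with_suffix(".wav"))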

First, I used some of my own files to detect speech with the command vagrant ssh -c "tools/noisemes_sad.sh data/". However, my output says there is always speech occurring, whereas my recording does have some pauses. Or am I interpreting this the wrong way?

SPEAKER rec_17s 1 0.1 1 <NA> <NA> speech <NA>
SPEAKER rec_17s 1 4.5 5 <NA> <NA> speech <NA>
SPEAKER rec_17s 1 13.9 3 <NA> <NA> speech <NA>

Secondly, when using the same command on another larger audio (33 minutes) it returns this:

Extracting features for G14161121.AUDIO.wav ...
(MSG) [2] in SMILExtract : openSMILE starting!
(MSG) [2] in SMILExtract : config file is: /vagrant/MED_2s_100ms_htk.conf
(MSG) [2] in cComponentManager : successfully registered 95 component types.
(MSG) [2] in cComponentManager : successfully finished createInstances
                                 (19 component instances were finalised, 1 data memories were finalised)
(MSG) [2] in cComponentManager : starting single thread processing loop
(MSG) [2] in cComponentManager : Processing finished! System ran for 197798 ticks.
DONE!
detecting speech and non speech segments
Loading neural network...
Filename G14161121.AUDIO.htk
Predicting for G14161121.AUDIO ...
Traceback (most recent call last):
  File "SSSF/code/predict/1-confidence-vm5.py", line 59, in <module>
    feature = pca(readHtk(os.path.join(INPUT_DIR, filename))).astype('float32')
  File "/home/vagrant/G/coconut/fileutils/htk.py", line 16, in readHtk
    data = struct.unpack(">%df" % (nSamples * sampSize / 4), f.read(nSamples * sampSize))
MemoryError

I checked my space:

              total        used        free      shared  buff/cache   available
Mem:           3,7G        3,0G        203M        259M        584M        250M
Swap:           14G        729M         14G

Lastly, I was wondering how I can get the noisemes tool to classify into the 17 different classes, and then, from the detected speech activity, use the DiarTK tool to classify who exactly is speaking.

Thanks in advance!

riebling commented 6 years ago

Aha! The old capitalized vs. lower-case filename problem! (I think Windows doesn't care?)

It's true that some speech activity detectors & diarization tools don't recognize gaps, labeling everything as "something" but failing to label silence. It's also true that, for some reason, the noisemes tool just blows up (I've seen it blow up in 3 different ways), and that adding more RAM to the VM configuration may or may not help. (Much as I've tried to get the developer to fix these crashes, no insight yet.) At first we thought it might be related to the length of the input audio, and it is true that given a long enough recording, things will start to fail. So one test might be to split the recording into smaller pieces (if possible) and see what happens; a sketch of such a split follows.
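
Something along these lines might do the split (an untested sketch using Python's stdlib wave module; the filename comes from your log, and the chunk length is an arbitrary guess):

import wave

CHUNK_SECONDS = 5 * 60  # arbitrary; pick whatever the VM's RAM tolerates

with wave.open("G14161121.AUDIO.wav", "rb") as src:
    params = src.getparams()
    frames_per_chunk = CHUNK_SECONDS * src.getframerate()
    part = 0
    while True:
        frames = src.readframes(frames_per_chunk)
        if not frames:
            break
        with wave.open("G14161121.AUDIO.part%02d.wav" % part, "wb") as dst:
            dst.setparams(params)    # same rate/width/channels as the source
            dst.writeframes(frames)  # header frame count is patched on close
        part += 1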

Lastly, the noisemes tool does support the different classes, and the run script used to support this, but for ACLEW purposes, we changed it to merge classes into only speech and nonspeech. I've added a script runClasses.sh which outputs class labels, according to the file noisemeclasses.txt, taking as input a FOLDER (because loading the nnet model takes a while, it gets loaded once, then files are processed in batch, for better performance)

More info: https://github.com/srvk/DiViMe#noisemes_sad

yaklamtse commented 6 years ago

Hi! I figured out that cutting my audio into segments of 7 min will prevent the VM from crashing.

So if I get the noisemes tool working I can use DiarTK to annotate who is speaking?

Furthermore, I tried to run the runClasses.sh script by putting it alongside the Vagrantfile and running it on the data folder.

yaklam@yaklam-X550LD:~/DiViMe$ ./runClasses.sh /home/yaklam/DiViMe/data
Extracting features for 01-G14161121112531-001.wav ...
SSSF/code/feature/extract-htk-vm2.sh: line 26: /home/yaklam/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract: No such file or directory
DONE!
Extracting features for 02-G14161121112531-002.wav ...
SSSF/code/feature/extract-htk-vm2.sh: line 26: /home/yaklam/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract: No such file or directory
DONE!
Extracting features for 03-G14161121112531-003.wav ...
SSSF/code/feature/extract-htk-vm2.sh: line 26: /home/yaklam/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract: No such file or directory
DONE!
Extracting features for 04-G14161121112531-004.wav ...
SSSF/code/feature/extract-htk-vm2.sh: line 26: /home/yaklam/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract: No such file or directory
DONE!
Extracting features for 05-G14161121112531-005.wav ...
SSSF/code/feature/extract-htk-vm2.sh: line 26: /home/yaklam/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract: No such file or directory
DONE!
Extracting features for G14161121112531.wav ...
SSSF/code/feature/extract-htk-vm2.sh: line 26: /home/yaklam/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract: No such file or directory
DONE!
Extracting features for HR_3_G14161121112531.wav ...
SSSF/code/feature/extract-htk-vm2.sh: line 26: /home/yaklam/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract: No such file or directory
DONE!
Traceback (most recent call last):
  File "SSSF/code/predict/1-confidence-vm3.py", line 21, in <module>
    from RNN import RNN
  File "/home/yaklam/OpenSAT/SSSF/code/predict/RNN/RNN.py", line 2, in <module>
    import theano, theano.tensor as T
ImportError: No module named theano

I am not sure how to access openSMILE and place it in the DiViMe folder. Do you know where I can find it? :)

riebling commented 6 years ago

This is the problem with using '~' to point scripts to home directories but then running them from outside the virtual machine, where '~' refers to your home directory (yaklam's) instead of the VM default user's (vagrant's). The runClasses.sh script doesn't work when you run it on the host computer, only within the VM, so I believe the syntax you'd want to use is:
from outside the VM:

vagrant ssh -c "OpenSAT/runClasses.sh /vagrant/dataExtracting"

from within the VM:

cd OpenSAT
./runClasses.sh /vagrant/dataExtracting
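
You can see the '~' pitfall directly: this one-liner (a sketch, with the path taken from your error log) prints a different result depending on which account runs it:

import os
# on the host: /home/yaklam/openSMILE-2.1.0/... (doesn't exist there)
# in the VM:   /home/vagrant/openSMILE-2.1.0/... (where openSMILE lives)
print(os.path.expanduser("~/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract"))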

I committed a change to runClasses.sh to fix python library paths (the bane of all Python code!) so you might need to git pull in the VM, e.g:

vagrant ssh
cd OpenSAT
git pull
exit

I'm not sure you need speech activity detection in order to run DiarTK (which lives in the unfortunately named ib_diarization_toolkit folder in the VM), but yes, it will do speaker ID. If one of your input files is named AMI_20050204-1206 (the included example), speaker IDs will be of the form AMI_20050204-1206_spkr_5.

yaklamtse commented 6 years ago

From outside the VM it says:

yaklam@yaklam-X550LD:~/DiViMe$ vagrant ssh -c "OpenSAT/runClasses.sh /vagrant/dataExtracting" 
ls: cannot access /vagrant/dataExtracting/*.wav: No such file or directory
Loading neural network...
Connection to 127.0.0.1 closed.

and inside using first vagrant ssh:

vagrant@vagrant-ubuntu-trusty-64:~$ cd OpenSAT
vagrant@vagrant-ubuntu-trusty-64:~/OpenSAT$ ./runClasses.sh /vagrant/dataExtracting
ls: cannot access /vagrant/dataExtracting/*.wav: No such file or directory
Loading neural network...

And I thought that DiarTK used the annotations made by the speech activity tool to determine who is talking?

Is it actually possible to give other labels instead of AMI_20050204-1206_spkr_5, for instance mother? Thanks! :)

riebling commented 6 years ago

DiarTK is not smart enough, nor trained, to recognize labels like 'mother', 'CHI', etc. It is confusing to me that some tools in the VM require SAD before doing diarization, while others do both on their own.

The tool you might want to try in order to get labels like child/male/female is Yunitator. It was trained on, and produces such labels, and does not require speech activity detection as input.

As for the "No such file or directory" errors - are there .wav files in dataExtracting/? (and not .WAV files :) ) What happens if all you do is ls dataExtracting/*.wav just as a sanity check that the folder and data are there?

The runClasses.sh I pointed you to is running the noisemes SAD (in the OpenSAT folder of the VM) not DiarTK. I added the runClasses.sh script so you could see class labels from the list of 19 it produces, rather than having it produce only speech/nonspeech output when run in other ways.

So I think we're getting confused about which tools we mean, and what we wish them to produce. I see now what you mean, DiarTK does require SAD as input first. I mistakenly thought we were talking about OpenSAT noisemes tool, which is what runClasses runs.

DiarTK produces speaker labels like FEE029, FEE030, FEE031, MEE032, which I assume encode female and male, with the numbers representing distinct speakers. But these are not documented within the tool (that I can find quickly - maybe they're in the source code?), and I can't be of much help in deciphering them. Perhaps someone else on the ACLEW team knows better than me!

yaklamtse commented 6 years ago

Oh haha, my bad. I had not put the data in the dataExtracting folder, hence no data was found. So now it is working!

Thank you for your help! I think I have everything to proceed, except one thing. Do you maybe have an idea how to convert the RTTM files to CSV, so I can use it in Python? I cannot seem to find it online. Thanks (:

riebling commented 6 years ago

I just googled; apparently the same Python csv machinery that reads comma-separated values can also handle tab-separated, which is what RTTM is (though I'm not sure if some RTTMs are space-separated) - but that's a programming problem that shouldn't be too hard in Python :)
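
Something like this sketch should do it ("hyp.rttm" is a placeholder filename; str.split() handles tabs and spaces alike):

import csv

with open("hyp.rttm") as src, open("hyp.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for line in src:
        if line.strip():
            writer.writerow(line.split())  # whitespace-delimited in, CSV out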

yaklamtse commented 6 years ago

Yes, you were right, it opened easily :) I have an interpretation question regarding the DiarTK tool. Because I separated my audio into sections of 7 minutes, I ran DiarTK on these files as well. Can I assume that spkr_0 in one 7-min audio file is a different spkr_0 than in another? Or is that not the case, given that annotations have only been made for speakers 0, 3, and 6?

SPEAKER 02-G14161121112531-002 1 16.00 0.42 <NA> <NA> 02-G14161121112531-002_spkr_0 <NA>
SPEAKER 02-G14161121112531-002 1 16.42 0.59 <NA> <NA> 02-G14161121112531-002_spkr_3 <NA>
SPEAKER 02-G14161121112531-002 1 95.00 1.01 <NA> <NA> 02-G14161121112531-002_spkr_3 <NA>
SPEAKER 02-G14161121112531-002 1 114.00 1.01 <NA> <NA> 02-G14161121112531-002_spkr_3 <NA>
SPEAKER 02-G14161121112531-002 1 173.00 0.53 <NA> <NA> 02-G14161121112531-002_spkr_3 <NA>
SPEAKER 02-G14161121112531-002 1 173.53 0.48 <NA> <NA> 02-G14161121112531-002_spkr_6 <NA>
SPEAKER 02-G14161121112531-002 1 246.00 1.01 <NA> <NA> 02-G14161121112531-002_spkr_6 <NA>
SPEAKER 02-G14161121112531-002 1 410.00 1.01 <NA> <NA> 02-G14161121112531-002_spkr_6 <NA>

riebling commented 6 years ago

Good point; spkr_0 would not be consistent across separate runs of DiarTK. A good case for trying to make it handle longer audio files!
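
If you stitch the per-chunk RTTMs back together, the onsets need shifting by each chunk's start time; the speaker labels already embed the chunk name, so they stay distinct. An untested sketch (the chunk filenames and 420-second offsets are assumptions):

chunks = [("01-G14161121112531-001.rttm", 0.0),
          ("02-G14161121112531-002.rttm", 420.0)]  # 7 min = 420 s per chunk

with open("merged.rttm", "w") as out:
    for path, offset in chunks:
        with open(path) as src:
            for line in src:
                fields = line.split()
                if not fields or fields[0] != "SPEAKER":
                    continue
                fields[1] = "G14161121112531"  # original recording ID
                fields[3] = "%.2f" % (float(fields[3]) + offset)  # shift onset
                out.write(" ".join(fields) + "\n")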

alecristia commented 6 years ago

I believe this issue is closed. Please reopen if it is not.