ValueError: need at least one array to concatenate

javierrodenas commented 5 years ago

Hi! I get the following error when training the model:

  File ".\ubm.py", line 202, in <module>
    ubm.train()
  File ".\ubm.py", line 50, in train
    iterations=(1, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 8, 8)
  File "C:\Users\jrodenas\Desktop\SpeakerRecognition\Speaker-Recognition\sidekit\mixture.py", line 672, in EM_split
    self._init(features_server, feature_list, num_thread)
  File "C:\Users\jrodenas\Desktop\SpeakerRecognition\Speaker-Recognition\sidekit\mixture.py", line 629, in _init
    features = features_server.stack_features_parallel(feature_list, num_thread=num_thread)
  File "C:\Users\jrodenas\Desktop\SpeakerRecognition\Speaker-Recognition\sidekit\features_server.py", line 666, in stack_features_parallel
    return numpy.concatenate(output, axis=0)
ValueError: need at least one array to concatenate

I ran data_init first and its output was created correctly. Then I ran extract_features and, finally, the ubm training. How can I solve this? Thank you in advance!

yangxiaokang commented 5 years ago

Maybe you can try it on Linux.

Anwarvic commented 5 years ago

First, I believe that the features_server is empty and didn't load any features. Why is that? The most probable cause is the location containing the features. To make sure everything is as expected, do the following:

[ ] First, open the configuration file conf.yaml and tell me the values of the YAML keys outpath and sampling_rate.
[ ] Then, go to the outpath; you should find at least the following folders: audio, feat, and task.
[ ] Inside {outpath}/feat, you should find at least two folders, enroll and test. I need to know how many files are inside each.
[ ] Also, I need to see the code calling ubm.EM_split() in ubm.py.
[ ] Finally, I need to know the line responsible for creating the FeatureServer inside ubm.py. I'm expecting something like server = self.createFeatureServer("enroll"). Is that right?
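
The folder checks above can also be done with a short script. This is only a minimal sketch: `summarize_outpath` is a hypothetical helper, not part of the repository, and it assumes you pass in the outpath value from conf.yaml by hand. It reports whether audio, feat, and task exist and counts the files in feat/enroll and feat/test.

```python
from pathlib import Path


def summarize_outpath(outpath):
    """Report which expected folders exist under outpath and how many
    files each feat subfolder holds (hypothetical helper, not repo code)."""
    root = Path(outpath)
    report = {}
    # Top-level folder names taken from the checklist above.
    for name in ("audio", "feat", "task"):
        report[name] = (root / name).is_dir()
    for name in ("enroll", "test"):
        folder = root / "feat" / name
        if folder.is_dir():
            report[f"feat/{name}"] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            # -1 marks a folder that does not exist at all.
            report[f"feat/{name}"] = -1
    return report


if __name__ == "__main__":
    # Replace "." with the outpath value from conf.yaml.
    for key, value in summarize_outpath(".").items():
        print(f"{key}: {value}")
```

A missing folder shows up as `False` or `-1`, which makes it easy to paste the result into a reply.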

javierrodenas commented 5 years ago

@Anwarvic first of all, thanks for your answer.

Answering your questions:

[x] As outpath I have _./SpeakerRecognition/Speaker-Recognition/Merged_Arabic_Corpus_of_IsolatedWords/ and as sample_rate I have 44100 (default value).
[x] In outpath I find the audio, feat and task folders, but inside the feat folder I can only see an enroll folder and no test folder. Inside the enroll folder there are 6 files:
enroll_idmap.h5
plda_idmap.h5
test_idmap.h5
test_ndx.h5
test_trials.txt
tv_idmap.h5
[x] Also, ubm.EM_split() has the default structure:

        ubm.EM_split(
            features_server=server, #sidekit.FeaturesServer used to load data
            feature_list=train_list, #list of feature files to train the model
            distrib_nb=self.NUM_GAUSSIANS, #number of Gaussian distributions
            num_thread=self.NUM_THREADS, # number of parallel processes
            save_partial=False, # if False, it only saves the last model
            iterations=(1, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 8, 8)
            )

if __name__ == "__main__":
    conf_filename = "conf.yaml"
    ubm = UBM(conf_filename)
    ubm.train()
    ubm.evaluate()
    ubm.plotDETcurve()
    print( "Accuracy: {}%".format(ubm.getAccuracy()) )

[x] Finally, you are right. The creation of the FeatureServer is done with server = self.createFeatureServer("enroll")

Also, inside the audio folder I can find the data, enroll and test folders, but they are empty. Besides this, the task folder has the same 6 files as feat/enroll.

Thank you in advance.

Anwarvic commented 5 years ago

Now, the problem is that you haven't extracted the features from the data yet. So, follow these steps:

First, download the data from here.
Then, run data_init.py. After running it, you will find that two folders have been created at {outpath}:
- {outpath}\audio: which will contain at least two folders, enroll and test. Inside each folder you will find audio files that you can listen to.
- {outpath}\task: which will contain the files that you listed above.
Then, run the extract_features.py script, which will create another directory in {outpath} called feat. Inside this folder you should find at least two other folders, enroll and test.
After running these two scripts, you can run ubm.py with no problem.

If you need more information, please check this README.md file, where I explained as many details as I could.
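
For reference, the error itself can be reproduced outside sidekit: the traceback ends in `numpy.concatenate(output, axis=0)`, and NumPy raises this ValueError whenever the list it receives is empty. The sketch below shows that, plus a guard that fails earlier with a more descriptive message. `stack_features` and `reproduce_error` are hypothetical names used for illustration only; they are not sidekit functions.

```python
import numpy


def reproduce_error():
    """Return the message NumPy raises when asked to concatenate nothing."""
    try:
        numpy.concatenate([], axis=0)
    except ValueError as err:
        return str(err)
    return None


def stack_features(arrays):
    """Concatenate feature arrays along axis 0, failing early with a
    clearer message when no arrays were loaded (illustrative guard)."""
    if len(arrays) == 0:
        raise RuntimeError(
            "no feature arrays were loaded; check that the feat folder "
            "contains extracted features"
        )
    return numpy.concatenate(arrays, axis=0)


if __name__ == "__main__":
    print(reproduce_error())
```

So the message says nothing about the model itself; it only means that no features reached the concatenation step.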

TeppieC commented 3 years ago

Hi, maybe it's too late, but I believe that you forgot to install sox, so convert_wav() did not work as expected.
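
A quick way to test this guess is to check whether a `sox` executable is on the PATH. This is a minimal sketch; it assumes convert_wav() relies on the sox command-line program, which I have not verified in the code, and `sox_available` is a hypothetical helper name.

```python
import shutil


def sox_available():
    """Return True if an executable named "sox" is found on the PATH."""
    return shutil.which("sox") is not None


if __name__ == "__main__":
    if sox_available():
        print("sox found at:", shutil.which("sox"))
    else:
        print("sox not found on PATH; install it and re-run data_init.py")
```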
