Closed felixbur closed 9 months ago
This will introduce a new dependency, which may be large (since speechbrain tries to tackle all speech processing tasks). There is a similar acoustic feature extractor in transformers (AutoModelForAudioXVector) which works with wavlm-base-plus (microsoft/wavlm-base-plus-sv). Its accuracy is also high; see the demonstration here: https://huggingface.co/spaces/Bagus/speaker-verification-demo.
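A minimal sketch of how such embeddings could be extracted and compared, assuming the transformers and torch packages are installed and the checkpoint can be downloaded. The function names extract_xvectors and cosine_similarity are hypothetical and not part of nkululeko; the input is assumed to be mono audio at 16 kHz.

```python
import math

MODEL_ID = "microsoft/wavlm-base-plus-sv"  # checkpoint named in this thread


def extract_xvectors(waveforms, sampling_rate=16000):
    """Return one x-vector (list of floats) per waveform.

    waveforms: list of 1-D float sequences (mono audio, assumed 16 kHz).
    The heavy imports are local, so they are only needed when this is called.
    """
    import torch
    from transformers import AutoFeatureExtractor, AutoModelForAudioXVector

    feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
    model = AutoModelForAudioXVector.from_pretrained(MODEL_ID)
    model.eval()

    # Pad the batch so utterances of different length can be stacked.
    inputs = feature_extractor(
        waveforms, sampling_rate=sampling_rate, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        embeddings = model(**inputs).embeddings
    return embeddings.tolist()


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

The resulting vectors could be stored as utterance-level features, or two of them could be scored with cosine_similarity for a verification-style comparison.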
I think it's ok to install the dependencies only if you explicitly want to use this model. That's how it's done in nkululeko anyway: most of the imports appear only when you trigger the modules.
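The idea of importing a dependency only when its module is triggered can be sketched as below. This is an illustration, not nkululeko's actual code; load_optional and the feature_type argument are hypothetical names.

```python
import importlib


def load_optional(module_name, feature_type):
    """Import module_name on demand and explain what is missing if it fails.

    module_name: name of the optional package, e.g. "speechbrain".
    feature_type: label of the feature extractor that needs it (hypothetical).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"feature type '{feature_type}' needs the optional package "
            f"'{module_name}'; install it to use this extractor"
        ) from err
```

Called inside the extractor that needs it, for example load_optional("speechbrain", "xvector"), this keeps the base installation small: users who never select that extractor never need the package.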
I will work on this. Do you know a small-scale verification dataset to try? Currently, I only have voxceleb, which is large.
Actually, I thought we could try it on any database; it doesn't need to be a verification one, e.g. emodb. I mean, wav2vec2 was originally developed for ASR...
I used Ravdess for the experiment by changing the target from emotion to speaker. I also split train and test so that 80% of the utterances from each speaker go to training and the remaining 20% to test (see process_database_speaker.py in the data/ravdess folder).
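A per-speaker 80/20 split of this kind can be sketched as follows. This is not the content of process_database_speaker.py; split_per_speaker and its (file, speaker) input format are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_per_speaker(items, train_fraction=0.8, seed=0):
    """Split (file, speaker) pairs so every speaker appears in both sets.

    Returns (train, test), each a list of (file, speaker) pairs.
    """
    by_speaker = defaultdict(list)
    for file, speaker in items:
        by_speaker[speaker].append(file)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for speaker in sorted(by_speaker):
        files = sorted(by_speaker[speaker])
        rng.shuffle(files)
        cut = round(len(files) * train_fraction)
        train += [(f, speaker) for f in files[:cut]]
        test += [(f, speaker) for f in files[cut:]]
    return train, test
```

With 24 speakers and 60 utterances each, this gives 1152 training and 288 test samples, which is consistent with the sample counts reported in the log.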
However, just changing the target from emotion to speaker gives the error below (the INI file is also included in the ravdess directory).
(nkululeko) bagus@pc-omen:nkululeko$ python3 -m nkululeko.nkululeko --config data/ravdess/exp_speaker.ini
DEBUG nkululeko: running results/exp_ravdess_speaker from config data/ravdess/exp_speaker.ini, nkululeko version 0.65.9
DEBUG dataset: loading train
DEBUG dataset: value for audio_path not found, using default:
DEBUG dataset: Loaded database train with 1152 samples: got targets: True, got speakers: True (24), got sexes: True, got age: False
DEBUG dataset: loading test
DEBUG dataset: value for audio_path not found, using default:
DEBUG dataset: Loaded database test with 288 samples: got targets: True, got speakers: True (24), got sexes: True, got age: False
DEBUG experiment: loaded databases train,test
DEBUG experiment: reusing previously stored ./results/exp_ravdess_speaker/./store/testdf.csv and ./results/exp_ravdess_speaker/./store/traindf.csv
DEBUG experiment: value for filter.sample_selection not found, using default: all
DEBUG experiment: value for type not found, using default: dummy
DEBUG experiment: Categories test: []
DEBUG experiment: Categories train: []
DEBUG experiment: 0 speakers in test and 0 speakers in train
DEBUG nkululeko: train shape : (0, 5), test shape:(0, 5)
DEBUG featureset: value for set not found, using default: eGeMAPSv02
DEBUG featureset: value for level not found, using default: functionals
DEBUG featureset: value for store_format not found, using default: pkl
DEBUG featureset: extracting openSmile features, this might take a while...
Traceback (most recent call last):
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/bagus/github/nkululeko/nkululeko/nkululeko.py", line 63, in <module>
    main(cwd) # use this if you want to state the config file path on command line
  File "/home/bagus/github/nkululeko/nkululeko/nkululeko.py", line 50, in main
    expr.extract_feats()
  File "/home/bagus/github/nkululeko/nkululeko/experiment.py", line 313, in extract_feats
    self.feats_train = self.feature_extractor.extract()
  File "/home/bagus/github/nkululeko/nkululeko/feature_extractor.py", line 153, in extract
    self.featExtractor.extract()
  File "/home/bagus/github/nkululeko/nkululeko/feat_extract/feats_opensmile.py", line 45, in extract
    self.df = smile.process_index(self.data_df.index)
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/site-packages/audinterface/core/feature.py", line 573, in process_index
    df = self._series_to_frame(y)
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/site-packages/opensmile/core/smile.py", line 421, in _series_to_frame
    return pd.concat(frames, axis='index')
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 372, in concat
    op = _Concatenator(
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 429, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
The problem could be the speaker labels (again, maybe related to #61). I tried both with and without quotes on the speaker labels, but the debug output shows an empty list for categories.
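One way to test the label hypothesis is to compare the labels given in the INI file with the values actually present in the speaker column. The sketch below is a hypothetical diagnostic, not nkululeko code; it only assumes that a mismatch in type or quoting (for example "1" versus 1) could leave every row filtered out, which would match the empty categories and the (0, 5) shapes in the log.

```python
def match_labels(configured, observed):
    """Compare configured labels with observed target values.

    Returns (exact, text_only):
      exact     - configured labels found among the observed values as-is
      text_only - configured labels that match only after both sides are
                  converted to text with surrounding quotes stripped
    """
    def as_text(value):
        return str(value).strip().strip("'\"")

    observed_set = set(observed)
    observed_text = {as_text(o) for o in observed}

    exact = [c for c in configured if c in observed_set]
    text_only = [
        c for c in configured
        if c not in observed_set and as_text(c) in observed_text
    ]
    return exact, text_only
```

If exact comes back empty while text_only is not, the labels exist in the data but differ in type or quoting, so the filter would remove all samples before feature extraction.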
ok, need to check later
Might be interesting to compare embeddings trained on speakerID
https://huggingface.co/speechbrain/spkrec-xvect-voxceleb