felixbur / nkululeko

Machine learning speaker characteristics
MIT License

Add speechbrain embeddings #73

Closed felixbur closed 9 months ago

felixbur commented 10 months ago

Might be interesting to compare embeddings trained on speakerID

https://huggingface.co/speechbrain/spkrec-xvect-voxceleb

bagustris commented 10 months ago

This will introduce a new dependency, which may be large (since speechbrain tries to tackle all speech processing tasks). There is a similar acoustic feature extractor in transformers (AutoModelForAudioXVector), which works with wavlm-base-plus (microsoft/wavlm-base-plus-sv). The accuracy is also high; see the demonstration here: https://huggingface.co/spaces/Bagus/speaker-verification-demo.
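As a rough sketch of what that alternative could look like (not code from nkululeko), the function below extracts speaker embeddings with AutoModelForAudioXVector and the microsoft/wavlm-base-plus-sv checkpoint. The function names, the 16 kHz default, and the list-based return type are assumptions made for illustration; torch and transformers are imported inside the function so they are only needed when it is actually called.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def extract_xvectors(waveforms, sampling_rate=16000,
                     model_name="microsoft/wavlm-base-plus-sv"):
    """Return one L2-normalized speaker embedding (list of floats) per waveform.

    `waveforms` is assumed to be a list of 1-D float arrays sampled at
    `sampling_rate`; the checkpoint is downloaded on first use.
    """
    # Heavy dependencies are imported lazily, only when embeddings are requested.
    import torch
    from transformers import AutoFeatureExtractor, AutoModelForAudioXVector

    feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
    model = AutoModelForAudioXVector.from_pretrained(model_name)
    model.eval()

    inputs = feature_extractor(
        waveforms, sampling_rate=sampling_rate, padding=True, return_tensors="pt"
    )
    with torch.no_grad():
        embeddings = model(**inputs).embeddings
    embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
    return embeddings.tolist()
```

Two embeddings returned by `extract_xvectors` could then be compared with `cosine_similarity`; any decision threshold would have to be tuned on the data at hand.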

felixbur commented 10 months ago

I think it's ok to install the dependencies only if you explicitly want to use this model. That's how it's done in nkululeko anyway: most of the imports appear only when you trigger the modules.
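The general pattern of importing an optional dependency only when a module is triggered can be sketched as follows. This is a minimal illustration, not nkululeko's actual code: the helper name and the feature-type keys in `OPTIONAL_BACKENDS` are hypothetical.

```python
import importlib

# Hypothetical mapping from a feature type to the package it needs.
OPTIONAL_BACKENDS = {
    "xvect": "speechbrain",
    "wavlm_sv": "transformers",
}


def load_feature_backend(package_name):
    """Import an optional package on demand and fail with a helpful message."""
    try:
        return importlib.import_module(package_name)
    except ImportError as err:
        raise ImportError(
            f"This feature type needs the optional package '{package_name}'. "
            f"Install it with: pip install {package_name}"
        ) from err
```

With this approach, users who never select the corresponding feature type never need to install the package.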

bagustris commented 10 months ago

I will work on this. Do you know of a small-scale verification dataset to try? Currently, I only have VoxCeleb, which is large.

felixbur commented 10 months ago

Actually, I thought we could try it on any database; it doesn't need to deal with verification, e.g. emodb. I mean, wav2vec2 was originally developed for ASR...

bagustris commented 10 months ago

I used RAVDESS for the experiment by changing the target from emotion to speaker. I also split the data to allocate 80% of the utterances from each speaker for training and the remaining 20% for testing (see process_database_speaker.py in the data/ravdess folder).

However, just changing the target from emotion to speaker gives the error below (the INI file is also included in the ravdess directory).
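For readers without access to that script, a per-speaker 80/20 split could look roughly like the sketch below. It is not the content of process_database_speaker.py; the function name, the `(file, speaker)` tuple layout, and the fixed seed are assumptions for illustration.

```python
import random
from collections import defaultdict


def split_per_speaker(samples, train_fraction=0.8, seed=42):
    """Split (file, speaker) pairs so every speaker appears in train and test.

    Each speaker's files are shuffled reproducibly, then the first
    `train_fraction` of them go to train and the rest to test.
    """
    by_speaker = defaultdict(list)
    for file, speaker in samples:
        by_speaker[speaker].append(file)

    rng = random.Random(seed)
    train, test = [], []
    for speaker in sorted(by_speaker):
        files = sorted(by_speaker[speaker])
        rng.shuffle(files)
        cut = round(len(files) * train_fraction)
        train += [(f, speaker) for f in files[:cut]]
        test += [(f, speaker) for f in files[cut:]]
    return train, test
```

With 24 speakers and 60 utterances each, this yields 1152 training and 288 test samples, which matches the sample counts reported in the log.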

(nkululeko) bagus@pc-omen:nkululeko$ python3 -m nkululeko.nkululeko --config data/ravdess/exp_speaker.ini 
DEBUG nkululeko: running results/exp_ravdess_speaker from config data/ravdess/exp_speaker.ini, nkululeko version 0.65.9
DEBUG dataset: loading train
DEBUG dataset: value for audio_path not found, using default: 
DEBUG dataset: Loaded database train with 1152 samples: got targets: True, got speakers: True (24), got sexes: True, got age: False
DEBUG dataset: loading test
DEBUG dataset: value for audio_path not found, using default: 
DEBUG dataset: Loaded database test with 288 samples: got targets: True, got speakers: True (24), got sexes: True, got age: False
DEBUG experiment: loaded databases train,test
DEBUG experiment: reusing previously stored ./results/exp_ravdess_speaker/./store/testdf.csv and ./results/exp_ravdess_speaker/./store/traindf.csv
DEBUG experiment: value for filter.sample_selection not found, using default: all
DEBUG experiment: value for type not found, using default: dummy
DEBUG experiment: Categories test: []
DEBUG experiment: Categories train: []
DEBUG experiment: 0 speakers in test and 0 speakers in train
DEBUG nkululeko: train shape : (0, 5), test shape:(0, 5)
DEBUG featureset: value for set not found, using default: eGeMAPSv02
DEBUG featureset: value for level not found, using default: functionals
DEBUG featureset: value for store_format not found, using default: pkl
DEBUG featureset: extracting openSmile features, this might take a while...
Traceback (most recent call last):
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/bagus/github/nkululeko/nkululeko/nkululeko.py", line 63, in <module>
    main(cwd)  # use this if you want to state the config file path on command line
  File "/home/bagus/github/nkululeko/nkululeko/nkululeko.py", line 50, in main
    expr.extract_feats()
  File "/home/bagus/github/nkululeko/nkululeko/experiment.py", line 313, in extract_feats
    self.feats_train = self.feature_extractor.extract()
  File "/home/bagus/github/nkululeko/nkululeko/feature_extractor.py", line 153, in extract
    self.featExtractor.extract()
  File "/home/bagus/github/nkululeko/nkululeko/feat_extract/feats_opensmile.py", line 45, in extract
    self.df = smile.process_index(self.data_df.index)
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/site-packages/audinterface/core/feature.py", line 573, in process_index
    df = self._series_to_frame(y)
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/site-packages/opensmile/core/smile.py", line 421, in _series_to_frame
    return pd.concat(frames, axis='index')
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 372, in concat
    op = _Concatenator(
  File "/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 429, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

The problem could be the speaker labels (again, maybe related to #61). I tried both with and without quotes on the speaker labels, but the debug output shows an empty list for categories.
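To illustrate the hypothesis only (the cause is not confirmed here, and this is not nkululeko's filtering code), the sketch below shows how a type mismatch between configured labels and the values in the target column can silently produce an empty set. All function names and the row layout are hypothetical.

```python
def filter_strict(rows, target, labels):
    """Keep rows whose target value is in `labels` (type-sensitive match)."""
    return [row for row in rows if row[target] in labels]


def filter_normalized(rows, target, labels):
    """Same filter, but compare everything as unquoted strings."""
    wanted = {str(label).strip("'\"") for label in labels}
    return [row for row in rows if str(row[target]) in wanted]


# Speaker IDs stored as integers, labels configured as strings.
rows = [
    {"file": "a.wav", "speaker": 1},
    {"file": "b.wav", "speaker": 2},
]
print(len(filter_strict(rows, "speaker", ["1", "2"])))         # integer 1 never equals "1"
print(len(filter_normalized(rows, "speaker", ["'1'", "2"])))   # quotes and types normalized
```

If something similar happens when the target is switched to speaker, checking the dtype of the speaker column against the configured labels would be a reasonable first step.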

felixbur commented 10 months ago

ok, need to check later