bshall / knn-vc

Voice Conversion With Just Nearest Neighbors
https://bshall.github.io/knn-vc/

Matching pool empty #32

Closed asusdisciple closed 8 months ago

asusdisciple commented 8 months ago

I sometimes experience a bug when performing matching on big datasets (20k+ samples).

This is the stack trace:

Feature has shape:  torch.Size([445, 1024])    0.02% [1/5293 00:10<15:54:29 train_clean2/102/102-83.flac]
Feature has shape:  torch.Size([400, 1024])    0.04% [2/5293 00:15<11:44:33 train_clean2/102/102-60.flac]
Done 1,000/5,293                              18.89% [1000/5293 05:40<24:20 train_clean2/14/14-55.flac]
Traceback (most recent call last):            21.56% [1141/5293 06:36<24:02 train_clean2/15/15-5.flac]
  File "/raid/nils/projects/knn-vc/prematch_dataset.py", line 172, in <module>
    main(args)
  File "/raid/nils/projects/knn-vc/prematch_dataset.py", line 51, in main
    extract(ls_df, wavlm, args.device, Path(args.librispeech_path), Path(args.out_path), SYNTH_WEIGHTINGS, MATCH_WEIGHTINGS)
  File "/raid/nils/projects/knn-vc/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/raid/nils/projects/knn-vc/prematch_dataset.py", line 128, in extract
    matching_pool, synth_pool = path2pools(row.path, wavlm, match_weights, synth_weights, device)
  File "/raid/nils/projects/knn-vc/prematch_dataset.py", line 75, in path2pools
    matching_pool = torch.concat(matching_pool, dim=0)
RuntimeError: torch.cat(): expected a non-empty list of Tensors
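
For reference, the RuntimeError at the bottom of the trace is what PyTorch raises whenever torch.concat (an alias of torch.cat) is called on an empty Python list; a minimal standalone repro, independent of the repo's code:

import torch

pool = []                  # no feature tensors were collected
torch.concat(pool, dim=0)  # RuntimeError: torch.cat(): expected a non-empty list of Tensors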

However, the problem does not seem to be the file itself: when I change the matching directory to train_clean2/15, the script runs through without any problems. I used the German Distant Speech Data Corpus 2014/2015 for this experiment. I wonder what the root of this error might be; directories of 5000+ files have run through without a problem, but sometimes this bug still appears. For some reason it seems unable to find a matching vector.

RF5 commented 8 months ago

Hi @asusdisciple, prematching requires multiple utterances from the target speaker. Your dataset likely contains a speaker with only one utterance, which is what causes this error. To construct a prematched dataset, each utterance must be matched against other utterances from the same speaker, so if a speaker has only a single utterance (which is what I suspect for the German Distant Speech Data Corpus), the script will break when it reaches that speaker.
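
To make the failure mode concrete, here is a rough sketch of how a per-utterance matching pool can end up empty for a single-utterance speaker. The names build_matching_pool, speaker_utts and extract_features are illustrative only, not the actual functions in prematch_dataset.py:

import torch

def build_matching_pool(speaker_utts, current_utt, extract_features):
    # The pool for current_utt is built from the speaker's *other* utterances,
    # since an utterance is not matched against itself.
    pool = []
    for utt in speaker_utts:
        if utt == current_utt:
            continue
        pool.append(extract_features(utt))  # WavLM features, shape (frames, 1024)
    # If this speaker has no other utterances, pool is still empty here and
    # torch.concat raises the error shown in the traceback above.
    return torch.concat(pool, dim=0)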

I recommend filtering your dataset to contain only speakers with more than one utterance, as sketched below. If that isn't the issue, I am not too sure what the cause is. I hope that helps!
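
One way to apply that filter, assuming the dataset index is a pandas DataFrame with one row per utterance and a speaker column (both the file name and the column names below are assumptions, not necessarily what prematch_dataset.py uses):

import pandas as pd

df = pd.read_csv("dataset_index.csv")  # hypothetical index: one row per utterance

# Keep only utterances whose speaker contributes more than one utterance.
utts_per_speaker = df.groupby("speaker")["path"].transform("count")
df_filtered = df[utts_per_speaker > 1].reset_index(drop=True)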

asusdisciple commented 8 months ago

Thanks, this solved the issue! I had the impression that prematching was only done during inference, to compute the vector distances between the target speaker and the source material, but your advice solved the problem.