Closed asusdisciple closed 8 months ago
Hi @asusdisciple, prematching requires multiple utterances from each target speaker. Your dataset likely contains a speaker with only one utterance, which can cause this error. To construct a prematched dataset, each utterance must have other utterances from the same speaker, so if a speaker has only one utterance (which is what I suspect in your German Distant Speech Data Corpus), it will break when it gets to that speaker.
I recommend filtering your dataset to only contain speakers having more than one utterance. If that isn't the issue, I am not too sure what the cause is. I hope that helps!
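A minimal sketch of that filtering step, in case it is useful. It assumes one directory per speaker with `.wav` files inside (for example `train_clean2/15/...`); that layout and the helper names `group_by_speaker` and `filter_speakers` are only illustrative, not part of this repo, so adjust them to your data.

```python
from collections import defaultdict
from pathlib import Path


def group_by_speaker(root, ext=".wav"):
    """Map each speaker ID to the sorted list of its utterance files.

    Assumption: the immediate parent directory of each audio file
    is named after the speaker (e.g. <root>/<speaker_id>/<utt>.wav).
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob(f"*{ext}")):
        groups[path.parent.name].append(path)
    return dict(groups)


def filter_speakers(groups, min_utterances=2):
    """Split speakers into those with enough utterances and those without."""
    kept = {spk: files for spk, files in groups.items()
            if len(files) >= min_utterances}
    dropped = sorted(set(groups) - set(kept))
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical dataset root; replace with your own path.
    groups = group_by_speaker("train_clean2")
    kept, dropped = filter_speakers(groups)
    print(f"kept {len(kept)} speakers, dropped {len(dropped)}: {dropped}")
```

The `dropped` list shows which speakers have a single utterance, so you can check whether the failing file belongs to one of them before rerunning the matching.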
Thanks, this solved the issue! I had the impression that prematching was only done during inference, to calculate the distance between the target speaker and the source material in terms of vectors. However, I was able to solve the problem with your advice!
I sometimes experience a bug when performing matching with big datasets (20k samples+).
This is the stack trace:
However, the problem does not seem to be the file itself. When I change the matching directory to
train_clean2/15
the algorithm runs through without any problems. I used the German Distant Speech Data Corpus 2014 / 2015 to run this experiment. I wonder what the root of this error might be. I have had directories of 5000+ files run through without a problem, but sometimes this bug still appears. For some reason it seems unable to find a matching vector.