Hi @nguyenvulong, thanks for the message.
Keep in mind that ASVspoof covers two tasks: ASV (automatic speaker verification) and spoofing countermeasures (CM).
Have you checked the file list for ASV? The "missing numbers" are the files listed in LA/ASVspoof2019_LA_asv_protocols/*.eval.*.trn.txt. They are for ASV enrollment and are not used for the spoofing CM.
$: cd LA/ASVspoof2019_LA_asv_protocols
$: cat *.dev.*.trn.txt | awk '{print $2}' | tr ',' '\n' | wc -l
142
$: cat *.eval.*.trn.txt | awk '{print $2}' | tr ',' '\n' | wc -l
696
dev: 24986 .flac files - 24844 CM protocol lines = 142
eval: 71933 .flac files - 71237 CM protocol lines = 696
The training set does not have ASV enrollment files.
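The counts above only show that the totals agree. To confirm per file that every .flac missing from the CM protocol is an ASV enrollment file, something like the following sketch could be used. The function names are hypothetical, and it assumes that the utterance ID is the second column of the CM protocol file and that the .flac file names are the utterance IDs; adjust it if your copy is laid out differently.

```shell
# Hypothetical helper: print utterance IDs that have a .flac file
# but no entry in the CM protocol.
# Usage: missing_from_cm <flac_dir> <cm_protocol_file>
# Assumption: the utterance ID is column 2 of the CM protocol.
missing_from_cm() {
    flac_ids=$(mktemp)
    cm_ids=$(mktemp)
    # Strip the directory part and the .flac suffix to get bare IDs.
    find "$1" -name '*.flac' | sed 's#.*/##; s#\.flac$##' | sort -u > "$flac_ids"
    awk '{print $2}' "$2" | sort -u > "$cm_ids"
    # Keep only the lines unique to the first list (flac but not CM).
    comm -23 "$flac_ids" "$cm_ids"
    rm -f "$flac_ids" "$cm_ids"
}

# Hypothetical helper: print the ASV enrollment utterance IDs.
# Usage: enrollment_ids <trn_file>...
# Column 2 of each .trn.txt line is a comma-separated list of IDs,
# as in the commands above.
enrollment_ids() {
    cat "$@" | awk '{print $2}' | tr ',' '\n' | sort -u
}
```

If the output of missing_from_cm for the dev (or eval) .flac directory and its CM protocol file is identical to the output of enrollment_ids on the corresponding *.trn.txt files (for example, compared with diff), then every file absent from the CM protocol is an enrollment file, which would mean the mismatch does not affect the CM labels.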
Very nice. Thanks for your hard work. 👏
It has been 4 years and I hope that someone else has noticed this too: the line counts listed in cm_protocols do not match the number of .flac files in the dev and eval sub-datasets. Please see the screenshot below 👇 (While I'm fine with data missing, my bigger concern is: did this inconsistency cause any labeling issue, e.g., audio x being labeled spoofed instead of bona fide? I hope not.)