VoxCeleb
========

A mirror of the VoxCeleb dataset, a large-scale speaker identification dataset.

THIS IS WORK IN PROGRESS. I would like to have a reproducible way to download MP3 audio from YouTube, trim it, and store it as delivered by the authors of the dataset.
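The download-and-trim step could be sketched roughly as below. This is only an illustration, not the tooling used by the dataset authors or by this repo: it assumes ``yt-dlp`` and ``ffmpeg`` are installed, and the helper names, the output template, and the file names are hypothetical. The functions only build the command lines, so they can be inspected before anything is run.

```python
from typing import List


def build_download_cmd(video_id: str, out_template: str = "%(id)s.%(ext)s") -> List[str]:
    """Build a yt-dlp command that extracts a video's audio track as MP3.

    Assumption: yt-dlp is the downloader; the original workflow is not specified.
    """
    return [
        "yt-dlp",
        "-x",                     # keep the audio track only
        "--audio-format", "mp3",  # convert the extracted audio to MP3
        "-o", out_template,       # output filename template (hypothetical default)
        f"https://www.youtube.com/watch?v={video_id}",
    ]


def build_trim_cmd(src: str, dst: str, start_s: float, end_s: float) -> List[str]:
    """Build an ffmpeg command that cuts [start_s, end_s) out of src into dst."""
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    duration = end_s - start_s
    return [
        "ffmpeg",
        "-i", src,
        "-ss", f"{start_s:.3f}",  # start offset in seconds
        "-t", f"{duration:.3f}",  # length of the segment in seconds
        dst,
    ]
```

Each returned list can be passed to ``subprocess.run(cmd, check=True)``. Using ``-t`` (a duration) rather than an end timestamp avoids depending on whether the seek option is placed before or after the input file.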

This repo contains the download links to the VoxCeleb dataset, described in [1].

VoxCeleb contains over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube. The dataset is gender balanced, with 55% of the speakers male. The speakers span a wide range of different ethnicities, accents, professions and ages. There are no overlapping identities between development and test sets.

+-------------------+---------+-------+
|                   | train   | test  |
+===================+=========+=======+
| # of speakers     | 1,211   | 40    |
+-------------------+---------+-------+
| # of videos       | 21,819  | 677   |
+-------------------+---------+-------+
| # of utterances   | 139,124 | 6,255 |
+-------------------+---------+-------+
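As a quick sanity check on the split statistics, the per-split counts can be summed. This is a small sketch using only the figures from the table; the variable names are illustrative.

```python
# Per-split counts copied from the table above.
split_stats = {
    "train": {"speakers": 1211, "videos": 21819, "utterances": 139124},
    "test": {"speakers": 40, "videos": 677, "utterances": 6255},
}

# Sum each statistic over both splits.
totals = {
    key: sum(stats[key] for stats in split_stats.values())
    for key in split_stats["train"]
}

print(totals)
# {'speakers': 1251, 'videos': 22496, 'utterances': 145379}
```

The speaker total of 1,251 matches the number of celebrities quoted in the description, and the utterance total is consistent with "over 100,000 utterances".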

Nationality Distribution: The nationalities of the speakers in the dataset were obtained by crawling Wikipedia and can be found here. You can also view the distribution in the following graph:

.. image:: ./data/v1/country.png

The list of duplicates (34 videos only in the train set) can be found here.

The train/val/test split used in [1] below for Speaker Identification can be found here.

Models:

Pretrained models from dataset authors for VGGVox - Speaker Identification and Verification [1] can be found here.

Notice:

We are preparing an extended dataset (VoxCeleb2), containing up to 4 times as many speakers and videos.
VoxCeleb2 was originally due to be released in Q4 2017, however it has been delayed to Q1 2018 due to resource constraints.

Publications:

[1] A. Nagrani, J. S. Chung, A. Zisserman - VoxCeleb: a large-scale speaker identification dataset - INTERSPEECH, 2017

[2] Yifan He, Zhang Zhang - Speaker Identification with VoxCeleb DataSet - Stanford students project, 2017
