google-research-datasets / cvss

CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
Creative Commons Attribution 4.0 International

How can I get a correspondence between the English source and the translated French speech? I can't find a synchronised file that clearly shows how the files correspond. When I download the 'fr' folder in the Git repo, it downloads an English corpus even though it is named 'fr'. #1

Open Ntchinda-Giscard opened 1 year ago

chaiko commented 1 year ago

The audio files released in CVSS are the English translations (i.e. the outputs for S2ST). The corresponding inputs (e.g. French) can be found in Common Voice release version 4 (it must be v4). The data from the two corpora can be paired by the audio file names.

Please refer to https://github.com/google-research-datasets/cvss#getting-the-data
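As a rough illustration of pairing by file name, here is a minimal Python sketch. It assumes the Common Voice clips are `.mp3` files, the CVSS clips are `.wav` files, and that both share the same base name before the first dot (for example `common_voice_fr_1.mp3` and `common_voice_fr_1.mp3.wav`). The directory layout and the helper names `clip_key` and `pair_clips` are hypothetical; check the actual file names in your downloads before relying on it.

```python
from pathlib import Path


def clip_key(path):
    """Return the part of the file name before the first dot.

    Assumption: source and translated clips share this prefix.
    """
    return Path(path).name.split(".", 1)[0]


def pair_clips(cv_clips_dir, cvss_dir):
    """Pair Common Voice source clips with CVSS English clips by file name.

    Returns a list of (source_path, target_path) tuples. CVSS clips with no
    matching Common Voice clip are skipped.
    """
    # Index the source-language clips by their shared key.
    sources = {clip_key(p): p for p in Path(cv_clips_dir).rglob("*.mp3")}

    pairs = []
    for target in sorted(Path(cvss_dir).rglob("*.wav")):
        source = sources.get(clip_key(target))
        if source is not None:
            pairs.append((source, target))
    return pairs
```

Calling `pair_clips("path/to/common_voice_v4/fr/clips", "path/to/cvss/fr")` (both paths are placeholders) would then give the French input and English output for each utterance present in both corpora.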

Ntchinda-Giscard commented 1 year ago

OK, thanks for the clarification. I will download the data from Common Voice Corpus 4.

On Wed, Apr 26, 2023, 6:54 PM Ye Jia @.***> wrote:

The audio files released in CVSS are the English translations (i.e. the outputs for S2ST). The corresponding inputs (e.g. French) can be found in Common Voice release version 4 (it must be v4), paired by the audio file names.

Please refer to https://github.com/google-research-datasets/cvss#getting-the-data
