facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

"Text to Text Metadata" dataset #48

Closed: virgulvirgul closed this issue 1 year ago

virgulvirgul commented 1 year ago

Do you plan to share the "Text to Text Metadata" translation dataset?

jmp84 commented 1 year ago

@virgulvirgul There is no new text-to-text data or metadata, only speech/text and speech/speech metadata. For the text-to-text parallel data used to train text-to-text machine translation, you can refer to https://arxiv.org/abs/2207.04672, https://opus.nlpl.eu/, or https://huggingface.co/datasets/allenai/nllb.