facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

T2TT: zsm(Standard Malay) #85

Closed. Akira4ever closed this issue 1 year ago

Akira4ever commented 1 year ago

Hi, when I run:

m4t_predict <input_text> t2tt <tgt_lang> --src_lang zsm

I get:

ValueError: 'lang' must be a supported language, but is 'zsm' instead.

So I checked self.langs in translator.text_tokenizer, which is an NllbTokenizer, and got: {'uzn', 'swe', 'eng', 'kir', 'npi', 'pan', 'mai', 'mni', 'hin', 'som', 'yue', 'heb', 'ita', 'asm', 'slk', 'ckb', 'arb', 'cmn', 'jpn', 'luo', 'mlt', 'khm', 'cat', 'fra', 'amh', 'nob', 'est', 'ibo', 'bos', 'sna', 'deu', 'khk', 'snd', 'tgk', 'lvs', 'glg', 'zlm', 'guj', 'fin', 'ell', 'lao', 'zul', 'pes', 'tel', 'ces', 'swh', 'jav', 'gle', 'ory', 'por', 'slv', 'sat', 'vie', 'tgl', 'hun', 'mar', 'hrv', 'mal', 'tam', 'spa', 'eus', 'arz', 'afr', 'ary', 'kaz', 'kor', 'bul', 'fuv', 'kan', 'yor', 'tur', 'ron', 'ind', 'pbt', 'cmn_Hant', 'mya', 'mkd', 'ceb', 'pol', 'ukr', 'bel', 'nno', 'srp', 'dan', 'lit', 'urd', 'rus', 'gaz', 'kat', 'hye', 'nld', 'tha', 'cym', 'ben', 'nya', 'azj', 'lug', 'isl'}. The set does not contain 'zsm'.

Also, when I use T2TT on https://huggingface.co/spaces/facebook/seamless_m4t and select 'Standard Malay', it returns an error.

Is there anything wrong?

elbayadm commented 1 year ago

Hi @Akira4ever, we previously used the code zlm for Standard Malay. This is now fixed in commit d31144406acf1eff2505ccfca907d48a535954c6, and zsm should work.
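
For anyone still on a checkout from before that commit, the fallback could look roughly like the sketch below. This is only an illustration in plain Python: resolve_lang, LEGACY_ALIASES and the two small language sets are hypothetical names and sample data, not part of seamless_communication, and the assumption that the fix lists 'zsm' in place of 'zlm' is inferred from the comment above rather than checked against the commit.

```python
# Hypothetical helper, not part of seamless_communication.
# Assumption: older checkouts list Standard Malay under 'zlm' instead of 'zsm'.
LEGACY_ALIASES = {"zsm": "zlm"}


def resolve_lang(code, supported):
    """Return a language code present in `supported`, trying a legacy alias if needed."""
    if code in supported:
        return code
    alias = LEGACY_ALIASES.get(code)
    if alias is not None and alias in supported:
        return alias
    # Same wording as the error reported in this issue.
    raise ValueError(f"'lang' must be a supported language, but is '{code}' instead.")


# Small sample sets for illustration only (not the full tokenizer vocabulary).
old_langs = {"eng", "ind", "jav", "zlm"}  # before the fix, as in the set posted above
new_langs = {"eng", "ind", "jav", "zsm"}  # assumed state after the fix

print(resolve_lang("zsm", old_langs))  # falls back to the legacy code
print(resolve_lang("zsm", new_langs))  # code is supported directly
```

In practice, `supported` would be the set read from translator.text_tokenizer.langs, as in the original report, and the resolved code would then be passed as --src_lang.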