kyakuno opened this issue 3 years ago
An example of converting a transformers model to ONNX is below. However, it is still unclear whether wav2vec2 can be converted. https://github.com/axinc-ai/bert-japanese-onnx
People seem to be having quite a hard time with this. https://github.com/pytorch/fairseq/issues/3010
This model appears to cover Japanese as well. xlsr_53_56k.pt alone is 3.5 GB. https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md
| Model | Architecture | Hours | Languages | Datasets | Model |
|---|---|---|---|---|---|
| XLSR-53 | Large | 56k | 53 | MLS, CommonVoice, BABEL | download |
The XLSR model uses the following datasets for multilingual pretraining:

- MLS: Multilingual LibriSpeech (8 languages, 50.7k hours): Dutch, English, French, German, Italian, Polish, Portuguese, Spanish
- CommonVoice (36 languages, 3.6k hours): Arabic, Basque, Breton, Chinese (CN), Chinese (HK), Chinese (TW), Chuvash, Dhivehi, Dutch, English, Esperanto, Estonian, French, German, Hakh-Chin, Indonesian, Interlingua, Irish, Italian, Japanese, Kabyle, Kinyarwanda, Kyrgyz, Latvian, Mongolian, Persian, Portuguese, Russian, Sakha, Slovenian, Spanish, Swedish, Tamil, Tatar, Turkish, Welsh (see also finetuning splits from this paper).
- Babel (17 languages, 1.7k hours): Assamese, Bengali, Cantonese, Cebuano, Georgian, Haitian, Kazakh, Kurmanji, Lao, Pashto, Swahili, Tagalog, Tamil, Tok, Turkish, Vietnamese, Zulu
There is apparently a dataset called CommonVoice. Japanese has 306 speakers, while English has 69,610. https://commonvoice.mozilla.org/en/languages
I have not tried it myself, but it is reported there that conversion to ONNX should work. https://github.com/pytorch/fairseq/issues/2972
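If the export does work as that issue claims, it would presumably go through `torch.onnx.export`. A minimal sketch of that route, assuming the `facebook/wav2vec2-base-960h` checkpoint from transformers (the checkpoint name, output file name, and opset choice are my assumptions, not from the thread):

```python
# Sketch: export a Hugging Face wav2vec2 model to ONNX.
# Untested against the fairseq checkpoints discussed above;
# this uses the transformers port instead.
import torch
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

# Dummy input: 1 second of 16 kHz mono audio (batch, samples).
dummy = torch.randn(1, 16000)

torch.onnx.export(
    model,
    dummy,
    "wav2vec2.onnx",
    input_names=["input_values"],
    output_names=["logits"],
    # Let the audio length (and hence output time axis) vary at runtime.
    dynamic_axes={"input_values": {1: "audio_len"}, "logits": {1: "time"}},
    opset_version=11,
)
print("exported wav2vec2.onnx")
```

The exported graph could then be loaded with onnxruntime to check that it produces the same logits as the PyTorch model.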
Speech recognition with wav2vec2 in transformers (a BERT-style pretrained model for speech): https://huggingface.co/transformers/model_doc/wav2vec2.html
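For reference, inference with the transformers wav2vec2 API looks roughly like this. A minimal sketch assuming the `facebook/wav2vec2-base-960h` checkpoint and a placeholder waveform (a real 16 kHz recording would replace the random tensor):

```python
# Sketch: CTC speech recognition with wav2vec2 via transformers.
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

# Placeholder: 1 second of noise instead of real speech.
speech = torch.randn(16000)

inputs = processor(speech.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, time, vocab)

# Greedy CTC decoding: argmax per frame, then collapse repeats/blanks.
ids = torch.argmax(logits, dim=-1)
text = processor.batch_decode(ids)
print(text)
```

With real audio, `text` is a list with one transcript per input; with the noise placeholder the output is meaningless but exercises the pipeline end to end.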