KAIST-AILab / SyncVSR

SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization (Interspeech 2024)
https://www.isca-archive.org/interspeech_2024/ahn24_interspeech.pdf
MIT License

Chinese support (or multilingual) #2

Closed MonolithFoundation closed 2 weeks ago

MonolithFoundation commented 1 month ago

Supporting many languages may be hard or even impossible, but English and Mandarin support would make this far more usable in the VSR field.

Do you have any thoughts on this?

snoop2head commented 2 weeks ago

@MonolithFoundation I think Chinese is inherently more difficult for VSR because of the prevalence of homophenes that stem from 拼音 (pinyin).

The servers I used to train the VSR models were taken away about 6 months ago, so for now I can't train any other models. If I get access to a Chinese dataset along with the hardware, I may add support for it.