PrithivirajDamodaran / Gramformer

A framework for detecting, highlighting and correcting grammatical errors on natural language text. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
MIT License
1.5k stars, 175 forks

How to train Gramformer on non-English languages. #3

Closed by StephennFernandes 3 years ago

StephennFernandes commented 3 years ago

Hey @PrithivirajDamodaran, great work on building Gramformer. I've played with it and the results are amazing.

I work on pushing NLP forward in underrepresented languages, so I humbly request that you tell me how to train Gramformer on non-English sentences.

I checked out your Hugging Face page 'https://huggingface.co/prithivida/grammar_error_correcter' but couldn't find any resources on how to train Gramformer from scratch. If you could help me train Gramformer on non-English languages, it would really mean a lot to me. Do let me know.

Thanks

PrithivirajDamodaran commented 3 years ago

Grammatical nuances vary vastly from language to language, so keeping that in mind, you have to put together text pairs for each language you want to train on. How well that works depends on how proficient you are in each of those languages.
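
To make the "text pairs" idea concrete, here is a minimal sketch of how such pairs could be stored. It assumes each pair is an ungrammatical sentence plus its corrected version. The `gec: ` task prefix, the `source_text`/`target_text` column names, the `write_pairs` helper, and the sample sentences are all illustrative assumptions, not something this thread specifies. English is used only so the example is readable; you would substitute pairs in your target language.

```python
import csv
import io

# Hypothetical sample pairs: (ungrammatical sentence, corrected sentence).
PAIRS = [
    ("He go to school yesterday.", "He went to school yesterday."),
    ("She have two cat.", "She has two cats."),
    ("   ", "This pair is skipped because the source is empty."),
]


def write_pairs(pairs, handle, prefix="gec: "):
    """Write pairs as tab-separated rows, prepending a task prefix to the source.

    Returns the number of rows written (the header is not counted).
    """
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["source_text", "target_text"])  # assumed column names
    count = 0
    for incorrect, corrected in pairs:
        incorrect, corrected = incorrect.strip(), corrected.strip()
        if not incorrect or not corrected:
            continue  # drop incomplete pairs rather than train on them
        writer.writerow([prefix + incorrect, corrected])
        count += 1
    return count


# Write to an in-memory buffer here; pass an open file handle to save to disk.
buffer = io.StringIO()
n_written = write_pairs(PAIRS, buffer)
print(buffer.getvalue())
```

Rename the columns to whatever your training framework expects, and keep one file per language so each can be reviewed by someone proficient in it.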

As far as training is concerned, it's a simple fine-tuning of a seq2seq model like T5 or Pegasus. You can Google "T5 conditional generation". There are a ton of resources and notebooks. Frameworks like Simple Transformers and simpleT5 can come in handy.
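
For reference, a bare-bones version of that fine-tuning could look like the sketch below, written against the Hugging Face `transformers` library with PyTorch. It is not Gramformer's actual training code. The `google/mt5-small` checkpoint (a multilingual T5 variant, swapped in here because the goal is non-English text), the `gec: ` prefix, the hyperparameters, and the helper names `build_examples` and `fine_tune` are all assumptions made for illustration.

```python
# Hypothetical sample pairs: (ungrammatical sentence, corrected sentence).
PAIRS = [
    ("He go to school yesterday.", "He went to school yesterday."),
    ("She have two cat.", "She has two cats."),
]


def build_examples(pairs, prefix="gec: "):
    """Prepend the task prefix to every source sentence."""
    return [(prefix + source, target) for source, target in pairs]


def fine_tune(pairs, model_name="google/mt5-small", epochs=1, lr=3e-4):
    """Fine-tune a seq2seq checkpoint on the pairs, one example per step."""
    # Imported here so build_examples stays usable without these packages.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for source, target in build_examples(pairs):
            # text_target tokenizes the correction into the "labels" field.
            batch = tokenizer(
                source,
                text_target=target,
                return_tensors="pt",
                truncation=True,
                max_length=128,
            )
            loss = model(**batch).loss  # cross-entropy against the labels
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model, tokenizer


examples = build_examples(PAIRS)
print(examples[0])
```

Calling `fine_tune(PAIRS)` downloads the checkpoint and runs the training loop. A real run would need far more pairs, batching, and a held-out evaluation set, which is where higher-level wrappers such as Simple Transformers or simpleT5 save effort.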