R1j1t / contextualSpellCheck

✔️Contextual word checker for better suggestions
MIT License

Single multilingual model recommendation? #84

Closed by nickchomey 1 year ago

nickchomey commented 1 year ago

Is there a single model that you would recommend using to autocorrect misspellings across common languages? Thanks!

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

R1j1t commented 1 year ago

Hey @nickchomey, I apologize for the late reply. The default model (bert-base-cased) seems to be a good model, but as far as I know it is trained predominantly on English text. My work did not require multilingual sentences, so I know too little to answer your question.

If you have any suggestions, please let me know.
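To make "which model" concrete: assuming the tool works roughly like this (a masked language model such as bert-base-cased proposes replacements for a suspected misspelling, and the candidate closest in edit distance wins), a multilingual replacement would need to be a masked-LM model that produces sensible candidates in each language. The sketch below shows only the selection step. No model is loaded: the candidate lists are hypothetical stand-ins for fill-mask output, and `pick_correction` and `max_edit_dist` are illustrative names, not the library's API.

```python
# Minimal sketch of the "masked-LM candidates + edit distance" selection step.
# Candidate lists are hypothetical stand-ins for a fill-mask model's top-k output.

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def pick_correction(misspelled, candidates, max_edit_dist=2):
    """Return the candidate closest to `misspelled`, or None if all are too far.

    `candidates` is assumed to be ordered by model score; the strict `<`
    below means ties keep the higher-scored candidate.
    """
    best, best_dist = None, max_edit_dist + 1
    for token in candidates:
        d = levenshtein(misspelled.lower(), token.lower())
        if d < best_dist:
            best, best_dist = token, d
    return best


# English: "I lived in Berlin for thre years."
print(pick_correction("thre", ["many", "two", "three", "ten"]))
# German: "Ich habe drei Jahre in Berlin gewhont."
print(pick_correction("gewhont", ["gelebt", "gewohnt", "gearbeitet"]))
```

The edit-distance step is language-agnostic, so the multilingual question comes down to whether the masked LM ever proposes the right word (here "gewohnt") among its top candidates.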

nickchomey commented 1 year ago

These might be useful - they use multilingual datasets.

https://huggingface.co/unicamp-dl/mMiniLM-L6-v2-mmarco-v2
https://huggingface.co/cross-encoder/mmarco-mMiniLMv2-L12-H384-v1

The second is a cross-encoder, which is more accurate than a bi-encoder but very slow. I'm using it successfully as a search re-ranker. It's not clear to me what kind of model the first one is, and I'm not sure whether a cross-encoder can be used with this tool at all.
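The speed difference comes from how each style is called at re-ranking time. A bi-encoder embeds each document once (the vectors can be precomputed), so a query costs one model call plus cheap vector comparisons; a cross-encoder runs the model on every (query, document) pair. The toy sketch below counts those calls. `encode` and `cross_score` are hypothetical stand-ins (bag-of-words scoring), not real models, so it illustrates cost only, not accuracy. It also suggests, as an assumption not confirmed in this thread, why a cross-encoder probably cannot replace a masked-LM model in a spell checker: it outputs one score per pair, not candidate tokens for a masked position.

```python
# Toy call-count comparison of bi-encoder vs cross-encoder re-ranking.
# `encode` and `cross_score` are hypothetical stand-ins, not real models.
from collections import Counter

calls = Counter()  # how many times each "model" was invoked


def encode(text):
    """Bi-encoder stand-in: maps one text to a sparse bag-of-words vector."""
    calls["encode"] += 1
    return Counter(text.lower().split())


def dot(u, v):
    return sum(count * v[word] for word, count in u.items())


def cross_score(query, doc):
    """Cross-encoder stand-in: sees query and document together in one call."""
    calls["cross"] += 1
    doc_words = set(doc.lower().split())
    return sum(1 for w in query.lower().split() if w in doc_words)


def bi_encoder_rank(query, doc_vecs):
    """One encoder call for the query; documents were encoded ahead of time."""
    q = encode(query)
    return sorted(range(len(doc_vecs)), key=lambda i: -dot(q, doc_vecs[i]))


def cross_encoder_rank(query, docs):
    """One model call per (query, document) pair; nothing can be precomputed."""
    scores = [cross_score(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


DOCS = ["the cat sat on the mat", "dogs chase cats", "a cat and a dog"]
doc_vecs = [encode(d) for d in DOCS]  # offline step for the bi-encoder
calls.clear()                         # count only per-query work from here

print(bi_encoder_rank("cat on mat", doc_vecs), calls["encode"])
print(cross_encoder_rank("cat on mat", DOCS), calls["cross"])
```

With N candidate documents, the per-query cost is 1 model call for the bi-encoder versus N for the cross-encoder, which is why cross-encoders are usually applied only to a short list retrieved by a cheaper first stage.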