R1j1t / contextualSpellCheck

✔️Contextual word checker for better suggestions
MIT License

French (doc add) #83

Open EtienneAb3d opened 2 years ago

EtienneAb3d commented 2 years ago

As requested in #41, here is how I succeeded in running contextualSpellCheck for French.

Use the French spaCy model:

import spacy
nlp = spacy.load("fr_core_news_sm")

Use the camembert/camembert-base-ccnet model:

import contextualSpellCheck  # importing registers the "contextual spellchecker" factory
nlp.add_pipe("contextual spellchecker", config={"max_edit_dist": 4, "model_name": "camembert/camembert-base-ccnet"})

These dependencies are needed:

pip install sentencepiece
pip install protobuf==3.20

Remark: spaces are lost in the result, so post-processing is needed to restore them properly.
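One possible way to do that post-processing, as a minimal sketch rather than the method used in this thread: rebuild the corrected text from the original spaCy tokens instead of relying on the joined output string. It assumes that `doc._.suggestions_spellCheck` maps each misspelled token to its suggested replacement and that each token exposes spaCy's `text` and `whitespace_` attributes. The `restore_spaces` helper and the `TokenLike` protocol are hypothetical names introduced only for illustration.

```python
from typing import Iterable, Mapping, Protocol


class TokenLike(Protocol):
    """Minimal shape of a spaCy Token that this sketch relies on."""

    text: str
    whitespace_: str


def restore_spaces(tokens: Iterable[TokenLike],
                   suggestions: Mapping[TokenLike, str]) -> str:
    """Rebuild the corrected sentence while keeping the original spacing.

    `suggestions` is assumed to map a misspelled token to its replacement;
    tokens without a suggestion keep their original text.
    """
    parts = []
    for token in tokens:
        corrected = suggestions.get(token, token.text)
        # whitespace_ is the trailing whitespace of the original token
        # ("" or " "), so the original spacing is preserved.
        parts.append(corrected + token.whitespace_)
    return "".join(parts)
```

With a processed `doc`, this would be called as `restore_spaces(doc, doc._.suggestions_spellCheck)`. Whether the suggestions themselves need further cleanup for a given model is not covered here.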

PS: the flaubert/flaubert_large_cased model needs this additional dependency:

pip install sacremoses

R1j1t commented 2 years ago

Hey @EtienneAb3d, thank you for raising this request. It is excellent to know you were able to use it for French!

Would you like to raise a PR to add an example for the French language, similar to the other examples? I would be happy to merge it, as it would be a great addition for people using it for French!

If you have any suggestions or other feedback, feel free to highlight them.

EtienneAb3d commented 2 years ago

Hi @R1j1t, perhaps later I will find the time to build such a PR. But, on the team side, if you have direct edit access, it's only a few lines to add to the docs. ;-)

R1j1t commented 2 years ago

No worries!

mtx-z commented 4 months ago

Also note that, in addition to @EtienneAb3d's steps, in a Jupyter Notebook you need to restart the kernel after installing protobuf:

!pip uninstall -y protobuf
!pip install protobuf==3.20
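The restart matters because a module that the running kernel has already imported is not replaced when pip changes the files on disk. As a rough sketch, the check below compares the version loaded in memory with the version installed on disk. The helper names are hypothetical, and it assumes the loaded module exposes a `__version__` attribute (as `google.protobuf` does).

```python
import sys
from importlib import metadata
from typing import Optional


def loaded_version(module_name: str) -> Optional[str]:
    """Version of an already-imported module, or None if it is not loaded."""
    module = sys.modules.get(module_name)
    if module is None:
        return None
    return getattr(module, "__version__", None)


def installed_version(dist_name: str) -> Optional[str]:
    """Version of the distribution on disk, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def needs_restart(module_name: str = "google.protobuf",
                  dist_name: str = "protobuf") -> bool:
    """True when the kernel still holds a different version than the one on disk."""
    loaded = loaded_version(module_name)
    installed = installed_version(dist_name)
    return loaded is not None and installed is not None and loaded != installed
```

Running `needs_restart()` in a notebook cell after the reinstall gives a quick hint: if it returns `True`, the kernel is still using the old protobuf.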

Also, @EtienneAb3d, how did you manage the lost-spaces issue?