R1j1t / contextualSpellCheck

✔️Contextual word checker for better suggestions
MIT License
405 stars, 56 forks

Methodology for spell check #67

Closed dsvrsec closed 2 years ago

dsvrsec commented 2 years ago

Can you please describe the methodology used for detecting and correcting misspelled words, perhaps at a high level if possible? Thank you.

R1j1t commented 2 years ago

Hey @dsvrsec, I have mentioned this in my comments here: https://github.com/R1j1t/contextualSpellCheck/issues/59#issuecomment-811322409

The current logic for spelling correction is as follows:

  1. Provide a spaCy model: this breaks the sentence into tokens. Because the model is trained on a particular language (tweet-specific models also exist), it knows that language's nuances.

  2. Check each token against the transformer model's vocab: if the token is not present, consider it a spelling error (an out-of-vocabulary, or OOV, word).

  3. Mask the OOV word and use the transformer model to predict candidate words to replace the mask.

  4. Compute the edit distance between each candidate and the original word to see which candidate is closest in spelling.
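For readers who want to see the four steps end to end, here is a minimal sketch in plain Python. It is not the library's actual code. Everything named here is an assumption made for illustration: whitespace splitting stands in for the spaCy tokenizer, a small `vocab` set stands in for the transformer vocab, and `predict_mask` is a hypothetical callable standing in for the masked language model.

```python
from typing import Callable, Dict, List, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def correct(tokens: List[str],
            vocab: Set[str],
            predict_mask: Callable[[List[str]], List[str]],
            mask: str = "[MASK]") -> Dict[int, str]:
    """Return {token index: suggested replacement} for OOV tokens."""
    corrections: Dict[int, str] = {}
    for i, tok in enumerate(tokens):
        if tok in vocab:               # step 2: in vocab, so not an error
            continue
        masked = tokens[:i] + [mask] + tokens[i + 1:]   # step 3: mask OOV word
        candidates = predict_mask(masked)               # step 3: model guesses
        if candidates:
            # step 4: keep the candidate closest in spelling to the original
            corrections[i] = min(candidates,
                                 key=lambda c: edit_distance(tok, c))
    return corrections


# Toy stand-ins (hypothetical): a real setup would use a spaCy pipeline
# and a masked language model such as bert-base-cased.
def toy_predictor(masked_tokens: List[str]) -> List[str]:
    return ["money", "income", "cash", "time"]


vocab = {"I", "need", "more", "to", "pay", "rent"}
tokens = "I need more monney to pay rent".split()   # step 1 (simplified)
result = correct(tokens, vocab, toy_predictor)
print(result)  # {3: 'money'}
```

The model supplies candidates that fit the context, and edit distance then picks the candidate closest to what was typed. That two-stage choice is what distinguishes this approach from a purely dictionary-based checker.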

The default model (bert-base-cased)

Please let me know if you have any questions, also feel free to contribute!!

muntasir2000 commented 2 years ago

@R1j1t So it can work for any language if a spaCy model and a BERT LM are available, right?

Xiaoping777 commented 2 years ago

Hi there, I actually compared the output with the pyspellchecker package, which is based purely on edit distance, and the results are almost the same. Just wondering, is any grammar correction included when the BERT model is in use?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.