Word limit - Githubissues

PrithivirajDamodaran / Gramformer

A framework for detecting, highlighting and correcting grammatical errors on natural language text. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

MIT License


Word limit #30

Open Talib6509 opened 1 year ago

Talib6509 commented 1 year ago

The model has trouble with long sentences, especially when the words are in upper case. It outputs only part of the sentence, and the remainder, which is dropped from the output, is shown as an error.

grasv commented 1 year ago

I'm also experiencing this issue. Can you please provide guidance on how we can determine the maximum input length of text to pass into the model? @PrithivirajDamodaran

Thank you.

Bachstelze commented 1 year ago

Have you tried normalizing your input text, e.g. with input.capitalize()? The SentencePiece tokenizer chunks rare words into many small pieces, especially words that are written in uppercase but do not normally appear that way.
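
A minimal sketch of the normalization idea above, combined with splitting long input into shorter pieces before it reaches the model. Everything here is an assumption for illustration: the helper names (`normalize_case`, `split_sentences`, `chunk_words`, `prepare_inputs`) are hypothetical and not part of Gramformer, and the `max_words` budget of 40 is an arbitrary placeholder, not a documented limit of the model. Unlike `str.capitalize()`, which lowercases the whole string and capitalizes only its first character, this version lowercases only all-caps words and then restores a capital at the start of each sentence.

```python
import re
from typing import List


def normalize_case(text: str) -> str:
    """Lowercase all-caps words, then re-capitalize sentence starts."""
    def fix_word(match: "re.Match") -> str:
        word = match.group(0)
        # Leave single letters such as "I" and mixed-case words untouched.
        return word.lower() if len(word) > 1 and word.isupper() else word

    lowered = re.sub(r"[A-Za-z]+", fix_word, text)
    # Uppercase the first letter of the text and of each following sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )


def split_sentences(text: str) -> List[str]:
    """Naive split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_words(sentence: str, max_words: int) -> List[str]:
    """Cut a sentence into pieces of at most max_words words."""
    words = sentence.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def prepare_inputs(text: str, max_words: int = 40) -> List[str]:
    """Normalize case, split into sentences, and cap each piece's length.

    max_words is an assumed budget; tune it for the model in use.
    """
    pieces: List[str] = []
    for sentence in split_sentences(normalize_case(text)):
        pieces.extend(chunk_words(sentence, max_words))
    return pieces
```

Each returned piece could then be passed to the model one at a time. Two caveats: lowercasing all-caps words also flattens real acronyms, and cutting a sentence at a word boundary removes context the corrector may need, so splitting on sentence boundaries alone is preferable whenever the sentences are short enough.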