Ignore words shorter than a certain length

bartosz-antosik / vscode-spellright

Multilingual, Offline and Lightweight Spellchecker for Visual Studio Code

Ignore words shorter than a certain length #202

Open gandalfsaxe opened 6 years ago

gandalfsaxe commented 6 years ago

Code Spell Checker employs a nice little trick where words of 3 characters or shorter can be ignored. Could you consider adding such an option?

bartosz-antosik commented 6 years ago

I am proud not to ignore these! Spell-checking short words like "I'm", "you", etc. seemed critical to me!

Why would you not want these to be checked?

gandalfsaxe commented 6 years ago

Good point. It's because I sometimes use Spell Right with code documents, so it picks up some false positives there. Perhaps there's a way to do this just for some file types and not for comments?

bartosz-antosik commented 6 years ago

There may be, but please show me particular examples. It would be better to generalize starting from examples.

gandalfsaxe commented 6 years ago

Some examples from python scripts with comments:

xs  # list of x (in Python it's conventional to put an s at the end of a variable, so a list of `x` is `xs`)
ts  # list of t
py  # y-component of momentum
tol  # tolerance
jit  # @jit is a numba (https://numba.pydata.org) decorator that enables "just in time" compilation, speeding up computations
leo  # low earth orbit
ang  # angle
plt  # from `import matplotlib.pyplot as plt` (in Python, packages are sometimes imported with shorthand names 2-3 characters long)
len  # often used for a length in programming (although in Python, it's a built-in function)

I think it would be useful to be able to disable detection of words with 3 characters or fewer in programming files. In many cases, underscore_case and camelCase variable naming will prevent false positives, but variable names of 2-3 characters are also common in code, and such a feature would keep them from being flagged.

bartosz-antosik commented 6 years ago

Spell Right is supposed to use the document's symbols when spelling (please see "spellright.useDocumentSymbolsInCode" in the README for details and notes on symbols). It should extract these from the language server, although this takes some time and some parsers do not return symbols on the first call. Aren't these eliminated when spelling?

I would really like to ask for a bit more precision in these examples. This issue seems very general, and I would love to have an example I can drop into Spell Right (e.g. create an empty document, paste it in, and see what happens). Please!

tecosaur commented 6 years ago

My 2¢ on this

When this is annoying

This 'issue' comes up in latex documents for me, specifically in math blocks where I'm multiplying variables together, much more with two letters than three.

Example:

\begin{align*}
    \Lap[1](s) & = \int_0^\infty e^{-st} f(t) \, \dd t                                                            \\
               & = \int_0^\infty e^{-st} 1 \, \dd t                                                               \\
               & = \lim_{X \to \infty} \int_0^X e^{-st} \, \dd t                                                  \\
               & = \lim_{X \to \infty} {\left[ -\frac{1}{s}e^{-st} \right]}_{t=0}^{t=X}                           \\
               & = \lim_{X \to \infty} \left(\cancelto{0}{-\frac{1}{s}e^{-sX}} + \frac{1}{s}e^{-s\cdot 0} \right) \\
               & = \frac{1}{s}, \; s > 0
\end{align*}

End Result

Solution?

Would it work if all two-letter combinations were ignored, except for a list of specific combinations that should still be checked (e.g. im → I'm)? I'd imagine this could be done with two settings, like so:

"spell-right.min-word-length": 2,
"spell-right.min-length-exceptions": ["im", "id"]

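A minimal sketch of how those two proposed settings might behave. Neither setting exists in Spell Right; the function name `should_spellcheck` and its parameters are hypothetical, and the sketch assumes that "min-word-length": 2 means words of two letters or fewer are skipped unless they appear in the exceptions list.

```python
def should_spellcheck(word, min_word_length=2, exceptions=("im", "id")):
    """Return True if `word` should be passed on to the spellchecker.

    Hypothetical helper: words no longer than `min_word_length` are skipped,
    unless their lowercase form is listed in `exceptions`.
    """
    if word.lower() in exceptions:
        return True
    return len(word) > min_word_length


print(should_spellcheck("st"))   # False: two letters, not an exception
print(should_spellcheck("im"))   # True: listed exception, still checked
print(should_spellcheck("tol"))  # True: longer than the threshold
```

With this reading, products of variables such as `st` in a math block would be skipped, while `im` would still be flagged so that it can be corrected to "I'm".
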
acharkq commented 5 years ago

Add the following setting to settings.json:

"spellright.ignoreRegExps":[
    "/[a-zA-Z]{1,3}(?<![a-zA-Z]{4})(?![a-zA-Z])/",
]
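
To see what this expression matches, here is a small sketch using Python's `re` module. Spell Right itself runs on JavaScript's regex engine, so this only illustrates the pattern's logic, with the surrounding `/.../` delimiters removed; the helper name `ignored_spans` is made up for the example. The negative lookbehind rejects a match that ends a run of four or more letters, and the negative lookahead rejects a match that is followed by another letter, so only whole words of one to three letters are matched.

```python
import re

# acharkq's pattern without the /.../ delimiters used in settings.json.
SHORT_WORD = re.compile(r"[a-zA-Z]{1,3}(?<![a-zA-Z]{4})(?![a-zA-Z])")


def ignored_spans(text):
    """Return the substrings the pattern matches, i.e. what would be ignored."""
    return SHORT_WORD.findall(text)


print(ignored_spans("xs and tol are small lists"))
# ['xs', 'and', 'tol', 'are']
```

Note that ordinary short words such as "and" are matched as well, so misspellings of three letters or fewer would no longer be reported.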

bartosz-antosik commented 5 years ago

@acharkq That's clever!

jcwinkler commented 4 years ago

@acharkq, @bartosz-antosik: The suggested approach may lead to false positives if, e.g., German umlauts or three-letter keywords like \end are used:

[image]

The following improved expression, based on @acharkq's, fixes this:

"spellright.ignoreRegExps":[
        "/[a-zA-ZÀ-ž\u0370-\u03FF\u0400-\u04FF]{1,3}(?<![a-zA-ZÀ-ž\u0370-\u03FF\u0400-\u04FF\\]{4})(?![a-zA-ZÀ-ž\u0370-\u03FF\u0400-\u04FF])/"
    ]
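
The difference between the two expressions can be sketched with Python's `re` module. This is only an illustration of the pattern logic: Spell Right uses JavaScript's regex engine, and the sketch assumes the `\\` inside the lookbehind class is meant as a literal backslash, so that a three-letter command such as `\end` counts as a four-character run and is not matched. The names `ASCII_ONLY`, `EXTENDED` and `LETTERS` are made up for the example.

```python
import re

# Letter ranges from the improved expression: ASCII, accented Latin, Greek, Cyrillic.
LETTERS = r"a-zA-ZÀ-ž\u0370-\u03FF\u0400-\u04FF"

# Original expression (ASCII letters only).
ASCII_ONLY = re.compile(r"[a-zA-Z]{1,3}(?<![a-zA-Z]{4})(?![a-zA-Z])")

# Improved expression; the lookbehind class additionally contains a backslash.
EXTENDED = re.compile(
    "[" + LETTERS + "]{1,3}(?<![" + LETTERS + r"\\]{4})(?![" + LETTERS + "])"
)

text = "für Größe"
print(ASCII_ONLY.findall(text))  # ['f', 'r', 'Gr', 'e']  (words split at umlauts)
print(EXTENDED.findall(text))    # ['für']

latex = r"\end{align*}"
print(ASCII_ONLY.findall(latex))  # ['end']
print(EXTENDED.findall(latex))    # []
```

The ASCII-only pattern treats umlauts as word boundaries and so matches fragments of longer words, which leaves the remaining letters to be flagged on their own; the extended ranges avoid that.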