errata-ai / vale

:pencil: A markup-aware linter for prose built with speed and extensibility in mind.
https://vale.sh
MIT License

Make Vale's format support extensible #769

Open jdkato opened 7 months ago

jdkato commented 7 months ago

See https://github.com/errata-ai/vale/issues/688#issuecomment-1766080782 for a related comment.

XVilka commented 6 months ago

One way to do that is to work with Tree-sitter grammars the way Neovim does. That would let you analyze comments in every language, depending on the syntax or so-called "injections".

weaversam8 commented 5 months ago

+1 for Tree-sitter: there are grammars for nearly every language, and Tree-sitter makes it easy to design your own.

jdkato commented 3 months ago

Many of Vale's supported programming languages now use Tree-sitter parsers.

I think an interesting way to make this "extensible" would be to expose the underlying queries, allowing users to decide exactly which parts of the file are linted.
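
To make the idea of user-selected scopes concrete, here is a minimal sketch in Python. It does not use Tree-sitter or any Vale API: the standard-library `tokenize` module stands in for a parser, and the selector names (`"comment"`, `"string"`) and the `select_prose` helper are hypothetical, chosen only for illustration. The point is just that the user, not the tool, names which node kinds are handed to the linter.

```python
import io
import tokenize

# Hypothetical selector names; these are not Vale options or Tree-sitter captures.
SELECTORS = {
    "comment": tokenize.COMMENT,
    "string": tokenize.STRING,
}


def select_prose(source: str, wanted: set[str]) -> list[tuple[int, str]]:
    """Return (line, text) pairs for the token kinds the user asked for."""
    kinds = {SELECTORS[name] for name in wanted}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tok.start[0], tok.string) for tok in tokens if tok.type in kinds]


SAMPLE = '''\
# Compute the total.
def total(xs):
    """Sum the values."""
    return sum(xs)  # fast enough
'''

if __name__ == "__main__":
    # Lint comments only, or comments plus string literals.
    print(select_prose(SAMPLE, {"comment"}))
    print(select_prose(SAMPLE, {"comment", "string"}))
```

With real Tree-sitter queries the selection would be written as query patterns with captures rather than as token kinds, but the user-facing shape (a list of scopes to lint) would presumably be similar.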

feasgal commented 3 weeks ago

+1000 for this issue. Here's my use case.

Vale ignores indented blocks, fenced blocks, and code spans by default. (docs)

We use MkDocs to generate our documentation. One of the most common themes for MkDocs (and the one we use) is the Material theme, where line indentation can indicate many things other than a code block. Currently, Vale ignores content in an admonition, a longer footnote, a definition list with more than one paragraph, a nested annotation, or content tabs, to name a few common examples.

We have a convention never to use indentation in the markdown to format a code block in the built HTML, but only use backtick fences. This matches Material's instructions, which don't even mention that indentation would work (presumably to avoid confusion with the many other ways it uses indentation). If code ignoring were not only configurable, but separately configurable for indented blocks, fenced blocks, and code spans, that would solve the problem completely.

But even if it were just possible to configure the whole code ignore as true or false, I could at least turn it off and then write a custom rule to ignore only the fences and/or code spans.
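
The behavior requested above can be sketched as a small pre-processing step in Python. This is an illustration only, not how Vale works and not a Vale option: the `prose_for_linting` function and its `ignore_fenced` / `ignore_spans` flags are hypothetical names, and the regular expressions are a rough approximation of Markdown rather than a full parser. Fenced blocks and code spans are blanked out independently, while indented text (such as the body of an admonition) is always kept.

```python
import re

# Rough approximations of Markdown constructs, for illustration only.
# A fence may itself be indented, as it is inside a Material admonition.
FENCE = re.compile(
    r"^[ \t]*(`{3,}|~{3,})[^\n]*\n.*?^[ \t]*\1[ \t]*$",
    re.MULTILINE | re.DOTALL,
)
SPAN = re.compile(r"`[^`\n]+`")


def _blank(match: re.Match) -> str:
    # Keep newlines so reported line numbers still match the source file.
    return re.sub(r"[^\n]", " ", match.group(0))


def prose_for_linting(markdown: str, ignore_fenced: bool = True,
                      ignore_spans: bool = True) -> str:
    """Blank out the selected code constructs; indented text is always kept."""
    text = markdown
    if ignore_fenced:
        text = FENCE.sub(_blank, text)
    if ignore_spans:
        text = SPAN.sub(_blank, text)
    return text


MARK = "`" * 3  # built at runtime to avoid a literal fence in this example
SAMPLE = (
    "!!! note\n"
    "    Run `vale sync` frist.\n"
    "\n"
    f"    {MARK}sh\n"
    "    vale README.md\n"
    f"    {MARK}\n"
)

if __name__ == "__main__":
    print(prose_for_linting(SAMPLE))
```

In the sample, the misspelled "frist" inside the indented admonition stays visible to a linter, while the code span and the fenced block are masked. Passing `ignore_spans=False` would leave the code span in place, which is the kind of separate control the comment asks for.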