Only operate on the text portion of latex files

TravisTheTechie / vscode-write-good

Write Good Linter for Visual Studio Code

MIT License


Only operate on the text portion of latex files #14

Open alberto-santini opened 4 years ago

alberto-santini commented 4 years ago

Hi! Thanks for your work creating this extension. I was wondering if there is a way to tell the linter to only work on the text portion of latex files, i.e., to ignore commands. I think it's best explained with an example: the linter flags \begin{enumerate} because it doesn't know that that part of the file is a latex command (and it doesn't like the verb "to enumerate"). Same with {\Huge Some large text here}, because "huge" is a weasel word.

I am not sure it's possible to implement this: do VSCode or LaTeX Workshop know which parts of a file are commands and which are text? I'll put it in the wishlist just in case. Thanks!

TravisTheTechie commented 4 years ago

I don't inherently know what content is code and what is text when it comes into the add-on. I have a couple of ideas I can play with; I might be able to interact with the language subsystem to get some of that. I will poke at the options here. I agree it would be good to focus this on comments & content and not code if possible.

You can always whitelist particular strings (e.g. \huge, \begin{enumerate}) via the whitelist option in write-good. I haven't tested this, but it would be something like write-good.write-good-config: { whitelist: [ '\huge', '\begin{enumerate}' ] } in the config, per write-good's README https://www.npmjs.com/package/write-good.

alberto-santini commented 4 years ago

Hi! I think a scalable solution might be to ignore all tokens which belong to the keyword scope. However, I am not sure (a) how one would do that, (b) if it's expensive to check the scope of each token, (c) if it's efficient. A solution starting from write-good's flagged words list (i.e., whitelisting known keywords which collide with flagged words) sounds more efficient although less scalable, in the sense that one needs to maintain a separate whitelist for each language.
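The per-language whitelist idea could be sketched roughly as below. This is only an illustration, not how the extension works: the `KEYWORD_WHITELIST` table, the `latex` language id key, and both helper functions are hypothetical names, and the `Suggestion` shape (`index`, `offset`, `reason`) is assumed to match what write-good's README shows for its results.

```typescript
// Assumed shape of a write-good suggestion: start index, length, message.
interface Suggestion {
  index: number;
  offset: number;
  reason: string;
}

// Hypothetical per-language lists of markup that collides with flagged words.
const KEYWORD_WHITELIST: Record<string, string[]> = {
  latex: ["\\begin{enumerate}", "\\end{enumerate}", "\\Huge", "\\huge"],
};

// Collect [start, end) ranges for every occurrence of a whitelisted keyword.
function keywordRanges(text: string, keywords: string[]): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (const keyword of keywords) {
    let from = text.indexOf(keyword);
    while (from !== -1) {
      ranges.push([from, from + keyword.length]);
      from = text.indexOf(keyword, from + keyword.length);
    }
  }
  return ranges;
}

// Drop suggestions that fall entirely inside a whitelisted keyword.
function dropKeywordSuggestions(
  text: string,
  suggestions: Suggestion[],
  languageId: string
): Suggestion[] {
  const ranges = keywordRanges(text, KEYWORD_WHITELIST[languageId] ?? []);
  return suggestions.filter(
    (s) => !ranges.some(([start, end]) => s.index >= start && s.index + s.offset <= end)
  );
}
```

The maintenance cost mentioned above shows up directly in the table: every language needs its own entry, and every new collision has to be added by hand.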

TravisTheTechie commented 4 years ago

So I've poked at this a bit, and the tokenization/syntax interface is not the same interface I interact with. When I get an event to do linting, I just get the content of the document and none of the structure. I have not yet seen a way to get a view of a document that includes token information. E.g. I get a TextDocumentChangeEvent, containing a TextDocument, which really just lets me get at the text itself. I think there would need to be changes to VS Code's structure itself for this to work on non-code text.
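Since only the raw text is available, one possible workaround is to blank out LaTeX markup with a regular expression before handing the text to the linter. The sketch below is an assumption-laden illustration, not something the extension does: `LATEX_MARKUP` and `maskLatexCommands` are hypothetical names, and the pattern only covers `\begin{...}`/`\end{...}`, plain control words such as `\Huge`, and single-character escapes such as `\%`. It does not handle command arguments, math mode, or comments.

```typescript
// Hypothetical pattern: environments, control words, and one-character escapes.
const LATEX_MARKUP = /\\(?:begin|end)\{[^}]*\}|\\[A-Za-z]+\*?|\\[^A-Za-z\s]/g;

// Replace each matched command with spaces of the same length, so that any
// index reported against the masked text still points at the same place
// in the original document.
function maskLatexCommands(text: string): string {
  return text.replace(LATEX_MARKUP, (match) => match.replace(/[^\n]/g, " "));
}

const sample = "\\begin{enumerate} {\\Huge Some large text here}";
const masked = maskLatexCommands(sample);
```

Because the masked string has the same length as the original, positions reported by a linter run on `masked` can be mapped back onto the document without any adjustment.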