errata-ai / vale

:pencil: A markup-aware linter for prose built with speed and extensibility in mind.
https://vale.sh
MIT License
4.51k stars 155 forks source link

TokenIgnores not working as intended? #736

Closed kellertuer closed 9 months ago

kellertuer commented 11 months ago

Check for existing issues

Environment

vale 2.30 on Mac OS (via Homebrew)

Describe the bug / provide steps to reproduce it

First of all thanks for merging the Julia extension – it seems to work quite great already on my side.

While reworking my code style a bit already I noticed two things that I can not get to work properly. Consider the following example which is Julia code to “just attach the docstring” to a function signature – but one could also implement the function instead of the last line

@doc raw"""
    DouglasRachford(M, f, proxes_f, p; kwargs...)

a doc string with some math ``t_{k+1} = g(α_k; t_k, s_k)``
"""
DouglasRachford(M, f, proxes_f, p; kwargs...)

I have two problems with this that I can not resolve. 1) The first line (and actually any line starting with 4 spaces in such a doc string) is code and should be considered code. Currently my vale complains that rpoxes_f and kwargs are unnknown words (Vale.Spelling) and ... should not be used. 2) math in Julia doc strings is put in 2 backticks, so I would like to also ignore these. What vale toes is complain that t_k is an unknown word (Vale.Spelling).

So in my vale.ini I did the following (extending my existing ignores), documenting them a bit in a comment)

[*.{md,jl}]
BasedOnStyles = Vale, Google

; ignore (1) math (2) ref and cite keys (3) code in docs (4) math in docs (5,6) indented blocks
TokenIgnores = \
    \$.+?\$, \
    \[.+\]\(@(ref|id|cite).+?\), \
    `.+`, \
    ``.+``, \
    ^\s{4}.+$ \

where the second to last is meant to capture the math above, and the last meant to capture the code block line. But neither of them seems to have an effect. Am I doing the TokenIgnores wrong or is that a bug in vale?

kellertuer commented 11 months ago

Hm, It seems to trigger for the first few in comments (lines starting with # but neither in inline strings nor in the doc strings like here. It seems there is a main thing I do not understand about token ignores maybe?

So how can I apply tokenIgnores to the multiline strings as in the example above? Since Julia is not yet in the docs I am also not so sure what the doc string is classified as. Text?

jdkato commented 9 months ago

In order to make use of ignore patterns in jl files, you need to assign an embedded markup format:

[formats]
jl = md

However, since the syntax (4 spaces, backticks) is already valid Markdown, this should work without ignore patterns.

(You'll need to be using the latest version, v3.0.6, since this functionality was not part of the original Julia PR.)

kellertuer commented 9 months ago

Thanks I will try that, for now I always read a=b in formats as “format a is also of form b” (for example I use qmd=md since I also have quarto notebooks.

But sure your comment seems to do the trick, just that I seem to have missed the embedded part.

I will also add an entry about Julia somewhen then (when I got a first example to fully work).

jdkato commented 9 months ago

Thanks I will try that, for now I always read a=b in formats as “format a is also of form b” (for example I use qmd=md since I also have quarto notebooks.

This is indeed the intended meaning, but it only applies to the content that is linted.

In other words, jl = md means lint Julia comments as Markdown.

kellertuer commented 9 months ago

Hm, this is still quite strange, because it also removes all other things that were errors before. Concrete example: Before I had

> vale src/solvers/quasi_Newton.jl 

 src/solvers/quasi_Newton.jl
 8:6     error    Don't put a space before or     Google.EmDash          
                  after a dash.                                          
 8:7     error    Use an em dash ('—') instead    Google.EnDash          
                  of '–'.                                                
[... skipped the rest...]

(the file is this one https://github.com/JuliaManifolds/Manopt.jl/blob/master/src/solvers/quasi_Newton.jl)

If I set jl=md in the formats – this file has 0 errors left, but the doc string that still has wrong dashes in its content (and should now be handled as markdown) is still in there. So sure, then I have zero errors left, which is nice, but most of the errors I previously had (we, dash, i.e.,...) are also no longer found in the -jl files.

I also do not speak about comments but about docstrings.

kellertuer commented 9 months ago

Garth, I edited that partly so sorry the link links to a different version. Still my problem persists, that setting jl = md removes all ✖ 1706 errors, 714 warnings and 0 suggestions in 163 files. I have without setting that.

Maybe this is then still not the tool that fits my needs.

edit: Ah here is the correct version to the two warnings above https://github.com/JuliaManifolds/Manopt.jl/blob/kellertuer/vale-on-code/src/solvers/quasi_Newton.jl

where you can see those two are in the doc string. If I set jl=md I am not sure what is happening but the two warnings about line 8 (which I think should be present) vanish.

kellertuer commented 9 months ago

The opposite false-positive is

 src/helpers/checks.jl
 87:75    warning  In general, don't use an        Google.Ellipses 
                   ellipsis.  

on https://github.com/JuliaManifolds/Manopt.jl/blob/5a6815e9fd56735e2222ba13f11f5f62fe779b39/src/helpers/checks.jl#L87

which is valid Julia code and the ... are syntactically necessary in such code. With jl=md this should (and does) vanish, but all other warnings and errors in Julia files do as well (really all).

jdkato commented 9 months ago

Could you share your .vale.ini? When I run that file against the Google style, I get this:

❯ vale cases/test.jl                                                                                                                         

 cases/test.jl
 4:104   suggestion  In general, use active voice    Google.Passive         
                     instead of passive voice ('be                          
                     used').                                                
 8:6     error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 8:7     error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 9:6     error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 9:7     error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 10:7    error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 10:8    error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 13:22   error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 13:23   error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 14:9    error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 14:10   error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 97:7    warning     Try to avoid using              Google.We              
                     first-person plural like 'we'.                         
 97:36   suggestion  Use parentheses judiciously.    Google.Parens          
 118:8   warning     'Stopping Criterion'            Google.Headings        
                     should use sentence-style                              
                     capitalization.                                        
 154:89  error       Use 'for example' instead of    Google.Latin           
                     'e.g.'.                                                
 155:30  suggestion  Use parentheses judiciously.    Google.Parens          
 160:6   error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 160:7   error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 161:6   error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 161:7   error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 162:1   error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 162:11  error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 163:6   error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 163:7   error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 166:10  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 166:68  error       Don't use plurals in            Google.OptionalPlurals 
                     parentheses such as in                                 
                     'space(s)'.                                            
 167:11  suggestion  Use parentheses judiciously.    Google.Parens          
 168:20  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 168:32  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 168:33  suggestion  Use parentheses judiciously.    Google.Parens          
 170:22  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 170:45  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 170:79  suggestion  Use 'that's' instead of 'that   Google.Contractions    
                     is'.                                                   
 172:1   suggestion  Use parentheses judiciously.    Google.Parens          
 172:21  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 172:22  error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 173:15  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 174:15  suggestion  Use parentheses judiciously.    Google.Parens          
 174:85  error       Use 'that is' instead of        Google.Latin           
                     'i.e.'.                                                
 176:21  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 178:16  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 178:47  suggestion  Use parentheses judiciously.    Google.Parens          
 180:22  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 184:14  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 187:1   error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 187:1   error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 187:1   suggestion  Use parentheses judiciously.    Google.Parens          
 189:1   suggestion  Use parentheses judiciously.    Google.Parens          
 190:1   suggestion  Use parentheses judiciously.    Google.Parens          
 190:28  error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 190:29  error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 194:14  suggestion  Use parentheses judiciously.    Google.Parens          
 215:16  warning     Try to avoid using              Google.We              
                     first-person plural like                               
                     'our'.                                                 
 236:6   error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 236:7   error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 237:6   error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 237:7   error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 238:1   error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 238:10  error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 239:6   error       Don't put a space before or     Google.EmDash          
                     after a dash.                                          
 239:7   error       Use an em dash ('—') instead    Google.EnDash          
                     of '–'.                                                
 368:1   suggestion  Use parentheses judiciously.    Google.Parens          
 393:3   suggestion  Spell out 'BFGS', if it's       Google.Acronyms        
                     unfamiliar to the audience.                            
 410:11  suggestion  Spell out 'DFP', if it's        Google.Acronyms        
                     unfamiliar to the audience.                            
 429:3   suggestion  Spell out 'DFP', if it's        Google.Acronyms        
                     unfamiliar to the audience.                            
 589:38  suggestion  Use 'doesn't' instead of 'does  Google.Contractions    
                     not'.                                                  
 623:28  suggestion  In general, use active voice    Google.Passive         
                     instead of passive voice ('is                          
                     fulfilled').                                           
 623:60  suggestion  In general, use active voice    Google.Passive         
                     instead of passive voice ('are                         
                     added').                                               
 627:11  warning     Use the Oxford comma in         Google.OxfordComma     
                     'the stored vectors are just                           
                     transported to the new tangent                         
                     space, sk and'.                                        
 627:87  suggestion  Use 'aren't' instead of 'are    Google.Contractions    
                     not'.                                                  

✖ 45 errors, 4 warnings and 21 suggestions in 1 file.
kellertuer commented 9 months ago

Thanks for checking :) and the fuser fast answer. It's all on the same branch. So I have one vale.ini for the while repo is https://github.com/JuliaManifolds/Manopt.jl/blob/kellertuer/vale-on-code/.vale.ini

Which should work fine for the docs/src/ folder where the markdown files I have are, it does not yet include your fix (that I only did locally without committing/pushing it) and both examples above return to 0 errors and 0 warnings if I set jl = md therein.

jdkato commented 9 months ago

I would remove your TokenIgnores and then evaluate the new results.

Code, indented blocks, and links (the ref and cite keys) are ignored by default in Markdown.

kellertuer commented 9 months ago

These have (in both cases with or without jl = md) exactly no effect, which is of course also good to know ;) but the above problem described still stays the same.

jdkato commented 9 months ago

These have (in both cases with or without jl = md) exactly no effect, which is of course also good to know ;)

This is not true. My advice is to remove your TokenIgnores.

kellertuer commented 9 months ago

I give up.

I did remove them. The effect is zero.

To give a concrete example (again): Take https://github.com/JuliaManifolds/Manopt.jl/blob/kellertuer/vale-on-code/src/plans/stopping_criterion.jl and the vale ini without any TokenIgnores. Or. for completeness (even further simplified)

StylesPath = docs/styles
MinAlertLevel = warning
Vocab = Manopt

Packages = Google

[formats]
qmd = md

[docs/src/*.md]
BasedOnStyles = Vale, Google

[src/*.jl]
BasedOnStyles = Vale, Google

Then running vale reports

❯ vale src/plans/stopping_criterion.jl 

 src/plans/stopping_criterion.jl
 4:67    error    Use 'that is' instead of        Google.Latin    
                  'i.e.'.                                         
 10:70   error    Did you really mean 'Bool'?     Vale.Spelling   
 21:74   error    Did you really mean             Vale.Spelling   
                  '_always_'?                                     
 26:1    error    Use 'that is' instead of        Google.Latin    
                  'i.e.'.                                         
 31:74   warning  Try to avoid using              Google.We       
                  first-person plural like 'we'.                  
 62:1    error    Use 'that is' instead of        Google.Latin    
                  'i.e.'.                                         
 122:43  error    Use 'that is' instead of        Google.Latin    
                  'i.e.'.                                         
 126:12  error    Don't put a space before or     Google.EmDash   
                  after a dash.                                   
 126:13  error    Use an em dash ('—') instead    Google.EnDash   
                  of '–'.                                         
 127:11  error    Don't put a space before or     Google.EmDash   
                  after a dash.                                   
 127:12  error    Use an em dash ('—') instead    Google.EnDash   
                  of '–'.                                         
 134:16  error    Did you really mean             Vale.Spelling   
                  'stopafterIteration'?                           
 531:10  error    Don't put a space before or     Google.EmDash   
                  after a dash.                                   
 531:11  error    Use an em dash ('—') instead    Google.EnDash   
                  of '–'.                                         
 532:4   error    Did you really mean             Vale.Spelling   
                  'minValue'?                                     
 532:13  error    Don't put a space before or     Google.EmDash   
                  after a dash.                                   
 532:14  error    Use an em dash ('—') instead    Google.EnDash   
                  of '–'.                                         
 533:11  error    Don't put a space before or     Google.EmDash   
                  after a dash.                                   
 533:12  error    Use an em dash ('—') instead    Google.EnDash   
                  of '–'.                                         
 540:16  error    Did you really mean             Vale.Spelling   
                  'stopifsmallerorequal'?                         
 582:7   error    Did you really mean '_all_'?    Vale.Spelling   
 587:38  warning  In general, don't use an        Google.Ellipses 
                  ellipsis.                                       
 654:7   error    Did you really mean '_any_'?    Vale.Spelling   
 659:37  warning  In general, don't use an        Google.Ellipses 
                  ellipsis.                                       
 727:56  error    Use 'that is' instead of        Google.Latin    
                  'i.e.'.                                         

✖ 22 errors, 3 warnings and 0 suggestions in 1 file.

the fist one is great, the second one as well,... the two warnings (near the end) are not, they are in code blocks (in a docstring/markdown). So that is Julia code within the markdown/docstring that is ok as it is and should not issue warnings.

If I add jl=md to formats above, this flle has zero errors, zero warnings. Sure that does sound nice, but is equivalent to not using vale at all.

jdkato commented 9 months ago

Please try this:

StylesPath = docs/styles
MinAlertLevel = warning

; NOTE: You need to be using the latest version of Vale (v3.0.6) in order
; to use `jl = md`.
;
; Uncommenting this line will give you an error in Vale versions >=3.0 until
; your vocab is moved, as described here:
;
; https://github.com/errata-ai/vale/releases/tag/v3.0.0
;
; Vocab = Manopt

Packages = Google

[formats]
; NOTE: After making a format assignment, the configuration of the 
; *assigned* format is now applied.^1
qmd = md
; TokenIgnores are only applied in supported *markup* formats (of which `jl` is not one).
; However, once assigned an embedded markup format (Markdown, in this case), the 
; TokenIgnores *will* be applied.
jl = md

[docs/src/*.md]
BasedOnStyles = Vale, Google

; ^1: We're now referencing the `md` format, which we assigned above.
[src/*.md]
BasedOnStyles = Vale, Google

; $ vale src/plans/stopping_criterion.jl
; ✖ 18 errors, 1 warning and 0 suggestions in 1 file.

The first NOTE is particularly important as, based on your repo structure, it appears you're using Vale v2.0.

kellertuer commented 9 months ago

Yes I was not aware of still being on an older version, since I just installed it with Homebrew.

How do you recognise the old version on “folder structure”?

kellertuer commented 9 months ago

Oh and one further question – there is no .md in the src/ folder, but as you write this still then also applies to .jl files? That is a bit confusing configuration wise.

kellertuer commented 9 months ago

Sorry for a bit of spam – this did indeed work, but I “lost” my vocabulary and I am not sure where to put that to get it back?