errata-ai / vale

:pencil: A markup-aware linter for prose built with speed and extensibility in mind.
https://vale.sh
MIT License
4.36k stars 144 forks source link

Unwanted warnings raised with version 2.29.4 #706

Closed SMoraisAnsys closed 10 months ago

SMoraisAnsys commented 10 months ago

Check for existing issues

Environment

I'm using value through the github action errata-ai/vale-action@reviewdog but I think it is appropriate to open an issue here as my problem arises when a change the vale version.

Here is my action setting:

  doc-style:
    name: "Check documentation style"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Running Vale
        uses: errata-ai/vale-action@reviewdog
        env:
          GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
        with:
          files: doc
          reporter: github-pr-check
          level: error
          filter_mode: nofilter
          fail_on_error: true
          vale_flags: "--config=doc/.vale.ini"

and my .vale.ini contains

StylesPath = "styles"
MinAlertLevel = warning
IgnoredScopes = code, tt
SkippedScopes = script, style, pre, figure
WordTemplate = \b(?:%s)\b
Packages = Google
Vocab = perso # This accepts PEP 8 (cf my rst file context)

[*.{rst}]

BasedOnStyles = Vale, Google
TokenIgnores = (:.*:`.*`)

Google.WordList = NO
Google.Colons = NO

Describe the bug / provide steps to reproduce it

The error I'm facing is that when vale works on my .rst file, the google rule regarding Heading is triggered when it shouldn't.

Here is my rst file

PEP 8
=====
blabla

Imports
-------
blabla

Import location
~~~~~~~~~~~~~~~
blabla

and the associated errors

 {"message": "[Google.Headings] 'PEP 8' should use sentence-style capitalization.", ..., "severity": "WARNING"}
 {"message": "[Google.Headings] 'Imports' should use sentence-style capitalization.", ... , "severity": "WARNING"}
 {"message": "[Google.Headings] 'Import location' should use sentence-style capitalization.", ... , "severity": "WARNING"}

While I though that this error might be related to https://github.com/errata-ai/vale/discussions/702, using Google.Headings = NO didn't solve my issue. However, this warnings are no longer raised when I constraint the vale github action to use the version "2.29.3" (latest is "2.29.4").

pwalleni commented 10 months ago

Same problem with running locally.

jdkato commented 10 months ago

Should be fixed now.

SMoraisAnsys commented 10 months ago

I've updated our action to use version 2.29.5 however part of the error is still here :

  {"message": "[Google.Headings] 'PEP 8' should use sentence-style capitalization.", ... , "severity": "WARNING"}

Note: errors associated to

Imports
-------
blabla

Import location
~~~~~~~~~~~~~~~
blabla

are no longer present πŸ‘

SMoraisAnsys commented 10 months ago

@jdkato Want me to fill another issue (related to version 2.29.5) or could you open this one back ?

jdkato commented 10 months ago

"PEP 8" needs to be added as an exception. Is it in your perso vocab?

SMoraisAnsys commented 10 months ago

@jdkato Thanks for your answer. "PEP 8" was not part of my vocab and adding it solves the reproducer's issue. However, the problem still exist for cases that are not vocab related (or I would surprised). For example:

Configure code coverage
=======================
Some content

Imports
-------
Some content

Import location
~~~~~~~~~~~~~~~
Some content

triggers the following error :

Raw Output:
{"message": "[Google.Headings] 'Configure code coverage' should use sentence-style capitalization.", ..., "severity": "WARNING"}
jdkato commented 10 months ago

If you update to the latest version of the Google style (v0.4.2) either by running vale sync (locally) or re-running the action workflow, this should be resolved too.

SMoraisAnsys commented 10 months ago

Seems to be working fine πŸ‘ Thanks !

pwalleni commented 10 months ago

I pulled down https://github.com/errata-ai/Google/releases/tag/v0.4.2 but vale still triggers on the Headings rule when I run it locally. The offending words are in the vocabulary file, for example, this one:

Vale version:

❯ vale --version
vale version 2.29.5

Running vale locally:

❯ vale docs/technical-writing-101/planning-a-techdocs-project/audience-and-purpose.md

 docs/technical-writing-101/planning-a-techdocs-project/audience-and-purpose.md
 29:4  warning  'TechDocs project examples'     Volvo-Cars.Headings
                should use sentence-style
                capitalization.

Vale local config:

❯ vale ls-config
{
  "BlockIgnores": {},
  "Checks": null,
  "Formats": {},
  "Asciidoctor": {},
  "FormatToLang": {},
  "GBaseStyles": null,
  "GChecks": {},
  "IgnoredClasses": null,
  "IgnoredScopes": null,
  "MinAlertLevel": 0,
  "Vocab": [
    "Volvo-Cars"
  ],
  "RuleToLevel": {},
  "SBaseStyles": {
    "*.md": [
      "Volvo-Cars"
    ]
  },
  "SChecks": {
    "*.md": {}
  },
  "SkippedScopes": null,
  "Stylesheets": {},
  "StylesPath": "/Users/pascalw/src/work/volvo-cars/techdocs-user-guide/.vale/volvo-cars-style-guide/styles",
  "TokenIgnores": {},
  "WordTemplate": "",
  "RootINI": "/Users/pascalw/src/work/volvo-cars/techdocs-user-guide/.vale.ini",
  "DictionaryPath": "",
  "NLPEndpoint": ""
}

Offending word in vocabulary file:

❯ grep TechDocs .vale/volvo-cars-style-guide/styles/Vocab/Volvo-Cars/accept.txt
TechDocs

Google style rule triggering:

❯ cat .vale/volvo-cars-style-guide/styles/Volvo-Cars/Headings.yml
extends: capitalization
message: "'%s' should use sentence-style capitalization."
link: "https://developers.google.com/style/capitalization#capitalization-in-titles-and-headings"
level: warning
scope: heading
match: $sentence
indicators:
  - ":"
exceptions:
  - Azure
  - CLI
  - Cosmos
  - Docker
  - Emmet
  - gRPC
  - I
  - Kubernetes
  - Linux
  - macOS
  - Marketplace
  - MongoDB
  - REPL
  - Studio
  - TypeScript
  - URLs
  - Visual
  - VS
  - Windows

@jdkato, has the behavior changed, so I must add all the words in the vocabulary file accept.txt as exceptions in the list of words in the Headings rule definition file?

jdkato commented 10 months ago

has the behavior changed, so I must add all the words in the vocabulary file accept.txt as exceptions in the list of words in the Headings rule definition file?

No, you don't need to edit the rule source.

Could you share your accept.txt? Do you have multiple case variations of "TechDocs"?

pwalleni commented 10 months ago

Could you share your accept.txt? Do you have multiple case variations of "TechDocs"?

Hi @jdkato. Thanks for the support. Yes, there are techdocs and TechDocs: https://gist.github.com/pwalleni/c56a849bfb736e1f32bb4104566681a4

SMoraisAnsys commented 10 months ago

Could you share your accept.txt? Do you have multiple case variations of "TechDocs"?

Hi @jdkato. Thanks for the support. Yes, there are techdocs and TechDocs: https://gist.github.com/pwalleni/c56a849bfb736e1f32bb4104566681a4

I'm also interested in any fixes as I still have issues on my project (from which I extracted the reproducer)

pwalleni commented 10 months ago

I made a test with an empty accept.txt vocabulary file and put vale on a run loop

while true; do vale docs; sleep 10; done

As I added words to accept.txt the issues reported by vale got fewer and fewer until only the Headings rule issues kept appearing despite the word in the matching case existing in accept.txt.

pwalleni commented 10 months ago

Running vale on the same Markdown file every 3 seconds produces different results. Very odd behaviour:

❯ while true; do vale docs/technical-writing-101/documentation-topic-types/runbook.md; date; sleep 3; done
βœ” 0 errors, 0 warnings and 0 suggestions in 1 file.
Thu Oct 19 14:27:49 CEST 2023
βœ” 0 errors, 0 warnings and 0 suggestions in 1 file.
Thu Oct 19 14:27:53 CEST 2023
βœ” 0 errors, 0 warnings and 0 suggestions in 1 file.
Thu Oct 19 14:27:56 CEST 2023

 docs/technical-writing-101/documentation-topic-types/runbook.md
 8:3  warning  'Runbook information type'      Volvo-Cars.Headings
               should use sentence-style
               capitalization.

βœ– 0 errors, 1 warning and 0 suggestions in 1 file.
Thu Oct 19 14:27:59 CEST 2023
βœ” 0 errors, 0 warnings and 0 suggestions in 1 file.
Thu Oct 19 14:28:03 CEST 2023
βœ” 0 errors, 0 warnings and 0 suggestions in 1 file.
Thu Oct 19 14:28:06 CEST 2023

 docs/technical-writing-101/documentation-topic-types/runbook.md
 8:3  warning  'Runbook information type'      Volvo-Cars.Headings
               should use sentence-style
               capitalization.

βœ– 0 errors, 1 warning and 0 suggestions in 1 file.
Thu Oct 19 14:28:09 CEST 2023
βœ” 0 errors, 0 warnings and 0 suggestions in 1 file.
Thu Oct 19 14:28:12 CEST 2023

 docs/technical-writing-101/documentation-topic-types/runbook.md
 8:3  warning  'Runbook information type'      Volvo-Cars.Headings
               should use sentence-style
               capitalization.

βœ– 0 errors, 1 warning and 0 suggestions in 1 file.
Thu Oct 19 14:28:16 CEST 2023

 docs/technical-writing-101/documentation-topic-types/runbook.md
 8:3  warning  'Runbook information type'      Volvo-Cars.Headings
               should use sentence-style
               capitalization.

βœ– 0 errors, 1 warning and 0 suggestions in 1 file.
Thu Oct 19 14:28:19 CEST 2023
βœ” 0 errors, 0 warnings and 0 suggestions in 1 file.
Thu Oct 19 14:28:22 CEST 2023
βœ” 0 errors, 0 warnings and 0 suggestions in 1 file.
Thu Oct 19 14:28:25 CEST 2023

 docs/technical-writing-101/documentation-topic-types/runbook.md
 8:3  warning  'Runbook information type'      Volvo-Cars.Headings
               should use sentence-style
               capitalization.

βœ– 0 errors, 1 warning and 0 suggestions in 1 file.
Thu Oct 19 14:28:29 CEST 2023
^C%

The offending word Runbook is in accept.txt:

❯ cat .vale/volvo-cars-style-guide/styles/Vocab/Volvo-Cars/accept.txt
Artifactory
Backstage
CODEOWNERS
DevEx
Diataxis
GitHub
Kroki
Markdown
MkDocs
PDF
PlantUML
Reviewdog
Runbook
TechDocs
Vale
Visual
Studio
Code
Volvo Cars
discoverability
reviewdog
runbook
Runbook
SharePoint
Spotify
strikethrough
Strikethrough
jdkato commented 10 months ago

The underlying data structure for accept.txt is not sorted, so having multiple (differently-cased) entries for the same term can result in nondeterministic behavior (as you saw above).

Your accept.txt can be re-written as:

[Rr]eviewdog
[Rr]unbook
[Ss]trikethrough
Artifactory
Backstage
Code
CODEOWNERS
DevEx
Diataxis
discoverability
GitHub
Kroki
Markdown
MkDocs
PDF
PlantUML
SharePoint
Spotify
Studio
TechDocs
Vale
Visual
Volvo Cars

You should generally have one entry per term.