facelessuser / pyspelling

Spell checker automation tool
https://facelessuser.github.io/pyspelling/
MIT License
80 stars 21 forks

Not ignoring code/pre blocks #156

Closed itsjoekent closed 1 year ago

itsjoekent commented 1 year ago

Having trouble getting this action to ignore code blocks in my markdown.

Here is my configuration,

matrix:
- name: Markdown
  aspell:
    lang: en
    ignore-case: true
  dictionary:
    wordlists:
    - .github/wordlist.txt
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8

Here is some sample markdown it's reporting spell check errors for,

## Research Process

To determine if unnecessary DOM mutations were happening, we attached a Mutation Observer to the application with the following code,

```js
function noisyCallback(mutationList) {
  mutationList.forEach((mutation) => {
      console.log("mutation observed", mutation)
  });
}

const el = document.querySelector('#app')

const observer = new MutationObserver(noisyCallback)

observer.observe(el, {characterData: true, characterDataOldValue: true, subtree: true, childList: true, attributes: true, attributeOldValue: true})
```

**In order to not break the formatting of this Github Issue, I removed a backtick from the start & end of the JS codeblock**. The actual markdown file in question has 3 backticks.

This is the error I am getting,

```
Using pyspelling on configuration outlined in >.github/spellcheck-config.yml<
Checking files matching specified outlined in >.github/spellcheck-config.yml<

Misspelled words:
rd/localize-react.md: html>body>p
--------------------------------------------------------------------------------
forEach
mutationList
noisyCallback
--------------------------------------------------------------------------------

Misspelled words:
rd/localize-react.md: html>body>p
--------------------------------------------------------------------------------
const
el
querySelector
--------------------------------------------------------------------------------

Misspelled words:
rd/localize-react.md: html>body>p
--------------------------------------------------------------------------------
MutationObserver
const
noisyCallback
--------------------------------------------------------------------------------

Misspelled words:
rd/localize-react.md: html>body>p
--------------------------------------------------------------------------------
attributeOldValue
characterData
characterDataOldValue
childList
el
subtree
--------------------------------------------------------------------------------
```

As you can see, all of the code is being treated as a paragraph element, not a code or pre element. In the Github render, it is a pre element and renders as a code block:

[Screenshot: Screen Shot 2022-10-13 at 11 44 33 AM]

Can someone help me understand why the spell check isn't working?

facelessuser commented 1 year ago

> In order to not break the formatting of this Github Issue, I removed a backtick from the start & end of the JS codeblock.

Just use a larger fence wrapping your content, for example four backticks around a block that itself contains a three-backtick fence:

Some paragraph.

content

> As you can see, all of the code is being treated as a paragraph element, not a code or pre element. In the Github render, it is a pre element and renders as a code block:

On to your issue. The Markdown parser used in the default Markdown plugin is Python Markdown. Python Markdown does not recognize fenced code by default, which is why your fenced block is parsed as an ordinary paragraph. You must enable a fenced code extension.

I assume you may be referencing what we do in our project's default config, or maybe I need to fix something in the documentation. Either way, here is an example of a project that does parse some Markdown with code blocks: https://github.com/facelessuser/coloraide/blob/main/.pyspelling.yml#L38. You'll notice we use the `pymdownx.superfences` extension to process code blocks.
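
For anyone landing here with the same problem, the change might look roughly like the sketch below, applied to the pipeline from the original post. It assumes the Markdown filter's `markdown_extensions` option is used to pass extensions through to Python Markdown, and that the `pymdown-extensions` package (which provides `pymdownx.superfences`) is installed in the environment where pyspelling runs. Whether that package is available inside a given GitHub Action is not something this thread confirms.

```yaml
matrix:
- name: Markdown
  pipeline:
  - pyspelling.filters.markdown:
      markdown_extensions:
      # Assumption: pymdown-extensions is installed alongside pyspelling.
      # Python Markdown's bundled markdown.extensions.fenced_code is an
      # alternative if that package is not available.
      - pymdownx.superfences:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
```

With a fenced code extension enabled, fenced blocks should reach the HTML filter as `pre`/`code` elements rather than `p`, so the existing `ignores` entries can take effect.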

itsjoekent commented 1 year ago

Thank you for the tip! And the extension worked, spell check is passing now!