carpentries / styles

Styles for The Carpentries lessons. No README to avoid merge conflicts with lessons. Demo 👇
https://carpentries.github.io/lesson-example
Other
85 stars 96 forks source link

lesson_check.py parser crash on edge syntax case #543

Closed unode closed 3 years ago

unode commented 3 years ago

Given a lesson parsetest.md:

---
title: "Parse test"
teaching: 0
exercises: 0
questions:
- "FIXME"
objectives:
- "FIXME"
keypoints:
- "FIXME"
---

> ## Title
> > ## SubTitle
> >
> > Missing empty line below crashes parser
> > ~~~
> > <h2>Beware</h2>
> > ~~~
> > {: .source}
> >
> > ~~~
> > Dragons
> > ~~~
> > {: .source }
> {: .solution}
{: .challenge}

running make lesson-check-all results in:

Traceback (most recent call last):
  File "bin/lesson_check.py", line 566, in <module>
    main()
  File "bin/lesson_check.py", line 123, in main
    checker.check()
  File "bin/lesson_check.py", line 342, in check
    self.check_codeblock_classes()
  File "bin/lesson_check.py", line 393, in check_codeblock_classes
    self.reporter.check(cls in KNOWN_CODEBLOCKS or cls.startswith('language-'),
AttributeError: 'NoneType' object has no attribute 'startswith'
make: *** [Makefile:139: lesson-check-all] Error 1

Modifying the above to read:

(...)
> > Missing empty line below crashes parser
> >
> > ~~~
(...)

avoids the crash.

The expectation is that the lesson check script can handle incorrect input without crashing, ideally providing information on what could be wrong.

zkamvar commented 3 years ago

For future reference, the parser is treating that codeblock as this:

{'type': 'codeblock', 'value': '{: .source}\n\n', 'options': {'location': 9, 'fenced': True}}

One of the solutions would be to add another check right before the code block to check that the class is not none and then throw a check error

tomgreen66 commented 3 years ago

Hi, also hit this issue with the following piece of code. Any advice how to not trigger the error reported above:

def helpMessage() { log.info """ Usage: The typical command for running the pipeline is as follows: nextflow run main.nf --query \$PWD/books/isles.txt --outfile isles.dat

    Mandatory arguments:
     --query                       Query file count words
     --outfile                     Output file name
     --outdir                      Final output directory

    Optional arguments:
     --app                         Python executable
     --wordcount                   Python code to execute
     --help                        This usage statement.
    """

}

// Show help message if (params.help) { helpMessage() exit 0 }

{: .source}
zkamvar commented 3 years ago

Hi @tomgreen66, could you provide a little more context before and after that code chunk? As @unode showed, the context before the code chunk affected how it was interpreted (e.g. you needed to have a blank line before the code block for markdown to interpret it correctly). Unfortunately kramdown's documentation is not of much help (it doesn't seem to mention context for code blocks: https://kramdown.gettalong.org/syntax.html#fenced-code-blocks).

tomgreen66 commented 3 years ago

There are blank lines before and after the codeblock. If I delete the blank line between the } and // Show help message it seems to work as expected but confused why it needs it.

zkamvar commented 3 years ago

There are blank lines before and after the codeblock. If I delete the blank line between the } and // Show help message it seems to work as expected but confused why it needs it.

Would you be able to give a link to the episode where this code block lives? I acknowledge that it is indeed weird and it should not be throwing that particular error. Having the full content episode would really help in understanding what behavior triggers this problem and how it can be fixed. That would allow me to go through the AST and see exactly where the kramdown parser messed up (my concept map for how kramdown parses its flavor of markdown is sparsely connected, I'm afraid).

tomgreen66 commented 3 years ago

Putting together some tutorials for online sessions using the Software Carpentry Style since we have a Carpenter in our team. See: https://arcca.github.io/intro_nextflow/02-first_script/index.html

zkamvar commented 3 years ago

Thank you for the link. For someone who knows the kramdown parser better than I, The value of the node that's giving the error is:

{'type': 'codeblock', 'value': ' Mandatory arguments:\n --query Query file count words\n --outfile Output file name\n --outdir Final output directory\n\n Optional arguments:\n --app Python executable\n --wordcount Python code to execute\n --help This usage statement.\n """ }\n', 'options': {'location': 261}}

Of course, it is correctly rendered on the Jekyll site.

There are a couple of things to note about this:

Context

Below is the context starting on line 261:


## Adding help functionality

Now that there is a series of arguments to the workflow it is beneficial to be able to print a help message.

In `main.nf` before the process definition the following can be added:

def helpMessage() { log.info """ Usage: The typical command for running the pipeline is as follows: nextflow run main.nf --query \$PWD/books/isles.txt --outfile isles.dat Mandatory arguments: --query Query file count words --outfile Output file name --outdir Final output directory Optional arguments: --app Python executable --wordcount Python code to execute --help This usage statement. """ }

// Show help message if (params.help) { helpMessage() exit 0 }

{: .source}

This is what the failing node contains:

Mandatory arguments:
 --query                       Query file count words
 --outfile                     Output file name
 --outdir                      Final output directory
Optional arguments:
 --app                         Python executable
 --wordcount                   Python code to execute
 --help                        This usage statement.
"""

Note that it has removed an indentation level of the code, which seems like it's interpreting a code block within a code block.

Context of the full markdown snippet: ````json {'type': 'blank', 'value': '\n'}, {'type': 'header', 'options': {'level': 2, 'raw_text': 'Adding help functionality', 'location': 248}, 'children': [{'type': 'text', 'value': 'Adding help functionality', 'options': {'location': 248}}]}, {'type': 'blank', 'value': '\n'}, {'type': 'p', 'options': {'location': 250}, 'children': [{'type': 'text', 'value': 'Now that there is a series of arguments to the workflow it is beneficial to be able to print a help message.', 'options': {'location': 250}}]}, {'type': 'blank', 'value': '\n'}, {'type': 'p', 'options': {'location': 252}, 'children': [{'type': 'text', 'value': 'In ', 'options': {'location': 252}}, {'type': 'codespan', 'value': 'main.nf', 'options': {'location': 252}}, {'type': 'text', 'value': ' before the process definition the following can be added:', 'options': {'location': 252}}]}, {'type': 'blank', 'value': '\n'}, {'type': 'p', 'options': {'location': 254}, 'children': [{'type': 'text', 'value': '```\ndef helpMessage() {\n log.info ', 'options': {'location': 254}}, {'type': 'smart_quote', 'value': 'ldquo', 'options': {'location': 256}}, {'type': 'smart_quote', 'value': 'rdquo', 'options': {'location': 256}}, {'type': 'smart_quote', 'value': 'rdquo', 'options': {'location': 256}}, {'type': 'text', 'value': '\n Usage:\n The typical command for running the pipeline is as follows:\n nextflow run main.nf ', 'options': {'location': 256}}, {'type': 'typographic_sym', 'value': 'ndash', 'options': {'location': 259}}, {'type': 'text', 'value': 'query $PWD/books/isles.txt ', 'options': {'location': 259}}, {'type': 'typographic_sym', 'value': 'ndash', 'options': {'location': 259}}, {'type': 'text', 'value': 'outfile isles.dat', 'options': {'location': 259}}]}, {'type': 'blank', 'value': '\n'}, {'type': 'codeblock', 'value': ' Mandatory arguments:\n --query Query file count words\n --outfile Output file name\n --outdir Final output directory\n\n Optional arguments:\n --app Python executable\n --wordcount Python code to execute\n --help This usage statement.\n """ }\n', 'options': {'location': 261}}, {'type': 'blank', 'value': '\n'}, {'type': 'p', 'attr': {'class': 'source'}, 'options': {'location': 273, 'ial': {'class': 'source'}}, 'children': [{'type': 'text', 'value': '// Show help message\nif (params.help) {\n helpMessage()\n exit 0\n}\n```', 'options': {'location': 273}}]} ````

Again, this is pushing against my knowledge of kramdown, and I'm not sure what's causing this code block to be interpreted incorrectly by the kramdown parser but render well on the site itself.

maxim-belkin commented 3 years ago

I'll have a look at this.

maxim-belkin commented 3 years ago

@unode and @tomgreen66, could you please test if changes to bin/markdown_ast.rb in #546 fix the issues that you're experiencing?

tomgreen66 commented 3 years ago

Thanks. Tried the fix in #546 by installing the gem and adding the change the markdown_ast.rb but when running make lesson-check-all I now get the error below for both the original code and the workaround "If I delete the blank line between the } and // Show help message". Think I added the fix correctly but happy for someone else to check. Maybe I need to reformat things differently - just assumed code block should be flexible.

$ make lesson-check-all
Traceback (most recent call last):
  File "intro_nextflow/bin/lesson_check.py", line 566, in <module>
    main()
  File "/intro_nextflow/bin/lesson_check.py", line 123, in main
    checker.check()
  File "/intro_nextflow/bin/lesson_check.py", line 342, in check
    self.check_codeblock_classes()
  File "/intro_nextflow/bin/lesson_check.py", line 393, in check_codeblock_classes
    self.reporter.check(cls in KNOWN_CODEBLOCKS or cls.startswith('language-'),
AttributeError: 'NoneType' object has no attribute 'startswith'
make: *** [lesson-check-all] Error 1
maxim-belkin commented 3 years ago

@tomgreen66, your code block passes tests for me, so let's debug this together. Are you on a Mac or a Linux-machine? Could you please save the following code into a file test.rb, execute it via ruby test.rb and report the result?

require "kramdown"
require "kramdown-parser-gfm"
require "json"

markdown = <<EOS
## Adding help functionality

Now that there is a series of arguments to the workflow it is beneficial to be able to print a help message.

In `main.nf` before the process definition the following can be added:

def helpMessage() { log.info """ Usage: The typical command for running the pipeline is as follows: nextflow run main.nf --query \$PWD/books/isles.txt --outfile isles.dat Mandatory arguments: --query Query file count words --outfile Output file name --outdir Final output directory Optional arguments: --app Python executable --wordcount Python code to execute --help This usage statement. """ }

// Show help message if (params.help) { helpMessage() exit 0 }

{: .source}
EOS
doc = Kramdown::Document.new(markdown, input: 'GFM', hard_wrap: false)
tree = doc.to_hash_a_s_t
puts tree[:children][-1][:value]
tomgreen66 commented 3 years ago

I get the following on MacOS Mojave using Homebrew Ruby:

$ ruby --version
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-darwin18]
$ ruby test.rb
def helpMessage() {
  log.info """
       Usage:
        The typical command for running the pipeline is as follows:
        nextflow run main.nf --query $PWD/books/isles.txt --outfile isles.dat
        Mandatory arguments:
         --query                       Query file count words
         --outfile                     Output file name
         --outdir                      Final output directory
        Optional arguments:
         --app                         Python executable
         --wordcount                   Python code to execute
         --help                        This usage statement.
        """
}

// Show help message
if (params.help) {
    helpMessage()
    exit 0
}
tomgreen66 commented 3 years ago

Okay. My mistake. Having printed out line number and file it highlighting missing {: .source} at tend of code blocks in other files. So I think we need a:

            if cls is None:
                print(self.filename, self.get_loc(node))

To help locate places where class specifiers are not used. Sorry for confusion - I thought the rest of the document had no other issues - maybe this new parser picks them up too?

maxim-belkin commented 3 years ago

So I think we need a:

            if cls is None:
                print(self.filename, self.get_loc(node))

That's a good idea with one comment: we need to make sure it's clear that this is an error message and to either halt the execution of lesson-check or skip to the next file when this happens because we can't produce an AST of (and, therefore, analyze) the current file. Would you like to submit a PR with this fix?

unode commented 3 years ago

Sorry, I didn't manage to get back to you before. The issue is now closed but I confirm that #546 addresses the original failure.

maxim-belkin commented 3 years ago

Thanks for confirming, @unode.

zkamvar commented 3 years ago

At the moment, it's crashing on the current version of r-novice-gapminder, but I have no clue how to begin debugging this (other than print statements everywhere): https://github.com/swcarpentry/r-novice-gapminder/pull/696#issuecomment-796265728

tomgreen66 commented 3 years ago

I put a suggestion in #550 as described in this change https://github.com/ARCCA/styles/commit/661de87e4a8ce6c962180420c11922f07e476384 - maybe it will help.