eXist-db / exist-markdown

Markdown Parser in XQuery
Other
13 stars 9 forks source link

Code blocks properly identified #14

Open lguariento opened 2 years ago

lguariento commented 2 years ago

This edit restores the correct identification and parsing of code blocks (and in general it restores at least the parsing of the article rather than returning an ugly cardinality error).

lguariento commented 2 years ago

Now that at least we have restored some rendering of the articles we should work on that REGEX / analyze-string, since it is not catching all the code blocks. For instance the following is rendered with paragraphs (in /wiki/blogs/eXist/XQuery31).

'''xquery
xquery version "3.1";

import module namespace http="http://expath.org/ns/http-client";

declare function local:log($json as array(\*)) {
    \<table>
    {
        for $entry in $json?\*
        let $commit := $entry?commit
        return
            \<tr>
                \<td>{$commit?committer?date}\</td>
                \<td>{$commit?committer?name}\</td>
                \<td>{$commit?message}\</td>
            \</tr>
    }
    \</table>
};

let $url := 
"https://api.github.com/repos/eXist-db/exist/commits?since=2015-01-01T00:00:00Z"
let $request := <http:request method="GET" href="{$url}" timeout="30"/>
let $response := http:send-request($request)
return
    if ($response[1]/@status = "200") then
        local:log(parse-json(util:binary-to-string($response[2])))
    else
        ()
'''
lguariento commented 2 years ago

I am doing minimal testing in eXide, and it looks like this does what we want it to do:

xquery version "3.1";

let $block := '```xquery
xquery version "3.1";

declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";

let $array := map { "k1": array { "v1", "v2" }, "k2": "v3" }
return
    serialize($array, 
        <output:serialization-parameters>
            \<output:method>json\</output:method>
        </output:serialization-parameters>)
        other
        lines
        here
```'

let $code-test:= 
 if (matches($block, "^`{3}([\S]+)?\n([\s\S]+)\n`{3}$", "ms")) then
        let $tokens := analyze-string($block, "^`{3}([\S]+)?\n([\s\S]+)\n`{3}$", "ms")
        let $lang := $tokens//fn:group[1]
        let $code := $tokens//fn:group[2]

return 
    (<pre data-language="{$lang}">{$code}</pre>)

    else ()

    return $code-test

So, the issue is not on the regex for code-block (although the above should be a better version), but rather the function markdown:inline-code, which catches the blocks before the function function markdown:code.

joewiz commented 2 years ago

@lguariento Thank you! To aid in my review, could you point me to an article where the change fixes the rendering of the markdown?

lguariento commented 2 years ago

@joewiz XQuery31 or travisci. See also here, the screenshot. In XQuery 31, for instance, some code blocks are properly rendered, others are not. See also my comments above.

joewiz commented 2 years ago

Ok, I can see that the XQuery31.md no longer throws an error for this article, which is great. But code blocks are wrapped as paragraphs, so this also introduces a regression. But it's not your fault! The code is exceedingly brittle, since we lack tests to check that any change doesn't introduce unwanted side effects. The closest thing to a test suite that we have is the page at http://localhost:8080/exist/apps/markdown/test.md. The code block issue is visible there:

Screen Shot 2022-01-31 at 10 41 27 AM

The approach I would take would recommend to get this library on solid ground is:

  1. Generate a new structure for this package using, say, generator-exist with the library template. The resulting structure makes running tests locally a breeze and adds CI-based testing (steps are included in the autogenerated README). This new structure will be the basis for future package development. Open a PR, and we will see CI runs and green check marks.
  2. Open the auto-generated XQSuite test module; take the test.md file and pull each entry into a new XQSuite test function, turn the Markdown into the test body; and use the test annotations to check that markdown:parse returns the expected HTML output. (This could also be a standalone XQSuite module; the test functions are the important thing though.) Mark any failing tests as %test:pending%.
  3. Tackle the pending tests, one by one.
  4. Run the updated markdown module against existing markdown in the AtomicWiki - either adding tests for failing segments or adjusting the source articles to conform to AtomicWiki's supported Markdown syntax.

If you're up for it, I'd recommend taking this approach and then returning to this PR at stage 4.

I'd welcome @duncdrum's suggestions on this approach.

An alternative would be to find resources to create an XQuery wrapper around one of the commonmark libraries. CommonMark already has 600-odd tests and a rigorous spec. I discuss this idea in #13. As my experiments last night showed, if we take that route, some of our "Markdown" articles (including XQuery31.md) will need to be adjusted, since they don't conform to CommonMark in a few respects. But adjusting some Markdown files is quite a different task than creating a full blown test suite around an implementation that predated the development of the CommonMark spec. The biggest challenge there is Java - finding the resources to create the wrapper, etc.

lguariento commented 2 years ago

Only some of the code blocks are wrapped in <p>. Some are correctly identified (e.g. the first ones, see screenshot), others (a lot of them) are identified as inline code. Don't you get the same result?

duncdrum commented 2 years ago

@joewiz i think generating a lib via yeoman using the existing test.md sounds like a really good way forward. In addition we could add linting to the atomic-wiki so that the markdown adheres to common mark, or whichever flavour, lessening the burden on the custom parser here.

lguariento commented 2 years ago

Great, let me know specific steps if you need anything from me since I'm a bit out of my depth regarding yeoman / tests etc, and I'll see what I can do.

joewiz commented 2 years ago

Hi @lguariento, I've opened a draft PR containing the start of the changes I described:

  1. I followed the directions in the README of generator-exist to install generator-exist locally and generate a blank library package for exist-markdown into a new directory, exist-markdown-temp. Then I created a new branch of my clone of exist-markdown and copied the generated assets from the temporary folder into the branch. Before committing the changes to my branch, I used my Git client's diff tool to confirm the changes, and I confirmed that npm test passed (as described in the autogenerated README).
  2. I began adapting the implicit tests in the original repository's test.md file into explicit XQSuite tests. The first three tests I created - for paragraphs and inline code - are already passing locally and in CI.

The remaining steps are to complete adapting the test.md file into XQSuite tests, and work through any failures that come up.