JuliaAttic / Markdown.jl

Markdown parsing for Julia

CommonMark? #7

Open stevengj opened 10 years ago

stevengj commented 10 years ago

See this article. It would be interesting to see how the Markdown.jl parser etc. compares (in both performance and behavior) to the C99 CommonMark reference implementation.

hayd commented 9 years ago

https://github.com/jgm/CommonMark#running-tests-against-the-spec

To run the tests using an executable $PROG:

python3 test/spec_tests.py --program $PROG

hayd commented 9 years ago

I skimmed through the first 25% or so of failing tests with:

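# Note: the spec test script requires python3
using JSON, Markdown

# `readall` in older Julia; `read(cmd, String)` is the Julia 1.x replacement
tests = JSON.parse(read(`python3 test/spec_tests.py --dump-tests`, String))

# true if the example renders to exactly the HTML the spec expects
function correct(test)
    try
        return Markdown.html(Markdown.parse(test["markdown"])) == test["html"]
    catch
        println("error in $(test["example"]): $(test["section"])")
        return false
    end
end

failing = filter(x -> !correct(x), tests)

# and to quickly look at results from a failing test:
# prints the input, the expected HTML, and the HTML actually produced
function check(n)
    test = failing[n]
    println(repr(test["markdown"]))
    println(repr(test["html"]))
    println(repr(Markdown.html(Markdown.parse(test["markdown"]))))
end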

So far I've found the following: cc @one-more-minute

MikeInnes commented 9 years ago

Thanks for taking a look at that, that's a good list to have. What's quite nice is that (perhaps surprisingly) there are very few particularly major things missing.

The main exception is named links – I do have a way to implement them, but just haven't gotten round to it yet. I need to do a tiny bit of refactoring as well, I think.

It would be cool to have some kind of benchmark for performance as well, but I'm much less worried about Markdown.jl being crazy fast as long as it's not too slow to get the job done.
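A rough timing check along these lines might be enough to tell "too slow" from "fine". This is only a sketch: the `sample` document and the `render` helper are made up for illustration, and it assumes the `Markdown.parse` / `Markdown.html` calls used elsewhere in this thread.

```julia
using Markdown

# Hypothetical sample input; a real benchmark would use a larger, more varied corpus
sample = repeat("# Heading\n\nSome *emphasis* and `code`.\n\n- item one\n- item two\n\n", 1000)

# Parse and render in one step (illustrative helper, not part of Markdown.jl)
render(text) = Markdown.html(Markdown.parse(text))

# Run once first so compilation time is not included in the measurement
render(sample)

# Best of five runs, in seconds
elapsed = minimum(@elapsed(render(sample)) for _ in 1:5)
println("best of 5: ", round(elapsed * 1000, digits = 2), " ms")
```

Running the same input through the reference implementation's executable would give a number to compare against.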

hayd commented 9 years ago

@one-more-minute a word of warning: I only got 25% through the list, so this could double or triple! I'll append anything else major that pops out. I agree it shouldn't be too bad, and it's great to have a thorough test/perf suite. There were a couple of things that raise errors; IIRC they were from named links.

Tab expansion may also be tricky; I'm not sure how to do that. The example I gave above didn't render as I expected (FIXED)! You seemingly need to count chars as you render (or as you parse?). The game is "render tabs as spaces as if tab stops were length 4".
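A minimal sketch of that rule, counting columns while copying each line. The names `expand_tabs` and `expand_tabs_text` are hypothetical, not part of Markdown.jl, and it assumes one column per character (no wide or combining characters):

```julia
# Expand tabs in a single line to the next multiple of `tabstop` columns.
# Assumes every character occupies exactly one column.
function expand_tabs(line::AbstractString; tabstop::Int = 4)
    out = IOBuffer()
    col = 0
    for c in line
        if c == '\t'
            n = tabstop - col % tabstop   # spaces needed to reach the next tab stop
            print(out, " "^n)
            col += n
        else
            print(out, c)
            col += 1
        end
    end
    return String(take!(out))
end

# Apply the expansion line by line, restarting the column count on each line
expand_tabs_text(text::AbstractString) =
    join((expand_tabs(l) for l in split(text, '\n')), '\n')
```

For example, `expand_tabs("a\tb")` pads with three spaces, so `b` lands in column 4.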

I don't yet get the subtleties of escaping html characters but allowing some html...

I have fixed a couple of minor things in html rendering, will PR when I go through the entire list.

hayd commented 9 years ago

Ah, maybe the tab expansion should happen prior to main parsing, then it's a bit easier...

I have checked off some of these (the easy ones), which are fixed in a local Julia branch.

hayd commented 9 years ago

I was sure there was an issue about CRLF but I can't find it. Edit: here. I was wondering if the text should go through:

replace(..., r"\r(\n)?", '\n')

not sure how this would be done with the stream model.
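One simple option, sketched below, is to normalize the whole input before the parser sees it and hand the parser a fresh in-memory stream. The helper names are hypothetical, the `Pair` form of `replace` is the Julia 1.x spelling of the call above, and this gives up true streaming because it reads the entire input first:

```julia
# Convert CRLF and lone CR line endings to LF
normalize_newlines(text::AbstractString) = replace(text, r"\r\n?" => "\n")

# Wrap an input stream so downstream code only ever sees LF line endings.
# Reads the whole stream into memory first (assumes the document fits in memory).
normalized_stream(io::IO) = IOBuffer(normalize_newlines(read(io, String)))
```

The parser could then be handed `normalized_stream(io)` in place of `io`, leaving the rest of the stream-based code untouched.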