ariabuckles / simple-markdown

JavaScript markdown parsing, made simple
MIT License
516 stars 101 forks source link

Parser aggressively breaks paragraphs up by punctuation. #59

Open yoshikischmitz opened 5 years ago

yoshikischmitz commented 5 years ago

If I parse a sentence like this:

md.defaultBlockParse('Lorem Ipsum Dolor. Sit, Amet')

the output I would expect is:

[
    {
        "content": [
            {
                "content": "Lorem Ipsum Dolor. Sit, Amet",
                "type": "text"
            },

        ],
        "type": "paragraph"
    }
]

Instead what I get is:

[
    {
        "content": [
            {
                "content": "Lorem Ipsum Dolor",
                "type": "text"
            },
            {
                "content": ". Sit",
                "type": "text"
            },
            {
                "content": ", Amet",
                "type": "text"
            }
        ],
        "type": "paragraph"
    }
]

For longer documents this creates quite a lot of redundant nodes which leads to bloated output and potential performance issues. I think it'd be great if either the library didn't break text up like this in the first place or had a streamlining step at the end to merge them into a single node.

ariabuckles commented 5 years ago

Hi @yoshikischmitz !

What you're reporting definitely happens, and I'd like to see if we can address it.

Could you explain a little more about your use case? Are you parsing markdown and then saving the parse tree / AST to a database, and concerned about that json being overly-verbose? Or are you parsing markdown and then immediately outputting it to react/html/something else? Also, what version of simple-markdown are you using?

  1. If you're outputting to react-dom or html, the output() step should be doing the type of text-node-joining that you suggest:

    or had a streamlining step at the end to merge them into a single node.

  2. If you're using a different outputter (such as for react-native?), then as-of simple-markdown@0.4.0 there's a slightly improved outputter API that allows you to customize how adjacent text nodes are treated, and should do the right thing by default most of the time. The code to use that API looks like:

    var output = SimpleMarkdown.outputFor(rules, 'react');

    If this is still giving you multiple text nodes next to each other in the output, you can customize the Array output rule. See simple-markdown.js:808 for the default react rule for combining array elements.

  3. If you're saving the output of parse() to a database or file or something and specifically need text nodes joined in the parse tree, you can do that by writing an outputter that only handles arrays specially. Unfortunately, the syntax to do so right now is a bit tricky (and long term can maybe be simplified).

If you could elaborate on your use case, and whether it matches one of the scenarios above, and/or if you have some sample code (doesn't need to be runnable, but just so I can see a bit more of what you're doing), then I'd love to help you get one of the above solutions working. (Or if those aren't relevant, see what I can do to help in a different way).

yoshikischmitz commented 5 years ago

@ariabuckles thanks a ton for the super detailed response! You guessed correctly, I am saving the output from the parser to the file system, serving it over the network to a client, which is then rendering it(I think this is a pretty cool use case of this library for mobile apps because it allows for rich text views that can be updated server side without relying on web views).

ariabuckles commented 5 years ago

Hi again @yoshikischmitz !

Thanks, that makes sense.

I'd recommend writing a custom outputter that does this AST transformation to join the nodes. This is somewhat clunky right now, but quite functional.

I've taken a shot at that here: https://gist.github.com/ariabuckles/039442f9e50525fada464ce34953725d (join-smd.js exports a function which takes an ast and outputs an ast with text nodes combined, which is demonstrated in test.js and test-readme.js). I did some basic testing, but it's possible I've missed testing some edge cases or wrote the table rules poorly; if anything comes up I'd be happy to fix it.

I'd love to make this API a bit easier to use in the future, but I've still got some edge cases and figuring things out to even see how a cleaner way might work.

Would something like that work for you?

ariabuckles commented 4 years ago

Closing this issue for inactivity, but please feel free to re-open or comment if the above solution doesn't work for you (or anyone reading this!)

niksy commented 4 years ago

Is there any possiblity of having this kind of output as option inside this module? I’m working with Vue component which could benefit from generic joined output. Currently, React output is fine for everything, but it’s tightly coupled to implementation.

ariabuckles commented 4 years ago

@niksy I think so! I haven't written good docs for this yet (sorry >_<) but something similar is possible by providing an Array: rule, it just happens during output time rather than parse time.

Conceptually, the Array: rule specifies how to output an array found in the AST, like most rules specify how to output a node of a specific type found in the AST.

The default Array rule for react loops through the array elements, and when it finds text concatenates all the text nodes into one text node.

If you just need to customize how this joining works for Vue, you can provide your own Array: rule customized from react's (or technically provide any function that takes an array of parsed nodes and returns an array of nodes to be output).

If you need to do that before parse time, unfortunately something like the smd-join.js gist from above is the best way to do that for now.

If you have some example code for how you're using this with vue, I'd be happy to try to figure out how best to accomplish what you're going for!

P.S. I just started using vue at work; I'm excited to hear you're using simple-markdown with vue :). If there are some things I can do to make working with this in vue easier, I'm open to it!

niksy commented 4 years ago

Just wanted to chime in regarding Vue/React dilemma—we’re using React implementation with Vue components and it works great! It’s still coupled to React-specific implementation, but we’ve managed to create Vue components easily.