arobase-che / md-attr-parser

A parser of markdown attributes
1 stars 2 forks source link

refactor ~ use an explicit grammar and PEGjs to generate the internal parser #7

Closed rivy closed 4 years ago

rivy commented 4 years ago

This set of commits does a small bit of housekeeping (de-linting and spelling) and then refactors the module to use an explicit grammar for parsing the attribute string. PEGjs is used to generate the internal parser code from the given grammar.

I've designed the grammar (and associated code) to generate the same output structure as the previous code.

The module code itself is greatly simplified. And all 80+ tests are passing except one set using t.snapshot(). I'm not sure how to make that test set pass. I don't understand how it passed initially ... so I've marked them "FixME" and commented them out in d78429e.

Overall, this change set should make the module easier to maintain and enhance.

And, along that line, (and actually the impetus for the refactor) I extended the grammar to add support for "hidden" attributes (within HTML comments, eg <!-- {.class #id key=value} -->); see 8543457. By using the "hidden" attributes, markdown parsers which don't support the basic attribute syntax will just ignore them instead of rendering them as random internal text. It allows for graceful failure and reasonable basic renders.

If you accept this PR, I'll submit the next one containing the "hidden" attribute support (it also contains CodeCov, AppVeyor CI, and Travis CI configurations for automatic testing).

Let me know if you'd like any changes.

arobase-che commented 4 years ago

Hi and thank you for the work !

Definitively, it improves the parsing speed. But it also adds a dependency.

And not a minor one. pegjs. It doesn't have a stable version. The version used here is 6 years old. And finally it seriously increases the size of the code.

About the support of hidden attributes, why should it be managed here ? Why not in an other plugin ? Since I don't understand why, it can consider a merge. md-attr-parser isn't intend to be linked to HTML, so HTML comment seems out of scope.

I'm sure, the code can't be improved, I'm not sure, it's the right way.

Thank you about fixing the spelling. You can create an other PR, it will be merge immediately. :+1:

rivy commented 4 years ago

Thanks for considering this ...

I'll PR the single lint/spelling commit tomorrow.

I'd like to address your other points...

I'm trying to build to a remark plugin that assigns attributes to various pieces of a markdown document, but allows display of an accurate document when rendered by software which doesn't understand the non-standard attribute add-on. Some current practice is to hide the attribute syntax within a comment (see markdown-it-decorate).

Notably, "HTML" comments are actually valid markdown and used as markdown comments, eg, <!-- this is a markdown comment -->. And you're already using a non-standard markdown syntax to describe the attributes, { ... }. Using <!--{ ... }--> as an alternate/variant syntax gives the added benefit of hiding the attributes from more basic rendering/renderers. IMO, it's a minor, very useful upgrade.

I'm trying to change this here because there is already a remark plugin, remark-attr, which works and uses your module here to parse the attributes. So, if your module is upgraded to parse an expanded attribute syntax then it should only require very minor changes to upgrade that plugin. I do think the bulk of the change belongs here because it is a parallel parsing problem.

As far as the dependency on PEGjs... it's only a developer dependency, needed only when generating a new src/parser.js code file when the grammar changes. The code does increase in size from 8k to 22k, I agree, but I thinks that's relatively small compared to the code it's used in and believe there's strong benefit behind that change. And that could be mitigated with minification (final size would be about 15k). I did use the older version on purpose so that it could be used with NodeJS v8 as a dev tool. It really could be completely removed as a dependency by generating the src/parser.js directly, outside of the repo, when needed.

I understand if, ultimately, you don't want to consider the change. I can just propose my fork with the upgrade to the author of remark-attr as an alternative parser, but, I wanted to offer the changes here first.

Again, thanks for the quick response and consideration.

arobase-che commented 4 years ago

I'll PR the single lint/spelling commit tomorrow.

Thank a lot.

I'd like to address your other points...

I'm trying to build to a remark plugin that assigns attributes to various pieces of a markdown document, but allows display of an accurate document when rendered by software which doesn't understand the non-standard attribute add-on. Some current practice is to hide the attribute syntax within a comment (see markdown-it-decorate).

I think you chose the worst example ever. markdown-it-decorate is a fork of markdown-it-attrs where the comment syntax is handled.

Notably, "HTML" comments are actually valid markdown and used as markdown comments, eg, <!-- this is a markdown comment -->. And you're already using a non-standard markdown syntax to describe the attributes, { ... }. Using <!--{ ... }--> as an alternate/variant syntax gives the added benefit of hiding the attributes from more basic rendering/renderers. IMO, it's a minor, very useful upgrade.

About the { ... } syntax, in fact, it's somehow commune. PHP Markdown, Pandoc even Kramdown. About comments. HTML isn't supported by default by remark.

IMO, it's a not the goal of that plugin and it should be treated as an option inside remark-attr.

As far as the dependency on PEGjs... it's only a developer dependency, needed only when generating a new src/parser.js code file when the grammar changes.

It's worst than a run-time dependency. If it breaks, you have to redo absolutely everything.

I understand if, ultimately, you don't want to consider the change. I can just propose my fork with the upgrade to the author of remark-attr as an alternative parser, but, I wanted to offer the changes here first.

I'm not ok about using PEGjs here, but feel free to make a fork if you want. :)
About the HTML comment trick, I understand the use case. I think it should be implemented inside remark-attr as an option.

Again, thanks for the quick response and consideration.

Sorry this time I was busy. You are welcome.

rivy commented 4 years ago

Alright, thanks again for the look.

Lint PR has been submitted (PR#8).