danburzo / percollate

A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.
https://danburzo.ro/projects/percollate/
MIT License

Error when handling a Table node when converting to markdown #161

Closed dannyob closed 1 year ago

dannyob commented 1 year ago

Environment

Description

Hi! My percollate currently falls over on sites like Wikipedia when converting to Markdown:

: workboat  ~%; percollate md 'https://en.wikipedia.org/wiki/Danny_O%27Brien_(journalist)'                       
Fetching: https://en.wikipedia.org/wiki/Danny_O%27Brien_(journalist) ✓
Enhancing web page: https://en.wikipedia.org/wiki/Danny_O%27Brien_(journalist) ✓
file:///usr/local/lib/node_modules/percollate/node_modules/mdast-util-to-markdown/lib/index.js:113
  throw new Error('Cannot handle unknown node `' + node.type + '`')
        ^

Error: Cannot handle unknown node `table`
    at Object.unknown (file:///usr/local/lib/node_modules/percollate/node_modules/mdast-util-to-markdown/lib/index.js:113:9)
    at Object.one [as handle] (file:///usr/local/lib/node_modules/percollate/node_modules/zwitch/index.js:108:17)
    at containerFlow (file:///usr/local/lib/node_modules/percollate/node_modules/mdast-util-to-markdown/lib/util/container-flow.js:36:15)
    at Object.containerFlowBound [as containerFlow] (file:///usr/local/lib/node_modules/percollate/node_modules/mdast-util-to-markdown/lib/index.js:158:10)
    at Object.root (file:///usr/local/lib/node_modules/percollate/node_modules/mdast-util-to-markdown/lib/handle/root.js:22:13)
    at Object.one [as handle] (file:///usr/local/lib/node_modules/percollate/node_modules/zwitch/index.js:108:17)
    at toMarkdown (file:///usr/local/lib/node_modules/percollate/node_modules/mdast-util-to-markdown/lib/index.js:71:22)
    at bundleMd (file:///usr/local/lib/node_modules/percollate/index.js:647:13)
    at async generate (file:///usr/local/lib/node_modules/percollate/index.js:710:3)
    at async md (file:///usr/local/lib/node_modules/percollate/index.js:758:9)

Node.js v18.13.0
: workboat  ~%;

I think this is because the mdast Markdown conversion only supports tables with the help of another npm module; see https://github.com/syntax-tree/mdast-util-to-markdown/issues/1
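To illustrate the failure mode, here is a minimal sketch of a serializer that dispatches on node type. It is a simplified model, not the actual mdast-util-to-markdown source: `baseHandlers`, `tableExtension`, and `serialize` are hypothetical names made up for this example. It only shows why a `table` node throws when no table handler is registered, and why supplying an extension makes the same tree serialize.

```javascript
// Simplified model of a type-dispatched serializer. This is NOT the real
// mdast-util-to-markdown code; every name below is hypothetical.

// Handlers available by default: note there is nothing for `table`.
const baseHandlers = {
  root: (node, ctx) => node.children.map((child) => ctx.handle(child)).join('\n\n'),
  paragraph: (node, ctx) => node.children.map((child) => ctx.handle(child)).join(''),
  text: (node) => node.value,
};

// Hypothetical stand-in for a GFM extension that contributes table handlers.
const tableExtension = {
  handlers: {
    table: (node, ctx) => {
      const rows = node.children.map((row) => ctx.handle(row));
      const columns = node.children[0].children.length;
      const divider = '| ' + Array(columns).fill('---').join(' | ') + ' |';
      // First row is the header, followed by the divider, then the body rows.
      return [rows[0], divider, ...rows.slice(1)].join('\n');
    },
    tableRow: (node, ctx) =>
      '| ' + node.children.map((cell) => ctx.handle(cell)).join(' | ') + ' |',
    tableCell: (node, ctx) => node.children.map((child) => ctx.handle(child)).join(''),
  },
};

function serialize(tree, extensions = []) {
  // Merge extension handlers over the defaults.
  const handlers = Object.assign({}, baseHandlers, ...extensions.map((e) => e.handlers));
  const ctx = {
    handle(node) {
      const handler = handlers[node.type];
      if (!handler) {
        // Same message shape as the error in the report above.
        throw new Error('Cannot handle unknown node `' + node.type + '`');
      }
      return handler(node, ctx);
    },
  };
  return ctx.handle(tree);
}

const cell = (value) => ({ type: 'tableCell', children: [{ type: 'text', value }] });
const tree = {
  type: 'root',
  children: [
    {
      type: 'table',
      children: [
        { type: 'tableRow', children: [cell('Name'), cell('Role')] },
        { type: 'tableRow', children: [cell('Danny'), cell('Journalist')] },
      ],
    },
  ],
};

// Without the extension, the table node has no handler and serialization throws.
let failure = null;
try {
  serialize(tree);
} catch (error) {
  failure = error.message;
}
console.log(failure);

// With the extension registered, the same tree serializes to a pipe table.
console.log(serialize(tree, [tableExtension]));
```

As far as I understand it, the real library works along similar lines: `toMarkdown` accepts an `extensions` option, and the table handlers come from a separate package such as mdast-util-gfm. If the extension never reaches `toMarkdown`, any `table` node in the tree ends up in the unknown-node handler.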

danburzo commented 1 year ago

Good catch, @dannyob! We were already supposed to be using mdast-util-gfm, but there was a small typo that prevented it from being used.

danburzo commented 1 year ago

Fixed in percollate@4.0.3.

Looking at the provided test case, I can see potential problems with how the remark pipeline takes the HTML verbatim and stringifies it to Markdown in ways that might trip up some Markdown parsers, especially around nested inline elements such as <i><a></a></i>. That might warrant further attention.

But the issue at hand should be solved.