Open jetlime opened 8 months ago
Thank you for the clear issue report. Very helpful.
It appears that there is no agreed way to represent captions for markdown tables. Given that we use markdown-it
for parsing markdown, this would appear to be the best option:
Bug Report 🐛
Whenever an HTML table is defined with a caption, the transformation to Markdown yields an invalid Markdown table.
Expected Behavior
The following HTML table,
Shall be converted to the following valid Markdown,
Which parses into a valid Markdown table:
Average monthly active recipients of the service, in the EU region over prior 6 months (est.)

| | Aug. 2022 - Jan. 2023 | Feb. 2023 - July 2023 |
| --- | --- | --- |
| Wikibooks | 6,919,000 | 1,611,000 |
| Wikidata | 1,056,000 | 1,051,000 |
| Wikimedia Commons | 2,845,000 | 3,272,000 |
| Wikinews | 6,283,000 | 1,035,000 |
| Wikipedia | 151,556,000 | 151,088,000 |
| Wikiquote | 6,811,000 | 1,548,000 |
| Wikisource | 7,106,000 | 1,845,000 |
| Wikispecies | 29,000 | 37,000 |
| Wikiversity | 6,360,000 | 1,082,000 |
| Wikivoyage | 616,000 | 632,000 |
| Wiktionary | 8,955,000 | 8,425,000 |
| | 2.4[1] | 2.4[1] |
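
To make the expected behaviour concrete, here is a minimal Python sketch of one possible handling: the caption is emitted as a plain paragraph above the table, separated by a blank line so it cannot be read as a table row. This is an assumption about what "valid" output should look like, not the markdown-transform implementation. `table_to_markdown` and `_TableCollector` are hypothetical names, and the sketch assumes a single simple table with no `colspan`, `rowspan`, or nesting.

```python
from html.parser import HTMLParser


class _TableCollector(HTMLParser):
    """Collects the caption and cell text of a single simple <table>."""

    def __init__(self):
        super().__init__()
        self.caption = ""
        self.rows = []        # list of rows, each a list of cell strings
        self._target = None   # "caption" or "cell" while inside one
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "caption":
            self._target, self._buffer = "caption", []
        elif tag in ("th", "td"):
            if not self.rows:          # tolerate a cell without a <tr>
                self.rows.append([])
            self._target, self._buffer = "cell", []

    def handle_data(self, data):
        if self._target:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Collapse runs of whitespace inside the collected text.
        text = " ".join("".join(self._buffer).split())
        if tag == "caption" and self._target == "caption":
            self.caption = text
            self._target = None
        elif tag in ("th", "td") and self._target == "cell":
            # Escape pipes so cell text cannot break the row structure.
            self.rows[-1].append(text.replace("|", "\\|"))
            self._target = None


def table_to_markdown(html: str) -> str:
    """Hypothetical helper: caption as a paragraph, then a pipe table."""
    parser = _TableCollector()
    parser.feed(html)
    parser.close()
    rows = [r for r in parser.rows if r]
    if not rows:
        return parser.caption
    width = max(len(r) for r in rows)
    rows = [r + [""] * (width - len(r)) for r in rows]  # pad short rows
    header, body = rows[0], rows[1:]
    lines = []
    if parser.caption:
        # The blank line keeps the caption out of the table itself.
        lines += [parser.caption, ""]
    lines.append("| " + " | ".join(header) + " |")
    lines.append("| " + " | ".join(["---"] * width) + " |")
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "<table><caption>Example caption</caption>"
        "<tr><th></th><th>Period A</th></tr>"
        "<tr><td>Wikipedia</td><td>151,556,000</td></tr></table>"
    )
    print(table_to_markdown(sample))
```

The first row is always treated as the header, because a pipe table needs a header row followed by a delimiter row before a Markdown parser will recognise it as a table.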
Current Behavior
Given the previous HTML table, including a caption, the tool transforms the HTML into the following Markdown content,
Which is an invalid md table:
Steps to Reproduce
npm install -g @accordproject/markdown-cli
wget https://foundation.wikimedia.org/wiki/Legal:EU_DSA_Userbase_Statistics --output-document test.html
markus transform --from html --to markdown --input test.html --output test.md
Inspect test.md using a Markdown parser to visualise the invalid table parsing.
Context (Environment)
Parsing HTML to Markdown for web archiving.
Desktop