For every top-level HTML node in the result, is there any way to get a reference to the corresponding section in the Markdown source?

Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.

https://python-markdown.github.io/

BSD 3-Clause "New" or "Revised" License

For every top-level HTML node in the result, is there any way to get a reference to the corresponding section in the Markdown source? #677

Closed: sras closed this issue 6 years ago

sras commented 6 years ago

While doing Markdown-to-HTML conversion, is there any way to attach a reference to the corresponding section of the original Markdown source to every node of the generated HTML?

waylan commented 6 years ago

No, the parser does not keep track of this under the hood. Once a section of text is parsed, the original source is discarded. Supporting such a feature would require a complete rewrite of the internals.

As a reminder, Python-Markdown is a relatively old parser. It was developed back when system resources were much more scarce than they are today. Not storing the entire Markdown source was considered a "feature" as it ensured that the parser used less memory. Of course, today that is no longer as much of an issue, but a complete rewrite would be a lot of work for very little gain. I'm not certain, but some of the newer parsers out there may support such a feature.

mitya57 commented 6 years ago

I would welcome this feature as well. In ReText, there is synchronized scrolling for source and HTML pages, and we currently implement that using a hack. Having that implemented properly in Python-Markdown would be awesome.

From my previous experiments, it would be quite easy for the parser to add line information to the tree, but the preprocessors (especially third-party ones) are the tricky part.

sras commented 6 years ago

@waylan

In my case, I don't need the mapping to be as granular as possible. If there is a ul in the generated HTML, I only want the ul to be mapped to the entire corresponding block in the Markdown source. I don't need to map each li to its source.

So just mapping top-level elements to sections in the Markdown source would work.

I tried splitting the source Markdown on "\n\n" and feeding each piece to markdown separately, thus obtaining a mapping. But I ran into cases such as double newlines inside embedded HTML, so I am looking for a more reliable method...
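
For reference, here is a rough sketch of the blank-line splitting approach described above and of the embedded-HTML case that breaks it. It uses only the standard library. The names `Chunk` and `split_on_blank_lines` are hypothetical helpers made up for this illustration and are not part of Python-Markdown. Each chunk's text could then be passed to `markdown.markdown()` on its own to get the HTML for that line range.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    """A blank-line-separated piece of source (hypothetical helper type)."""
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def split_on_blank_lines(source: str) -> List[Chunk]:
    """Split source text on blank lines, remembering each chunk's line range."""
    chunks: List[Chunk] = []
    current: List[str] = []
    start = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.strip():
            if not current:
                start = lineno  # first line of a new chunk
            current.append(line)
        elif current:
            # A blank line closes the chunk that ended on the previous line.
            chunks.append(Chunk(start, lineno - 1, "\n".join(current)))
            current = []
    if current:
        # Flush a final chunk that is not followed by a blank line.
        chunks.append(Chunk(start, start + len(current) - 1, "\n".join(current)))
    return chunks


SOURCE = """\
# Title

* one
* two

<div>

inner text

</div>
"""

for chunk in split_on_blank_lines(SOURCE):
    print(chunk.start_line, chunk.end_line, repr(chunk.text))
```

The heading and the list each come out as one chunk with the right line range, but the `<div>` block is cut into three chunks (`<div>`, `inner text`, `</div>`), so converting the chunks independently no longer matches what converting the whole document would produce. Anything else that legitimately contains blank lines, such as a list with blank lines between its items, would presumably be split the same way.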