This one is strange. Most bookmarks are to headings, but sometimes they are to random areas in the middle of a paragraph.
When the document is exported as HTML from Word, the bookmark link target is an <a> tag with an id that matches the href (a normal anchor link).
However, once it goes through mammoth, it looks like the text gets moved out of the <a> tag for some reason, and then the (now-empty) link target gets stripped out once the markdown conversion happens.
idk what is going on here, to be honest.
As an example, the Word HTML export looks like this:
<p>This is a paragraph link with <a id="my_id">a bookmark target</a>.</p>
But in the mammoth export, it looks like this:
<p>This is a paragraph link with <a id="my_id"></a> a bookmark target.</p>
My fix here is to:
find these empty id links
rename the id so that I can track that this happened
add the id to the parent
The outcome of the earlier example would look like this:
<p id="nofo_bookmark_my_id">This is a paragraph link with a bookmark target.</p>
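The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `move_empty_bookmark_ids` is hypothetical, it uses Python's standard-library `xml.etree.ElementTree` rather than whatever HTML parser the project actually uses, and it assumes the input is a well-formed fragment with a single root element. Updating the matching hrefs is also an assumption on my part, based on the note about hrefs in this description.

```python
import xml.etree.ElementTree as ET

# Prefix taken from the example output above.
BOOKMARK_PREFIX = "nofo_bookmark_"


def move_empty_bookmark_ids(html: str) -> str:
    """Hypothetical helper: move ids from empty <a> targets onto their parent."""
    # Assumption: the input is a well-formed fragment with a single root element.
    root = ET.fromstring(html)
    renamed = {}  # old id -> new id, so matching hrefs can be updated

    for parent in list(root.iter()):
        previous = None  # last sibling kept before the current child
        for child in list(parent):
            is_empty_target = (
                child.tag == "a"
                and child.get("id")
                and "href" not in child.attrib
                and len(child) == 0
                and not (child.text or "").strip()
            )
            if is_empty_target and "id" not in parent.attrib:
                new_id = BOOKMARK_PREFIX + child.get("id")
                renamed[child.get("id")] = new_id
                parent.set("id", new_id)
                # Keep the text that followed the empty anchor.
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child

    # Point links at the renamed ids (only for targets that were actually moved).
    for link in root.iter("a"):
        href = link.get("href", "")
        if href.startswith("#") and href[1:] in renamed:
            link.set("href", "#" + renamed[href[1:]])

    return ET.tostring(root, encoding="unicode", method="html")
```

In this sketch a parent that already has an id is left alone, so the empty anchor stays in place rather than overwriting an existing id. Whitespace around the removed anchor is kept as-is.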
Hopefully this resolves this issue, but we will have to see.
Note that the logic is written so that the HTML can be modified without all of the conditions being met (for example, we can modify the hrefs without knowing whether the matching target will be modified too), but I think we will catch those cases if they come up.