HHS / simpler-grants-pdf-builder

PDF builder that's used by the SimplerNOFOs project, part of the Simpler Grants initiative at HHS.

Save some bookmarks 🙏 #41

Closed pcraig3 closed 14 hours ago

pcraig3 commented 14 hours ago

Summary

This one is strange. Most bookmarks point to headings, but some point to arbitrary spots in the middle of a paragraph.

When the document is exported as HTML from Word, the bookmark link target is an <a> tag with an id that matches the href (a normal anchor link).

However, once the document goes through mammoth, the text appears to be moved out of the <a> tag for some reason, and the (now-empty) link target then gets stripped out when the markdown conversion happens.

idk what is going on here, to be honest.

As an example, the Word HTML export will look like this.

<p>This is a paragraph link with <a id="my_id">a bookmark target</a>.</p>

But in the mammoth export, it looks like this:

<p>This is a paragraph link with <a id="my_id"></a> a bookmark target.</p>

My fix here is to:

The outcome of the earlier example would look like this:

<p id="nofo_bookmark_my_id">This is a paragraph link with a bookmark target.</p>
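For illustration only, the transformation from the mammoth output to the result above could be sketched roughly as follows. This is not the code in this PR: the function name `hoist_empty_bookmarks` is hypothetical, it uses the standard library's `xml.etree.ElementTree` (so it assumes well-formed markup), and it only rewrites hrefs whose target was actually moved, which is a simplification and may differ from the real logic.

```python
import xml.etree.ElementTree as ET

BOOKMARK_PREFIX = "nofo_bookmark_"  # prefix seen in the example output above


def _join(before: str, after: str) -> str:
    # Avoid a double space where the removed anchor used to sit.
    if before[-1:].isspace() and after[:1].isspace():
        after = after.lstrip()
    return before + after


def hoist_empty_bookmarks(fragment: str) -> str:
    """Hypothetical helper: move ids of empty <a id="..."> targets onto their parent."""
    # Temporary wrapper so a fragment with several top-level elements parses.
    root = ET.fromstring(f"<root>{fragment}</root>")
    moved = set()

    for parent in list(root.iter()):
        if parent is root:
            continue
        previous = None
        for child in list(parent):
            is_empty_target = (
                child.tag == "a"
                and child.get("id")
                and not child.get("href")
                and not (child.text or "").strip()
                and len(child) == 0
            )
            # Skip real links, non-empty anchors, and parents that already have an id.
            if not is_empty_target or parent.get("id"):
                previous = child
                continue
            # Keep the text that followed the anchor by attaching it to what precedes it.
            tail = child.tail or ""
            if previous is None:
                parent.text = _join(parent.text or "", tail)
            else:
                previous.tail = _join(previous.tail or "", tail)
            parent.set("id", BOOKMARK_PREFIX + child.get("id"))
            moved.add(child.get("id"))
            parent.remove(child)

    # Point in-page links at the new, prefixed ids (only for targets moved above).
    for link in root.iter("a"):
        href = link.get("href", "")
        if href.startswith("#") and href[1:] in moved:
            link.set("href", "#" + BOOKMARK_PREFIX + href[1:])

    # Serialize the children of the temporary wrapper back to an HTML string.
    return (root.text or "") + "".join(
        ET.tostring(el, encoding="unicode", method="html") for el in root
    )


if __name__ == "__main__":
    mammoth_html = (
        '<p>This is a paragraph link with <a id="my_id"></a> a bookmark target.</p>'
    )
    print(hoist_empty_bookmarks(mammoth_html))
```

Anchors that still contain text, and paragraphs that already carry an id, are left untouched in this sketch.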

Hopefully this resolves this issue, but we will have to see.

Note that the logic is written so that modifications to the HTML can happen without all the conditions being met (for example, we can modify the hrefs without knowing whether the target will be modified), but I think we will catch those cases if they come up.