The .html files include anchors, which are used as URL destinations.
These will typically look like <a id="destination"/>.
When we convert HTML to DITA, we should ensure we only convert <a href="xx">link</a> elements to xrefs.
DITA doesn't have an anchor element. Instead, when we encounter an anchor we should remove it from the output and inject its id into the id attribute of the surrounding element.
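A minimal sketch of that rule, using only the Python standard library. It assumes the HTML has already been parsed into a well-formed XML tree, and the function name hoist_anchor_ids is hypothetical (it is not part of htmlToDITA). It also assumes that an id already present on the surrounding element should win, and that only empty anchors are removed.

```python
import xml.etree.ElementTree as ET


def hoist_anchor_ids(root):
    """Move ids from empty <a id="..."/> anchors onto their parent element,
    and rename <a href="..."> links to <xref>. Hypothetical helper."""
    for parent in list(root.iter()):
        previous = None  # last sibling kept before the current child
        for child in list(parent):
            is_bare_anchor = (
                child.tag == "a"
                and "id" in child.attrib
                and "href" not in child.attrib
                and len(child) == 0
                and not (child.text or "").strip()
            )
            if is_bare_anchor:
                # Assumption: an id already set on the parent is kept.
                parent.attrib.setdefault("id", child.attrib["id"])
                # Preserve the text that followed the anchor.
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
                continue
            if child.tag == "a" and "href" in child.attrib:
                # A real link: this is the only kind of <a> that becomes an xref.
                child.tag = "xref"
            previous = child
    return root


fragment = ET.fromstring(
    '<div><p>Intro <a id="destination"/>text <a href="xx">link</a> end</p></div>'
)
hoist_anchor_ids(fragment)
print(ET.tostring(fragment, encoding="unicode"))
```

This only covers the simple case where the parent is the right target. Anchors that are positioned in absolute coordinates need the nearest-content lookup discussed further down.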
This process should be included for all .html pages we translate.
Extend htmlToDITA to include this.
Ian to analyse content to sort out locations of anchors. I suspect there are some instances when it's right to select the parent, but others when we have to find the next PageLayer.
Note: I suspect some anchors are positioned in absolute coords to be adjacent to the content they refer to, which is also in absolute coords. The solution is probably either to refactor the sources to put the content near the anchor, or to do some analysis of the "top" coordinates for the content. I think they tend to go up at a rate of about a thousand per page.
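A rough sketch of what that "top" analysis could look like. It assumes the coordinate lives in an inline style attribute such as style="top:2345px", and it takes the "about a thousand per page" figure above as a default that has not been verified. The names parse_top and estimate_page are hypothetical.

```python
import re

# Matches a standalone "top" declaration, not "margin-top" or "padding-top".
TOP_RE = re.compile(r"(?:^|;)\s*top\s*:\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_top(style):
    """Return the numeric "top" value from an inline style string, or None."""
    match = TOP_RE.search(style or "")
    return float(match.group(1)) if match else None


def estimate_page(top, page_height=1000.0):
    """Guess a zero-based page index, assuming top grows by page_height per page."""
    return int(top // page_height)


top = parse_top("position:absolute; left:10px; top:2345px")
print(top, estimate_page(top))
```

If the real page height turns out to differ, only the page_height default needs changing.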
[ ] @IanMayo to mock up this pattern of usage, particularly absolute positioning of anchors, floating tables, and white-space generated by blockquotes and/or empty paras.
[ ] Consider algorithm for finding
Current strategy (Sept 2023)
find anchors in html
find top attribute of parent div
loop through divs on page
find div with top nearest to the anchor
find level of div hierarchy that got converted to DITA
find that div in DITA
insert placeholder element at start of that div, with id to match anchor
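The steps above could be sketched roughly as follows, again with the standard library only. Several things here are assumptions rather than the real htmlToDITA code: the "top" value is read from an inline style, the HTML div to DITA element mapping is a plain dict keyed by div id (dita_by_div_id), the placeholder is a DITA ph element, and all function names are hypothetical. Only anchors that are direct children of a div are considered.

```python
import re
import xml.etree.ElementTree as ET

TOP_RE = re.compile(r"(?:^|;)\s*top\s*:\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def top_of(div):
    """Return the div's "top" coordinate from its inline style, or None."""
    match = TOP_RE.search(div.get("style", ""))
    return float(match.group(1)) if match else None


def find_anchor_divs(page):
    """Yield (anchor_id, parent_div) for each <a id> without href inside a div."""
    for div in page.iter("div"):
        for child in div:
            if child.tag == "a" and "id" in child.attrib and "href" not in child.attrib:
                yield child.get("id"), div


def nearest_div(page, anchor_div):
    """Return the other div on the page whose top is nearest the anchor's div."""
    anchor_top = top_of(anchor_div)
    if anchor_top is None:
        return None
    candidates = [
        d for d in page.iter("div")
        if d is not anchor_div and top_of(d) is not None
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda d: abs(top_of(d) - anchor_top))


def insert_placeholder(dita_element, anchor_id):
    """Insert an empty <ph id="..."/> at the start of the DITA element."""
    placeholder = ET.Element("ph", {"id": anchor_id})
    # Keep the element's leading text after the placeholder.
    placeholder.tail = dita_element.text
    dita_element.text = None
    dita_element.insert(0, placeholder)
    return placeholder


def place_anchors(page, dita_by_div_id):
    """Run the whole strategy for one page; skip anchors with no usable match."""
    for anchor_id, anchor_div in find_anchor_divs(page):
        target = nearest_div(page, anchor_div)
        if target is not None and target.get("id") in dita_by_div_id:
            insert_placeholder(dita_by_div_id[target.get("id")], anchor_id)
```

The step "find level of div hierarchy that got converted to DITA" is not modelled here; the dict stands in for whatever lookup the converter really uses.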