CenterForOpenScience / pydocx

An extendable docx file format parser and converter

Anchors aren't taken into account when resolving Hyperlinks #205

Closed by jhubert 8 years ago

jhubert commented 8 years ago

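Some representations of hyperlinks store the anchor separately from the link target: the target lives in the relationship, and the anchor lives in a w:anchor attribute on the w:hyperlink element. The href values in the resulting HTML omit the anchor part.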

For example (note the w:anchor attribute):

<w:hyperlink r:id="rId5" w:anchor="testing" w:history="1">
  <w:r w:rsidRPr="00A87DC3">
    <w:rPr>
      <w:rStyle w:val="Hyperlink"/>
    </w:rPr>
    <w:t>HERE</w:t>
  </w:r>
</w:hyperlink>

This references the rId5 relationship defined here:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="https://www.google.com" TargetMode="External"/>
</Relationships>

But the conversion results in the following HTML:

<a href="https://www.google.com">HERE</a>

instead of:

<a href="https://www.google.com#testing">HERE</a>

The anchor portion is missing entirely. A test document is attached.

hyperlink_anchor_issue.docx
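
For illustration, here is a minimal sketch of the resolution I would expect. It is not pydocx's actual code: `resolve_href` is a hypothetical helper, the inline XML strings are trimmed copies of the snippets above with namespace declarations added so they parse on their own, and appending `#` plus the anchor to the relationship target is my assumption about the desired behavior.

```python
import xml.etree.ElementTree as ET

# Standard OOXML namespaces for the w: and r: prefixes and for .rels files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Trimmed copies of the snippets from this issue (illustrative only).
HYPERLINK_XML = (
    f'<w:hyperlink xmlns:w="{W_NS}" xmlns:r="{R_NS}" '
    'r:id="rId5" w:anchor="testing" w:history="1">'
    "<w:r><w:t>HERE</w:t></w:r>"
    "</w:hyperlink>"
)
RELS_XML = (
    f'<Relationships xmlns="{RELS_NS}">'
    f'<Relationship Id="rId5" Type="{R_NS}/hyperlink" '
    'Target="https://www.google.com" TargetMode="External"/>'
    "</Relationships>"
)


def resolve_href(hyperlink_xml, rels_xml):
    """Hypothetical helper: join the relationship target with w:anchor."""
    link = ET.fromstring(hyperlink_xml)
    rel_id = link.get(f"{{{R_NS}}}id")
    anchor = link.get(f"{{{W_NS}}}anchor")

    # Map relationship ids to their targets.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{RELS_NS}}}Relationship")
    }
    target = targets.get(rel_id, "")

    # Assumed behavior: append the anchor as a URL fragment when present.
    return f"{target}#{anchor}" if anchor else target


print(resolve_href(HYPERLINK_XML, RELS_XML))
```

A hyperlink with only w:anchor and no r:id would resolve to a bare fragment under this sketch, which matches how an internal bookmark link would normally be written in HTML.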