CenterForOpenScience / pydocx

An extendable docx file format parser and converter

Internal links are simply dropped #221

Closed sunu closed 7 years ago

sunu commented 7 years ago

Hi, first of all thanks a lot for making this library. Really appreciate the effort you folks have put in.

I'm trying to convert a docx file that contains internal links, and I've noticed that all of those links are dropped from the HTML output. I have attached the simplest test case that reproduces this: test.docx

I did a bit of investigating on my own. At https://github.com/CenterForOpenScience/pydocx/blob/3d01b3c7210f78a7c69409feac8c2cc8663d7de1/pydocx/export/html.py#L510 the value of tag is always None for internal links, because hyperlink.target_uri seems to always be None for them. The only value that identifies an internal link seems to be stored in hyperlink.anchor, but I'm not sure how to identify the target from that value.

I would love to see a fix for this. If you folks can give me some pointers on how to fix it, I'm happy to write up a PR.

Thanks.

winhamwr commented 7 years ago

Fixed with #222
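
For anyone landing here later: in WordprocessingML, the w:anchor attribute on a w:hyperlink is meant to match the w:name of a w:bookmarkStart element elsewhere in the document, so an internal link can in principle be rendered as a fragment link to that bookmark. The snippet below is only a rough sketch of that lookup using the standard library. It is not the code from #222 and does not use pydocx's API; the sample XML and the resolve_internal_links helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's "{uri}name" form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical, hand-written stand-in for word/document.xml.
SAMPLE_DOCUMENT_XML = """\
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:hyperlink w:anchor="section_one">
        <w:r><w:t>Jump to section one</w:t></w:r>
      </w:hyperlink>
    </w:p>
    <w:p>
      <w:hyperlink w:anchor="missing_target">
        <w:r><w:t>Dangling link</w:t></w:r>
      </w:hyperlink>
    </w:p>
    <w:p>
      <w:bookmarkStart w:id="0" w:name="section_one"/>
      <w:r><w:t>Section one</w:t></w:r>
      <w:bookmarkEnd w:id="0"/>
    </w:p>
  </w:body>
</w:document>
"""


def resolve_internal_links(document_xml):
    """Return (link_text, href) pairs for hyperlinks that carry w:anchor.

    href is "#<bookmark name>" when a matching bookmark exists, else None.
    """
    root = ET.fromstring(document_xml)

    # Collect every bookmark name defined anywhere in the document.
    bookmark_names = {
        el.get(W + "name") for el in root.iter(W + "bookmarkStart")
    }

    links = []
    for link in root.iter(W + "hyperlink"):
        anchor = link.get(W + "anchor")
        if anchor is None:
            # External links point at a relationship id instead; skipped here.
            continue
        text = "".join(t.text or "" for t in link.iter(W + "t"))
        href = "#" + anchor if anchor in bookmark_names else None
        links.append((text, href))
    return links


if __name__ == "__main__":
    for text, href in resolve_internal_links(SAMPLE_DOCUMENT_XML):
        print(text, "->", href)
```

For the HTML side, the bookmark itself would also need to be emitted as an element with a matching id (or name) so that the fragment link has something to land on.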