AlJohri / docx2pdf

MIT License

docx2pdf doesn't preserve hyperlinks in table-of-contents #30

Closed haakonstorm closed 2 years ago

AlJohri commented 3 years ago

@haakonstorm a few questions:

1) If possible, please upload a sample document to this issue for further debugging.

2) Which operating system are you using?

3) Does this issue also occur if you manually convert the file to PDF using the GUI in Word? If so, have you found any alternative that preserves the hyperlinks?

haakonstorm commented 3 years ago
Screenshot of Microsoft Word (05-04-2021, 21-16-31) Screenshot of Microsoft Word (05-04-2021, 21-16-47)

Index-test-macos-word-bigsur-m1.docx

haakonstorm commented 3 years ago

Index-test-macos-word-bigsur-m1.docx

haakonstorm commented 3 years ago
Screenshot of iTerm2 (05-04-2021, 21-19-49)
haakonstorm commented 3 years ago

This was for a customer (a design company) who couldn't get this to work, so I tried to find workarounds, as they stated they couldn't get Word itself to do this. Ideally I'd be able to shell-script a simple automation for them, with a drop folder or some other simple pipeline.

Word on the Mac isn't as scriptable as it is on Windows, AFAIK. The customer seemed happy to have a solution they in turn could teach their customer to use. This was a pre-designed Word template of sorts that needed hyperlinks for the ToC and index. But I'm more than happy to help make docx2pdf better in any way I can. Open source is the best. I haven't tried pandoc for this, by the way; it just occurred to me that it could perhaps have what we are looking for out of the box.
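For reference, the drop-folder idea mentioned above could be sketched roughly as follows in Python. This is only an illustration, not something from the project: the function names `pending_documents` and `watch`, the polling interval, and the folder layout are all hypothetical, and the only docx2pdf call used is `convert(input, output)`. It does nothing about the hyperlink problem itself; it only automates the conversion step.

```python
import time
from pathlib import Path


def pending_documents(drop_dir: Path, out_dir: Path) -> list:
    """Return .docx files in drop_dir that have no matching PDF in out_dir yet."""
    return sorted(
        p
        for p in drop_dir.glob("*.docx")
        # Skip Word's "~$name.docx" lock files and already-converted documents.
        if not p.name.startswith("~$")
        and not (out_dir / (p.stem + ".pdf")).exists()
    )


def watch(drop_dir: Path, out_dir: Path, interval: float = 5.0) -> None:
    """Poll drop_dir forever and convert each new document once."""
    # Imported lazily: docx2pdf needs Microsoft Word installed to do any work.
    from docx2pdf import convert

    out_dir.mkdir(parents=True, exist_ok=True)
    while True:
        for docx in pending_documents(drop_dir, out_dir):
            convert(str(docx), str(out_dir / (docx.stem + ".pdf")))
        time.sleep(interval)
```

Calling `watch(Path("drop"), Path("pdf"))` would keep running until interrupted. A document whose conversion fails is retried on every poll, so a real pipeline would probably want to move failures aside.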

haakonstorm commented 3 years ago

Ah, pandoc wasn't ready for arm64; that's probably why I went looking elsewhere.

AlJohri commented 3 years ago

@haakonstorm Three potential options I can think of:

1) I did a little bit of googling and it seems like at least on Windows, the DocTo tool has implemented this snippet:

https://github.com/tobya/DocTo/blob/2dbb01a1d303d9df7bccd94154b79e4d0d72ef93/src/WordUtils.pas#L211-L230

which uses the `Document.ExportAsFixedFormat` method https://docs.microsoft.com/en-us/office/vba/api/word.document.exportasfixedformat and seems to preserve bookmarks?

Perhaps this could work? Not sure.
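As a rough sketch of what option 1 might look like from Python on Windows, assuming pywin32 is installed: the helper names `export_kwargs` and `export_pdf` are made up for illustration, and the numeric constants are the values of Word's `WdExportFormat` and `WdExportCreateBookmarks` enumerations. I have not verified that this keeps the table-of-contents hyperlinks; `CreateBookmarks` controls the PDF bookmark outline, which may or may not be the same thing.

```python
from pathlib import Path

# Values of Word's WdExportFormat / WdExportCreateBookmarks enumerations.
WD_EXPORT_FORMAT_PDF = 17
WD_EXPORT_CREATE_NO_BOOKMARKS = 0
WD_EXPORT_CREATE_HEADING_BOOKMARKS = 1
WD_EXPORT_CREATE_WORD_BOOKMARKS = 2


def export_kwargs(pdf_path, bookmarks=WD_EXPORT_CREATE_HEADING_BOOKMARKS):
    """Build the named arguments for Document.ExportAsFixedFormat."""
    return {
        # Word expects an absolute path for the output file.
        "OutputFileName": str(Path(pdf_path).resolve()),
        "ExportFormat": WD_EXPORT_FORMAT_PDF,
        "OpenAfterExport": False,
        "CreateBookmarks": bookmarks,
        "DocStructureTags": True,
    }


def export_pdf(docx_path, pdf_path):
    """Open a document in Word and export it with ExportAsFixedFormat."""
    # Imported lazily: pywin32 is only available on Windows.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    try:
        doc = word.Documents.Open(str(Path(docx_path).resolve()))
        try:
            doc.ExportAsFixedFormat(**export_kwargs(pdf_path))
        finally:
            doc.Close(0)  # 0 = wdDoNotSaveChanges
    finally:
        word.Quit()
```

Passing `WD_EXPORT_CREATE_WORD_BOOKMARKS` instead would build the outline from Word bookmarks rather than headings, which might be worth trying for the index.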

2) If it seems to work using the Microsoft online services, and if your client has the budget, they can set something up using "Microsoft Power Automate" to create a "flow" that will automatically convert documents to PDF using that same online service.

3) Reverse engineer the network requests that Word runs when it converts a document to PDF using the online services and try to replicate those. It seems to hit some API in the https://wordcs.officeapps.live.com domain (perhaps https://wordcs.officeapps.live.com/document/export/pdf or https://wordcs.officeapps.live.com/wordauto/wordautomation.svc/rest from some light googling?)

AlJohri commented 2 years ago

related #52