ShayHill / docx2python

Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images.
https://docx2python.readthedocs.io/en/latest/
MIT License
163 stars 34 forks

Is it possible to "not use" any particular feature? #48

Closed AayushSameerShah closed 11 months ago

AayushSameerShah commented 11 months ago

I really have found this library useful 🙏🏻

Question 1️⃣

Can I disable hyperlink extraction? If I want the link text itself, can I just disable this feature? Like: Instead of <a href="http:/...">link text</a> I just want link text.

Question 2️⃣

Can I get the number of pages without reading the whole document? I want users to submit documents that are only, say, 5 pages long. Can I get the page count without loading all of the contents?

Please guide, thanks.

ShayHill commented 11 months ago

Thank you for using docx2python.

  1. There is currently no way to suppress hyperlink extraction. I will think about how this might be accomplished simply. For now, I would recommend using a regex to strip away the HTML tags.

  2. I’ve had several requests for page count / page numbers. Unfortunately, Word does not store page numbers or page breaks. These are assigned dynamically when the page renders, so there is no way to know the page count without re-implementing the hundreds-of-pages-long docx rendering specification.
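
For the regex workaround in point 1, a minimal sketch might look like the following. It assumes the links appear in the extracted text exactly in the `<a href="...">link text</a>` form shown in the question; the helper name `strip_hyperlinks` and the sample string are made up for illustration and are not part of docx2python.

```python
import re

# Matches an opening <a ...> tag or a closing </a> tag and nothing else.
# The \b keeps tags such as <abbr> from matching.
_ANCHOR_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)


def strip_hyperlinks(text: str) -> str:
    """Remove <a href="..."> and </a> tags, keeping the link text.

    Hypothetical helper, not a docx2python function.
    """
    return _ANCHOR_TAG.sub("", text)


# Made-up sample; in practice the input would be the extracted text,
# e.g. the string returned by the `.text` attribute of the result.
sample = 'See <a href="https://example.com">link text</a> for details.'
print(strip_hyperlinks(sample))  # See link text for details.
```

This only removes the anchor tags and leaves any other markup alone. An href that itself contains a `>` character would not be handled correctly, so treat it as a stopgap rather than a full HTML parser.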


AayushSameerShah commented 11 months ago

@ShayHill Thank you so much 😄 ✌🏻