ShayHill / docx2python

Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images.
https://docx2python.readthedocs.io/en/latest/
MIT License

Extracted image looks different from the one displayed on Word2021 #37

Closed songyuc closed 2 years ago

songyuc commented 2 years ago

Hi guys, I am new to docx2python and am learning to use it. I find that the extracted image looks different from the one displayed in Word 2021.

Image A, displayed in Word 2021: [image]

Image B, extracted with Python: [image]

They look different, as A looks like a part of B. So, how can I solve it?

Your answer and guide will be appreciated!

ShayHill commented 2 years ago

Can you post an example file?

songyuc commented 2 years ago

Here is the file, https://docs.google.com/document/d/1kUnmt8HfXDjr6OQN9aiiBeFSJAsrfnk7/edit?usp=sharing&ouid=117403696964406551444&rtpof=true&sd=true

ShayHill commented 2 years ago

Thank you. I will have a look. Might be Monday before I can get it opened up.

ShayHill commented 2 years ago

Thank you for your patience. I have examined the file. A docx file is a zip archive that keeps its images in an internal folder (typically word/media). In this case, the image is "image1.tiff", which is your "Image B". The document crops this image when it is displayed, so in Word you only see the upper portion ("Image A"). docx2python extracts the stored file as it is, without applying the crop. The only way to replicate what Word shows would be to alter the "image1.tiff" image file, which is outside the scope of docx2python.

Thank you for reaching out, however. And thank you for using docx2python.
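
For anyone who wants to check this on their own file, here is a minimal sketch that uses only the Python standard library, not docx2python. It lists the images stored in the archive and reports any crop rectangles found in the main document part. The function names `list_media_files` and `find_crop_rects` are made up for this example. It assumes the images live under `word/media/` and that the crop is recorded as a DrawingML `a:srcRect` element whose `l`, `t`, `r`, `b` attributes are integers in thousandths of a percent. Documents that embed pictures in another way (for example legacy VML images) would need different handling.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace, where the srcRect (crop) element is defined.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def list_media_files(docx):
    """Return the names of files stored under word/media/ in the archive."""
    with zipfile.ZipFile(docx) as zf:
        return [name for name in zf.namelist() if name.startswith("word/media/")]


def find_crop_rects(docx):
    """Return one dict per a:srcRect in word/document.xml.

    Each value is the percentage trimmed from that side of the stored image.
    Assumes integer attribute values in thousandths of a percent; a missing
    attribute means no crop on that side.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    crops = []
    for src_rect in root.iter(f"{{{A_NS}}}srcRect"):
        crops.append(
            {side: int(src_rect.get(side, "0")) / 1000 for side in ("l", "t", "r", "b")}
        )
    return crops
```

Calling `find_crop_rects("example.docx")` (with the path replaced by your own file) on a document like the one in this issue should report a non-zero `b` value, meaning the bottom of the stored image is hidden when displayed. To reproduce "Image A", you could apply those percentages to the extracted image with an image library of your choice.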