bokuweb / docx-rs

A .docx file writer with Rust/WebAssembly.
https://bokuweb.github.io/docx-rs/
MIT License

Extract text from Word textboxes [proposed label: enhancement] #688

Open Mrodent opened 8 months ago

Mrodent commented 8 months ago

While testing my project, I ran read_docx on a test .docx file containing various elements, including a textbox. Examining the resulting Value::Object, I can't find the text from my textbox anywhere. On the crates.io page, at the bottom under "Features", "Textbox" is left unticked. Does this mean that the parser ignores textboxes entirely?

And yet, when I uncompress the .docx file, the text is right there in document.xml, near the end:

<v:textbox style="mso-fit-shape-to-text:t"><w:txbxContent><w:p w:rsidR="0094123E" w:rsidRPr="00DF617B" w:rsidRDefault="0094123E" w:rsidP="0094123E"><w:pPr><w:ind w:left="0" w:firstLine="0"/></w:pPr><w:r w:rsidRPr="00DF617B"><w:t>Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</w:t></w:r></w:p><w:p w:rsidR="0094123E" w:rsidRDefault="0094123E"/></w:txbxContent></v:textbox>

Am I right that textboxes are currently omitted?

If so, is there a reason why textboxes are not currently included in the parsing? It's slightly irksome, because it means I'll have to cobble together my own code to parse document.xml.
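For anyone in the same position, here is a rough sketch of what such stopgap code might look like. It is not part of docx-rs: the names extract_textbox_text, run_text, and unescape are hypothetical, and it uses naive string scanning with the standard library only, rather than a real XML parser. It assumes document.xml has already been unzipped and read into a string, that the usual w: prefix is used, and that <w:txbxContent> carries no attributes.

```rust
/// Returns one String per <w:txbxContent> element, with its non-empty
/// paragraphs joined by '\n'. Hypothetical helper, not a docx-rs API.
fn extract_textbox_text(xml: &str) -> Vec<String> {
    const OPEN: &str = "<w:txbxContent>";
    const CLOSE: &str = "</w:txbxContent>";
    let mut boxes = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find(OPEN) {
        let body = &rest[start + OPEN.len()..];
        let Some(end) = body.find(CLOSE) else { break };
        // Splitting on the paragraph end tag keeps the runs of each paragraph together.
        let paragraphs: Vec<String> = body[..end]
            .split("</w:p>")
            .map(run_text)
            .filter(|p| !p.is_empty())
            .collect();
        boxes.push(paragraphs.join("\n"));
        rest = &body[end + CLOSE.len()..];
    }
    boxes
}

/// Concatenates the contents of every <w:t> element in `fragment`.
fn run_text(fragment: &str) -> String {
    const END: &str = "</w:t>";
    let mut out = String::new();
    let mut rest = fragment;
    while let Some(pos) = rest.find("<w:t") {
        let tail = &rest[pos + 4..];
        // Accept <w:t> and <w:t attr="...">, but not <w:tab/>, <w:tbl>, <w:tc>, ...
        let is_text_tag = tail.starts_with('>') || tail.starts_with(' ');
        let Some(gt) = tail.find('>') else { break };
        let self_closing = gt > 0 && tail.as_bytes()[gt - 1] == b'/';
        let after_tag = &tail[gt + 1..];
        if is_text_tag && !self_closing {
            let Some(close) = after_tag.find(END) else { break };
            out.push_str(&unescape(&after_tag[..close]));
            rest = &after_tag[close + END.len()..];
        } else {
            rest = after_tag;
        }
    }
    out
}

/// Decodes the five predefined XML entities (numeric references are left as-is).
fn unescape(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        .replace("&amp;", "&") // last, so "&amp;lt;" becomes "&lt;" rather than "<"
}

fn main() {
    // Shortened version of the fragment quoted above.
    let xml = r#"<v:textbox style="mso-fit-shape-to-text:t"><w:txbxContent><w:p><w:r><w:t>Excepteur sint occaecat</w:t></w:r></w:p><w:p/></w:txbxContent></v:textbox>"#;
    for text in extract_textbox_text(xml) {
        println!("{text}");
    }
}
```

One caveat with this approach: files saved by newer versions of Word may store a textbox twice (a DrawingML version plus a VML fallback), so the same text could come back more than once and may need de-duplicating.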

bokuweb commented 8 months ago

@Mrodent Thanks for your report. Could you please provide the .docx file?

Mrodent commented 8 months ago

Here's a small .docx file with a text box: test_file_2.docx. On my setup, the text in the text box is simply ignored when I parse it.

... but if you uncompress it, you'll find what I included in my previous post.

By the way, I only have Word 2007 installed, which may make a difference.

Mrodent commented 6 months ago

Edited the title in the hope that you might find time to give this some thought. Omitting text from text-boxes seems a bit of an oversight, which could seemingly be corrected fairly easily...