Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

docx - error while parsing table with merged cells #3139

Closed veredmm closed 1 month ago

veredmm commented 1 month ago

Describe the bug:

Parsing a .docx table that contains merged cells raises an error:

AttributeError: '_Row' object has no attribute 'grid_cols_before'

To Reproduce: parse the attached sample Word file, which contains a table with spanned (merged) cells.

table_span.docx

Note that the error is raised in version 0.14.3 and did not show up when I used version 0.12.6.

scanny commented 1 month ago

@veredmm check your python-docx version. It will need to be updated to 1.1.2 for recent versions of unstructured.

scanny commented 1 month ago

Closing since I'm pretty confident that's going to solve the problem, but feel free to reopen if you're still getting an error.