Closed: antfin closed this issue 1 month ago
@antfin make sure you have the latest python-docx package installed:
pip install -U python-docx
The .grid_cols_before attribute was added in the latest release of python-docx (v1.1.2). That dependency resolves correctly on a fresh install, but initially it was not picked up when updating with pip install -U unstructured[docx]. That was fixed a couple of days ago, but I'm not sure the fix has been released yet.
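To confirm whether an environment actually picked up the newer dependency, something like the following could be run. It is only a rough sketch: the 1.1.2 threshold is taken from the comment above, and the helper names (version_tuple, installed_python_docx_version, supports_grid_cols) are made up for illustration and are not part of python-docx or unstructured.

```python
import re
from importlib import metadata

# Threshold taken from the comment above; treat it as an assumption.
REQUIRED = (1, 1, 2)


def version_tuple(version):
    """Turn a version string such as '1.1.2' into a tuple of ints.

    Parsing stops at the first component that does not start with a digit.
    """
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break
        parts.append(int(digits.group()))
    return tuple(parts)


def installed_python_docx_version():
    """Return the installed python-docx version string, or None if absent."""
    try:
        return metadata.version("python-docx")
    except metadata.PackageNotFoundError:
        return None


def supports_grid_cols(version):
    """True if the given version string is at least REQUIRED."""
    return version is not None and version_tuple(version) >= REQUIRED


if __name__ == "__main__":
    found = installed_python_docx_version()
    print("python-docx:", found, "| new enough:", supports_grid_cols(found))
```

If this reports that the version is too old, running pip install -U python-docx as suggested above should bring it up to date.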
Closing as assumed fixed but don't hesitate to reopen if it's still giving you trouble after updating :)
Describe the bug: Parsing a 5G 3GPP spec fails (e.g. https://www.3gpp.org/ftp/Specs/archive/23_series/23.503/23503-i50.zip from https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3334).
To Reproduce: Try to parse the document.
Expected behavior: Document parsed without error.
Screenshots: We get this exception:
AttributeError: '_Row' object has no attribute 'grid_cols_before'
Environment Info: I'm using Python 3.11.8 on my Mac.
Additional context: If I comment out the lines in unstructured/partition/docx.py that use row.grid_cols_after and row.grid_cols_before, parsing works, so it seems that certain rows don't have these fields. Is it possible to add a check and run the for loop only if the fields exist?
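The kind of check being asked for could look roughly like the sketch below. It is not the actual code in unstructured/partition/docx.py: grid_padding and padded_cells are hypothetical helpers, and the stand-in row objects only mimic a row with and without the two attributes. It assumes that a missing attribute can be treated as zero padding columns.

```python
from types import SimpleNamespace


def grid_padding(row):
    """Return (before, after) padding counts, defaulting to 0 when the
    row object does not expose the attributes (assumed safe fallback)."""
    before = getattr(row, "grid_cols_before", 0)
    after = getattr(row, "grid_cols_after", 0)
    return before, after


def padded_cells(row, cell_texts):
    """Pad the cell texts with empty strings for skipped grid columns."""
    before, after = grid_padding(row)
    return [""] * before + list(cell_texts) + [""] * after


# Stand-ins for row objects; real rows would come from python-docx.
old_style_row = SimpleNamespace()  # no grid_cols_* attributes
new_style_row = SimpleNamespace(grid_cols_before=1, grid_cols_after=2)

print(padded_cells(old_style_row, ["a", "b"]))
print(padded_cells(new_style_row, ["a"]))
```

A guard like this would only hide the AttributeError on older python-docx versions; any leading or trailing empty grid columns would still be lost there.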