XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License

list index out of range when trying to convert file saved with LibreOffice #611

Open lognaturel opened 2 years ago

lognaturel commented 2 years ago

ODK forum thread.

Manually copying the contents of the file and saving them in a new document worked.

Related to https://github.com/XLSForm/pyxform/issues/604 in that these document compatibility issues come from openpyxl.

lindsay-stevens commented 2 years ago

From the original thread, this comment has an example file PRUEBA.xlsx which seems corrupted somehow. I can open it with Excel (2010) or LibreOffice (7.2.7.2), but it has excessive style / format data. It is 674KB on disk, and approx. 85% of that is from the ./xl/styles.xml document within the xlsx zip file. After opening the file, there are dozens of hyperlink (Hipervínculo) custom formats. Opening the file with openpyxl (with read_only and data_only modes) takes about 45 seconds on my machine. For this kind of file, perhaps pyxform could have a file read timeout kwarg (in xls2xform_convert) to optionally limit resource usage when pyxform is used in a server / service context?
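The styles.xml share mentioned above can be measured without opening the workbook in openpyxl, since an xlsx file is a zip archive. The following is a minimal sketch using only the standard library; the helper name `styles_share` is hypothetical and is not part of pyxform. Note that this is a cheap pre-check, not the timeout kwarg proposed above.

```python
import zipfile


def styles_share(path_or_file, member="xl/styles.xml"):
    """Return the fraction of the archive's compressed size taken by `member`.

    Hypothetical helper for illustration only. It reads the zip directory,
    so no sheet or style data is parsed.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        infos = archive.infolist()
        # compress_size is the on-disk size of each member inside the zip.
        total = sum(info.compress_size for info in infos)
        styles = sum(
            info.compress_size for info in infos if info.filename == member
        )
    return styles / total if total else 0.0
```

A service could call something like `styles_share("PRUEBA.xlsx")` and decide, against a threshold of its own choosing, whether to warn the user before attempting a slow read.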

lindsay-stevens commented 2 years ago

From the original thread, this comment also has an example file PRUEBA.xlsx which also seems corrupted. I can open it with Excel (2010) or LibreOffice (7.2.7.2), but it has excessive style / format data, as described above. The file won't open with openpyxl; instead, an error relating to the workbook style data is thrown, as copied below. For this kind of file, perhaps pyxform could catch the error and, in the warning, suggest that the user re-save the file with Excel or try copying the XLSForm data into a new workbook file? Alternatively, perhaps this is a known issue for openpyxl or could be fixed upstream there.
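The catch-and-advise idea could look roughly like the sketch below. It assumes the failure surfaces as the IndexError seen in the traceback for this file. The names `WorkbookReadError`, `RESAVE_HINT` and `open_workbook_or_explain` are hypothetical, not existing pyxform API; only the `openpyxl.open(...)` call is taken from the traceback. The opener is injectable so the sketch can be exercised without a corrupted file.

```python
class WorkbookReadError(Exception):
    """Hypothetical error type used only in this sketch."""


RESAVE_HINT = (
    "The workbook's style data could not be read. Try to re-save the file "
    "with Excel, or copy the XLSForm data into a new workbook file."
)


def _default_opener(path_or_file):
    # Imported lazily so the sketch also runs with an injected opener.
    import openpyxl

    # Same call that appears in the traceback for xlsx_to_dict.
    return openpyxl.open(filename=path_or_file, read_only=True, data_only=True)


def open_workbook_or_explain(path_or_file, opener=_default_opener):
    """Open a workbook, turning the style IndexError into an actionable message."""
    try:
        return opener(path_or_file)
    except IndexError as exc:
        # Keep the original exception chained for debugging.
        raise WorkbookReadError(RESAVE_HINT) from exc
```

Catching only IndexError keeps the sketch narrow; whether other openpyxl failures deserve the same hint is an open question.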

Error traceback

```
Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/local/lib/python3.8/unittest/case.py", line 676, in run
    self._callTestMethod(testMethod)
  File "/usr/local/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
    method()
  File "/home/lindsay/repos/pyxform/repo/tests/test_xls2json_backends.py", line 170, in test_xlsx_with_many_empty_cells2
    xlsx_data = xlsx_to_dict(xlsx_path)
  File "/home/lindsay/repos/pyxform/repo/pyxform/xls2json_backends.py", line 219, in xlsx_to_dict
    workbook = openpyxl.open(filename=path_or_file, read_only=True, data_only=True)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
    reader.read()
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/reader/excel.py", line 281, in read
    apply_stylesheet(self.archive, self.wb)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
    stylesheet = Stylesheet.from_tree(node)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
    return super(Stylesheet, cls).from_tree(node)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
    return cls(**attrib)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/stylesheet.py", line 94, in __init__
    self.named_styles = self._merge_named_styles()
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/stylesheet.py", line 114, in _merge_named_styles
    self._expand_named_style(style)
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/stylesheet.py", line 124, in _expand_named_style
    xf = self.cellStyleXfs[named_style.xfId]
  File "/home/lindsay/repos/pyxform/venv/lib/python3.8/site-packages/openpyxl/styles/cell_style.py", line 185, in __getitem__
    return self.xf[idx]
IndexError: list index out of range
```