XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License
77 stars 134 forks source link

use openpyxl for XLSX files; upgrade xlrd to v2 #575

Closed sheppard closed 2 years ago

sheppard commented 2 years ago

Closes #501

Why is this the best possible solution? Were any other approaches considered?

See discussion in #501.

What are the regression risks?

There should be no noticeable changes for most users of pyxform. I started from a line-for-line translation of the xlrd integration and optimized some of the logic. One potential change is that dictionary keys will not contain consecutive inner spaces, since they are now populated from the cleaned headers rather than a separate step.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

There is a new dependency, but this should be transparent to most users.

Before submitting this PR, please make sure you have:

lindsay-stevens commented 2 years ago

Hi @sheppard thanks for adding this PR! I have a few requests to consider please. If you don't expect to have the bandwidth to address these please let us know, and they could be done in a follow-up PR.

codecov-commenter commented 2 years ago

Codecov Report

Merging #575 (05d5ac7) into master (66c0d5b) will decrease coverage by 0.61%. The diff coverage is 83.87%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #575      +/-   ##
==========================================
- Coverage   84.20%   83.59%   -0.61%     
==========================================
  Files          25       25              
  Lines        3672     3761      +89     
  Branches      865      930      +65     
==========================================
+ Hits         3092     3144      +52     
- Misses        440      474      +34     
- Partials      140      143       +3     
Impacted Files Coverage Δ
pyxform/utils.py 78.08% <72.41%> (-13.25%) :arrow_down:
pyxform/xls2json.py 79.46% <75.00%> (+0.06%) :arrow_up:
pyxform/xls2json_backends.py 77.43% <89.47%> (+1.10%) :arrow_up:
pyxform/constants.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 66c0d5b...05d5ac7. Read the comment docs.

sheppard commented 2 years ago

I updated the setup.py dependencies, added equivalency tests, and moved the imports to the module level in utils.py.

It looks like openpyxl's defusedxml integration is active by default as long as:

I confirmed this by adding a test to assertTrue(openpyxl.DEFUSEDXML) in test_xls2json_backends.py.

Setting os.environ would only be necessary to override the preferences of a user that had explicitly defined OPENPYXL_DEFUSEDXML=False in their environment. In that case, the user would still be able to disable defusedxml by importing openpyxl before pyxform had a chance to update os.environ.