Closed lindsay-stevens closed 2 years ago
Merging #544 (69c5299) into master (2e631b1) will decrease coverage by
0.54%
. The diff coverage is96.49%
.
@@ Coverage Diff @@
## master #544 +/- ##
==========================================
- Coverage 83.85% 83.31% -0.55%
==========================================
Files 25 25
Lines 3747 3727 -20
Branches 875 871 -4
==========================================
- Hits 3142 3105 -37
- Misses 457 482 +25
+ Partials 148 140 -8
Impacted Files | Coverage Δ | |
---|---|---|
pyxform/constants.py | 100.00% <ø> (ø) |
|
pyxform/question.py | 93.42% <ø> (ø) |
|
pyxform/xls2json.py | 75.97% <95.00%> (-2.59%) |
:arrow_down: |
pyxform/utils.py | 86.30% <100.00%> (+1.44%) |
:arrow_up: |
pyxform/xls2json_backends.py | 76.47% <100.00%> (-5.99%) |
:arrow_down: |
pyxform/survey.py | 93.49% <0.00%> (+0.76%) |
:arrow_up: |
pyxform/xls2xform.py | 81.81% <0.00%> (+2.59%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 2e631b1...69c5299. Read the comment docs.
Have you seen the messages when converting a real form? I am not seeing any of them.
Thanks for this review. It seems like I have put the spell check too far down the pipeline. As you mentioned, only supported sheet names would come through from the actual spreadsheet parsing step before it. I'll have another go and use end to end tests instead. This kind of issue is why I avoided using the PyxformTestCase
- it's close to, but not fully, end to end.
Hi @lognaturel - please see below change list. I think this addresses all of the above and what was discussed a week or so ago.
Apologies for the delay, here. Looks great. 🙏
Closes #443 Closes #527
Why is this the best possible solution? Were any other approaches considered?
It implements the suggested heuristic for Levenshtein distance <= 2 to detect possible sheet name misspellings. It's possible that this will warn about sheet names for sheets that have auxilliary metadata that is not meant for processing by pyxform. For example the "osm" sheet name may be similar to many other initialisms. In that case, the warning prompts users to prefix such sheet names with an underscore to suppress the warning.
Other approaches could involve more complex spell check algorthims. The current Levenshtein distance algo is well documented and does a decent job at finding similar spellings. It seemed like overkill to add a module dependency just to have more advanced spell check.
What are the regression risks?
As mentioned above, some users may like to store extra metadata in similarly named sheets to those that are part of the SUPPORTED_SHEET_NAMES constant. In which case they would need to rename the sheets to suppress the warning, if the warning is a problem.
Does this change require updates to documentation? If so, please file an issue here and include the link below.
I don't think a documentation change is needed.
Before submitting this PR, please make sure you have:
tests_v1
nosetests
and verified all tests passblack pyxform
to format code