Closed hgcummings closed 4 years ago
Hi there, thank you for taking a look at pylightxl and submitting this to our attention! We are taking a look at it now.
This is a confirmed bug. We will take a look at the fix this week. I'll keep you posted on updates! :) thanks again for submitting this!
Issue identified: a single cell with multiple text formats within the cell is split up differently in sharedString.xml than normal string.
working on accounting for these cases now
the latest commit on master (https://github.com/PydPiper/pylightxl/commit/644313050a41835890dc0195604746947d96c2a6) has the fix. Please give it a try and let us know if your issue still persists.
We are currently in the process or writing unit test against this and will push it to the next version.
Sorry for not responding sooner. Thanks for picking this up. I can confirm that my original issue is resolved in v1.46
In spreadsheets with large amounts of text, I'm finding that reading text from one field sometimes gives me the text (or even partial text) of another field.
As an example, using this publicly available Excel document: https://digital.nhs.uk/binaries/content/assets/legacy/excel/8/c/hes_data_dictionary_20170914.xlsx (note that there's a Content-Disposition header that means it will save under a different filename)
For example,
print(db.ws('APC').address(address='H276'))
should give "Age at start of the episode (STARTAGE), with decimalised values for babies." but it actually gives "Rule # 631, 641, 651 , 660 and 671", which appears to be the contents of cell J43 in a different sheet (OP). Requesting the contents of that cell (print(db.ws('OP').address(address='J43'))
) gives the text from K32 in sheet APC.Meanwhile,
print(db.ws('APC').address(address='H261'))
gives "34 Decision not to treat - decision not to treat made or no further contact required", which is one line from the middle of a large text cell (which is repeated in all three sheets, so I'm not exactly sure which one its coming from).Other possibly useful observations: I certainly see correct results for large portions of the file. It seems to start going wrong about halfway through. The errors seem to be consistently reproducible (i.e. I get the exact same wrong text for each cell each time). Deleting all the contents of one of the text-heavy columns (column
I
) seemed to resolve the issue (but undoing this change and saving again caused it to come back). I guess there's some subtlety in how the Excel file format references text internally that's throwing off pylightxl.Let me know if I can provide any more information.