Open twilco opened 6 years ago
The XLS file in question is laid out in an unusual way that confuses the parser. I'd almost call it corrupt.
An XLS file is an ordered sequence of binary "records", some of which hold data and some of which signal state changes such as the start of a sheet. In this particular file, the "begin sheet template" and "begin sheet Values" records both appear before any cell records, which I believe to be incorrect. (That said, XLS has no formal spec, and Excel opens it fine...)
Edit: This file is indeed valid, but it is laid out in a complex manner that will require Tabitha to do a lot more bookkeeping to parse it correctly. Files laid out this way can cause rows to be emitted out of order, or on a different sheet from the one they actually belong to. This is a bug we want to fix.
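To illustrate the kind of bookkeeping involved, here is a minimal Python sketch. It is not Tabitha's code. It assumes the BIFF8 workbook stream has already been extracted from the OLE2 container, that each record starts with a 2-byte type and a 2-byte payload length (little-endian), and that BOF, EOF, and ROW use the type codes shown. The names `record`, `iter_records`, and `rows_by_sheet` are hypothetical helpers made up for this example.

```python
import struct

# Record type codes (assumed BIFF8 values).
BOF = 0x0809  # begins a substream: workbook globals first, then one per sheet
EOF = 0x000A  # ends the current substream
ROW = 0x0208  # describes one row; the payload starts with the row index


def record(rec_type, payload=b""):
    """Build one raw record: 4-byte header followed by the payload."""
    return struct.pack("<HH", rec_type, len(payload)) + payload


def iter_records(stream):
    """Yield (type, payload) pairs in the order they appear in the stream."""
    offset = 0
    while offset + 4 <= len(stream):
        rec_type, length = struct.unpack_from("<HH", stream, offset)
        offset += 4
        yield rec_type, stream[offset:offset + length]
        offset += length


def rows_by_sheet(stream):
    """Return (sheet_index, row_index) pairs, tracking which sheet is open.

    The sheet index is derived from the substream that is currently open,
    not from how many sheet-related records have been seen so far.
    """
    top_level_bofs = 0
    sheet_index = -1  # -1 means "inside workbook globals, not a sheet"
    depth = 0         # > 1 while inside a nested substream
    rows = []
    for rec_type, payload in iter_records(stream):
        if rec_type == BOF:
            if depth == 0:
                top_level_bofs += 1
                # The first top-level BOF opens the workbook globals.
                sheet_index = top_level_bofs - 2
            depth += 1
        elif rec_type == EOF:
            depth -= 1
        elif rec_type == ROW and depth > 0 and sheet_index >= 0:
            if len(payload) >= 2:
                (row_index,) = struct.unpack_from("<H", payload, 0)
                rows.append((sheet_index, row_index))
    return rows
```

Fed a synthetic stream with a globals substream followed by two sheet substreams, this attributes each ROW record to the sheet whose BOF/EOF pair encloses it. A real parser would also need to handle the sheet directory records and cell records, which this sketch ignores.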
Tabitha appears to be assigning the wrong row number to rows in the XLS file found at this URL: https://s3.amazonaws.com/widen-ingester-dev/tabithaDoesn'tSeeFirstPageSanitized.xls
In our application, we only care about rows in the first sheet (page). Our code looks like this:
In the attached spreadsheet, there is data in the first sheet, but Tabitha reports everything as starting on page 1 (the second sheet) instead. As a result, the code above iterates zero times, which is not what we would expect.
Something to note: after converting this file to XLSX using Microsoft Excel, Tabitha recognizes the page numbers correctly, so perhaps this is an XLS-only issue.