Don't rely on worksheet_id to match filename

mnoble commented 12 years ago

Excel worksheets get an id when they are created. When worksheets are removed, it seems as though id's are not reassigned, but their filenames are. For example:

  Worksheet (id 1) #=> xl/worksheets/sheet1.xml
  Worksheet (id 2) #=> xl/worksheets/sheet2.xml
  Worksheet (id 3) #=> xl/worksheets/sheet2.xml

If you delete the second worksheet, the filenames change, but the ids do not, it seems. What you end up with is:

  Worksheet (id 1) #=> xl/worksheets/sheet1.xml
  Worksheet (id 3) #=> xl/worksheets/sheet2.xml

The current XLSX parser assumes the filename of a worksheet contains its id. This is not the case for the example above.

This change makes it load worksheets sequentually, from 1 to the number of worksheets, instead of id (which doesn't change). Assuming the worksheet filenames are changed when adding/removing worksheets, this should work.

bspaulding commented 12 years ago

Wow. That's weird. Good catch.

Can you add a test case for the Sheets::Parsers::NokogiriXlsxParser? I'm cool to pull this once there's a test.

mnoble commented 12 years ago

Updated.

The Excel file I added for this test is kind of mangled (data wise) because I had to remove a bunch of private stuff. The xml attributes are all setup correctly though; two worksheets (id 1 and 3).

Here's a link to the exception that was being raised as well: https://gist.github.com/3829072

bspaulding / Sheets

Don't rely on worksheet_id to match filename #1