doy / spreadsheet-parsexlsx

parse XLSX files
http://metacpan.org/release/Spreadsheet-ParseXLSX

Incorrect value for cells with multiple t nodes #73

Open andrewgregory opened 7 years ago

andrewgregory commented 7 years ago

When a cell has an inlineStr with multiple t nodes, ParseXLSX appears to use only the first and discards the text in any subsequent nodes. The sharedStrings parser, by contrast, steps through all of the nodes and joins their values. Unfortunately, I can't give you a proper test file: I can't share the original spreadsheet, Excel insists on moving values into sharedStrings when I try to create a minimal example, and I don't know the format well enough to craft one by hand. The raw XML for the cell looks roughly like this, though:

<c s="10" r="A1" t="inlineStr">
    <is>
        <r>
            <t xml:space="preserve">Foo </t>
        </r>
        <r>
            <t xml:space="preserve">Bar</t>
        </r>
    </is>
</c>
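
The difference between taking the first t node and joining all of them can be reproduced outside the module. The following is a minimal sketch, not the module's actual code path: it assumes XML::Twig is installed, that the `s` prefix is mapped to the SpreadsheetML main namespace, and it adds an `xmlns` declaration to the sample cell (an assumption, since the fragment above omits it) so that the `s:t` lookup resolves.

```perl
use strict;
use warnings;
use XML::Twig;

# Assumed namespace URI for SpreadsheetML; the fragment in the report does
# not show it, so it is declared here to make the 's:' prefix resolvable.
my $ns = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main';

# Hypothetical stand-alone copy of the cell from the report.
my $xml = <<"XML";
<c xmlns="$ns" s="10" r="A1" t="inlineStr">
    <is>
        <r>
            <t xml:space="preserve">Foo </t>
        </r>
        <r>
            <t xml:space="preserve">Bar</t>
        </r>
    </is>
</c>
XML

# Map the namespace URI to the 's' prefix used in the XPath expressions.
my $twig = XML::Twig->new(map_xmlns => { $ns => 's' });
$twig->parse($xml);
my $cell = $twig->root;

# Behaviour described in the report: only the first t node is read.
my $first_only = ($cell->find_nodes('.//s:t'))[0]->text;

# Behaviour proposed in the report: concatenate the text of every t node.
my $joined = join '', map { $_->text } $cell->find_nodes('.//s:t');

print "first only: [$first_only]\n";
print "joined:     [$joined]\n";
```

With this sample, the first expression should yield only the text of the first run, while the second should yield the text of both runs concatenated.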

The following patch appears to fix the issue:

diff --git a/lib/Spreadsheet/ParseXLSX.pm b/lib/Spreadsheet/ParseXLSX.pm
index 5df5111..981b925 100644
--- a/lib/Spreadsheet/ParseXLSX.pm
+++ b/lib/Spreadsheet/ParseXLSX.pm
@@ -384,14 +384,15 @@ sub _parse_sheet {
                     $sheet->{MaxCol} = $col
                         if $sheet->{MaxCol} < $col;
                     my $type = $cell->att('t') || 'n';
-                    my $val_xml;
+                    my $val = undef;
                     if ($type ne 'inlineStr') {
-                        $val_xml = $cell->first_child('s:v');
+                        my $val_xml = $cell->first_child('s:v');
+                        $val = $val_xml->text if $val_xml;
                     }
                     elsif (defined $cell->first_child('s:is')) {
-                        $val_xml = ($cell->find_nodes('.//s:t'))[0];
+                        $val = join '',
+                          map { $_->text } $cell->find_nodes('.//s:t');
                     }
-                    my $val = $val_xml ? $val_xml->text : undef;

                     my $long_type;
                     my $Rich;