doy / spreadsheet-parsexlsx

parse XLSX files
http://metacpan.org/release/Spreadsheet-ParseXLSX

Some formatted shared strings still mis-parsed #11

Closed: merrilymeredith closed this issue 10 years ago

merrilymeredith commented 10 years ago

Perl 5.15.2 (AS build 1402), Spreadsheet::ParseXLSX 0.08, XML::Twig 3.44

This rich-text shared string item (<si>) was ending up in $workbook->{PkgStr} as undef, and so were the cells that use it. The current find_nodes('t') search only matches immediate children of <si>, but in rich text every <t> is nested inside an <r> run, so nothing is found.

<si>
 <r>
  <t xml:space="preserve"> </t>
 </r>
 <r>
  <rPr>
   <sz val="10"/><color indexed="8"/><rFont val="Arial"/><family val="2"/>
  </rPr>
  <t xml:space="preserve">This was the actual text content of the cell, I guess someone just left a file with messy formatting and/or excel didn't clean up what are practically waste spaces.</t>
 </r>
 <r>
  <rPr>
   <sz val="11"/><color theme="1"/><rFont val="Calibri"/><family val="2"/><scheme val="minor"/>
  </rPr>
  <t xml:space="preserve"> </t>
 </r>
</si>

Tiny patch follows:

diff --git a/lib/Spreadsheet/ParseXLSX.pm b/lib/Spreadsheet/ParseXLSX.pm
index 0dca08a..e202ccb 100644
--- a/lib/Spreadsheet/ParseXLSX.pm
+++ b/lib/Spreadsheet/ParseXLSX.pm
@@ -254,7 +254,7 @@ sub _parse_shared_strings {
             my $node = $_;
             # XXX this discards information about formatting within cells
             # not sure how to represent that
-            { Text => join('', map { $_->text } $node->find_nodes('t')) }
+            { Text => join('', map { $_->text } $node->find_nodes('.//t')) }
         } $strings->find_nodes('//si')
     ];
 }
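A minimal sketch of why the one-character change matters, assuming XML::Twig is installed (it is the parser ParseXLSX already uses). It parses a trimmed copy of the <si> above (no namespace, shortened text) plus a plain, unformatted <si>, then compares the old 't' search with the patched './/t' search. The variable names are illustrative only.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed copy of the rich-text <si> from this issue: every <t>
# is wrapped in an <r> run, exactly as in the report.
my $rich_xml = <<'XML';
<si>
 <r><t xml:space="preserve"> </t></r>
 <r>
  <rPr><sz val="10"/><rFont val="Arial"/></rPr>
  <t xml:space="preserve">actual cell text</t>
 </r>
 <r><t xml:space="preserve"> </t></r>
</si>
XML

# An unformatted string item: <t> is a direct child of <si>.
my $plain_xml = '<si><t>plain cell text</t></si>';

# Keep each twig object alive while its elements are in use.
my $rich_twig = XML::Twig->new;
$rich_twig->parse($rich_xml);
my $rich = $rich_twig->root;          # the rich <si> element

my $plain_twig = XML::Twig->new;
$plain_twig->parse($plain_xml);
my $plain = $plain_twig->root;        # the plain <si> element

# Old search: 't' only matches <t> children of <si>, so it misses
# every <t> nested under an <r> run.
my @rich_direct = $rich->find_nodes('t');

# Patched search: './/t' matches <t> at any depth below <si>.
my @rich_nested  = $rich->find_nodes('.//t');
my @plain_nested = $plain->find_nodes('.//t');

# Concatenate the text runs, as _parse_shared_strings does.
my $rich_text  = join '', map { $_->text } @rich_nested;
my $plain_text = join '', map { $_->text } @plain_nested;

printf "rich: %d direct, %d nested, text [%s]\n",
    scalar @rich_direct, scalar @rich_nested, $rich_text;
printf "plain: %d nested, text [%s]\n",
    scalar @plain_nested, $plain_text;
```

For the rich item the old search should come back empty while the new one finds all three <t> runs; the plain item still yields its text, so unformatted strings are unaffected by the change. As the XXX comment in the patch notes, the run formatting in <rPr> is still discarded; only the text is kept.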
doy commented 10 years ago

Thanks!

doy commented 10 years ago

Released in 0.09.