Closed Tux closed 10 years ago
So I'm curious what you're actually expecting this to do. Spreadsheet::ParseXLSX->new
actually returns a Spreadsheet::ParseExcel object, so I think just continuing to use Spreadsheet::ParseExcel::FmtDefault->new
should work fine - it's what Spreadsheet::ParseXLSX does internally, anyway. I don't know that duplicating all of the internal classes that Spreadsheet::ParseExcel uses is necessarily a useful idea. The idea behind being a drop in adaptor is that the parser is different, but the classes that it ends up generating are still the same, so I think if you want one of those classes, you should just use it directly. If you want to submit a documentation patch for that, I'd certainly be open to that.
On Fri, 17 Jan 2014 16:05:33 -0800, Jesse Luehrs notifications@github.com wrote:
So I'm curious what you're actually expecting this to do.
Spreadsheet::Read is a wrapper module over spreadsheet parsers to feature a generic frontend.
use Spreadsheet::Read;
my $book = ReadData ("test.csv", sep => ";");
my $book = ReadData ("test.sxc");
my $book = ReadData ("test.ods");
my $book = ReadData ("test.xls");
my $book = ReadData ("test.xlsx");
my $book = ReadData ($fh, parser => "xls");
Inside Read.pm, I use code like this:
my @parsers = ( [ csv => "Text::CSV_XS" ], [ csv => "Text::CSV_PP" ], # Version 1.05 and up [ csv => "Text::CSV" ], # Version 1.00 and up [ ods => "Spreadsheet::ReadSXC" ], [ sxc => "Spreadsheet::ReadSXC" ], [ xls => "Spreadsheet::ParseExcel" ], [ xlsx => "Spreadsheet::XLSX" ], [ prl => "Spreadsheet::Perl" ],
out of that list it picks the first available parser for a format. My goal now was to replace to dead Spreadsheet::XLSX with Spreadsheet::ParseXLSX. As the latter looks like being API compliant with Spreadsheet::ParseExcel, I change code like
if ($io_ref) {
$oBook = $parse_type eq "XLSX"
? Spreadsheet::XLSX->new ($io_ref)
: Spreadsheet::ParseExcel->new (%parser_opts)->Parse ($io_ref);
}
else {
$oBook = $parse_type eq "XLSX"
? Spreadsheet::XLSX->new ($txt)
: Spreadsheet::ParseExcel->new (%parser_opts)->Parse ($txt);
}
to
if ($io_ref) {
$oBook = $parse_type eq "XLSX"
? $can{xlsx} =~ m/::XLSX/
? Spreadsheet::XLSX->new ($io_ref)
: Spreadsheet::ParseXLSX->new (%parser_opts)->parse ($io_ref)
: Spreadsheet::ParseExcel->new (%parser_opts)->Parse ($io_ref);
}
else {
$oBook = $parse_type eq "XLSX"
? $can{xlsx} =~ m/::XLSX/
? Spreadsheet::XLSX->new ($txt)
: Spreadsheet::ParseXLSX->new (%parser_opts)->parse ($txt)
: Spreadsheet::ParseExcel->new (%parser_opts)->Parse ($txt);
}
assuming it just works
Spreadsheet::ParseXLSX->new
actually returns a Spreadsheet::ParseExcel object, so I think just continuing to useSpreadsheet::ParseExcel::FmtDefault->new
should work fine - it's what Spreadsheet::ParseXLSX does internally, anyway.
OK. that I didn't read from the docs, but it replaces the undefined error with errors for color functionality
I don't know that duplicating all of the internal classes that Spreadsheet::ParseExcel uses is necessarily a useful idea. The idea behind being a drop in adaptor is that the parser is different, but the classes that it ends up generating are still the same, so I think if you want one of those classes, you should just use it directly. If you want to submit a documentation patch for that, I'd certainly be open to that.
I will now have to chase different errors:
• formatted merged cells • unformatted merged cells • hidden cells • cells with color:
The colors in Excel were defined like this:
sub _xlscolor { my ($clr, @clr) = @; defined $clr or return undef; @clr == 0 && $clr == 32767 and return undef; # Default fg color @clr == 2 && $clr == 0 and return undef; # No fill bg color @clr == 2 && $clr == 1 and ($clr, @clr) = ($clr[0]); @clr and return undef; # Don't know what to do with this "#" . lc Spreadsheet::ParseExcel->ColorIdxToRGB ($clr); } # _xls_color
which is called from the API like
fgcolor => _xls_color ($FnT->{Color}),
bgcolor => _xls_color (@{$FmT->{Fill}}),
where $FmT = $oWkC->{Format}; and $FnT = $FmT->{Font};
what happens, is that XLSX now returns the colors directly
Spreadsheet::Read::_xls_color (1, "#008000", "#FFFFFF")
instead of using the color table. I can change that in my functions, but it is not API-compatible. I can reduce all the errors like this
sub _xlscolor { my ($clr, @clr) = @; defined $clr or return undef; @clr == 0 && $clr =~ m/^#[0-9a-fA-F]+$/ and return $clr eq "#000000" ? undef : lc $clr;
@clr == 0 && $clr == 32767 and return undef; # Default fg color
@clr == 2 && $clr == 0 and return undef; # No fill bg color
@clr == 2 && $clr == 1 and ($clr, @clr) = ($clr[0]);
@clr and return undef; # Don't know what to do with this
$clr =~ m/^#[0-9a-fA-F]+$/ and return $clr eq "#000000" ? undef : lc $clr;
"#" . lc Spreadsheet::ParseExcel->ColorIdxToRGB ($clr);
} # _xls_color
but I wonder if that is what it really means. I then still have this:
t/64_dates.t ... 1/?
I do not want you to fix my module, but as it stands now, it is not just a drop-in replacement
$ diff -w t/dates 6c6
my $tests = 71; 11c11
< Spreadsheet::Read::parses ("xls") or
Spreadsheet::Read::parses ("xlsx") or 17c17
< ok ($xls = ReadData ("files/Dates.xls", attr => 1, dtfmt => "yyyy-mm-dd"), "Excel Date testcase");
ok ($xls = ReadData ("files/Dates.xlsx", attr => 1, dtfmt => "yyyy-mm-dd"), "Excel Date testcase"); 19,20c19,24 < my $ss = $xls->[1];
< my $attr = $ss->{attr};
SKIP: { ok (my $ss = $xls->[1], "sheet"); ok (my $attr = $ss->{attr}, "attr");
defined $attr->[2][1]{format} or skip "$xls->[0]{parser} $xls->[0]{version} does not reliably support formats", 68;
55,66c59 < < # Below can only be checked when SS::PE 0.34 is out < #use DDumper; < #foreach my $r (1..4,6..7) { < # foreach my $c (1..5) { < # my $cell = cr2cell ($c, $r); < # my $fmt = $ss->{attr}[$c][$r]{format}; < # defined $ss->{$cell} or next; < # printf STDERR "# attr %s: %-22s %s\n", < # $cell, $ss->{$cell}, defined $fmt ? "'$fmt'" : "
"; < # } < # }
}
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using perl5.00307 .. 5.19 porting perl5 on HP-UX, AIX, and openSUSE http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
So the problem with colors is that XLSX files themselves are allowed to specify colors directly rather than only being allowed to use the color table. I figured it would be more straightforward to just always return the actual color string rather than sometimes returning a color string and other times returning an index (and munging up the color table to include things that weren't actually in the color table in the file didn't seem like a good idea). It's true that this should at least be documented, though.
You mentioned on #ox
that the issue with d-mmm
was actually a bug in Spreadsheet::ParseExcel - were there any other issues you noticed other than those two?
On Wed, 22 Jan 2014 10:08:19 -0800, Jesse Luehrs notifications@github.com wrote:
So the problem with colors is that XLSX files themselves are allowed to specify colors directly rather than only being allowed to use the color table. I figured it would be more straightforward to just always return the actual color string rather than sometimes returning a color string and other times returning an index (and munging up the color table to include things that weren't actually in the color table in the file didn't seem like a good idea).
But that would better match what the older Excel format does. Read on. I can live with the current state of affairs with one single exception at the moment.
It's true that this should at least be documented, though.
What then should also be documented is the values for color index 32767 (default fg color). You now return "#000000" where I can detect undefined in Spreadsheet::ParseExcel
Withe this code: --8<---
sub _xls_color { my $clr = shift; defined $clr or return undef; $clr eq "#000000" and return undef; $clr =~ m/^#[0-9a-fA-F]+$/ and return lc $clr; $clr == 0 || $clr == 32767 and return undef; # Default fg color return "#" . lc Spreadsheet::ParseExcel->ColorIdxToRGB ($clr); } # _xls_color
sub _xlsfill { my ($p, $fg, $bg) = @; defined $p or return undef; $p == 32767 and return undef; # Default fg color $p == 0 && !defined $bg and return undef; # No fill bg color $p == 1 and return _xls_color ($fg); $bg < 8 || $bg > 63 and return undef; # see Workbook.pm#106 return _xls_color ($bg); } # _xls_fill -->8--- and later: --8<--- fgcolor => _xls_color ($FnT->{Color}), bgcolor => _xls_fill (@{$FmT->{Fill}}), -->8---
I pass all color tests for ParseExcel and ParseXLSX, but I do not like the $clr eq "#000000" line in _xls_color ()
You mentioned on
#ox
that the issue withd-mmm
was actually a bug in Spreadsheet::ParseExcel - were there any other issues you noticed other than those two?
Yes two things: • the rounding of percentages, but I have not yet been able to localize the exact point of difference. The only value from my test that acts different is 0.5% which now gets rounded to 1% in your parser • $FmT->{Hidden} not set on hidden cells
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using perl5.00307 .. 5.19 porting perl5 on HP-UX, AIX, and openSUSE http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
The default color is now returned as undef
, and I've added support for hidden and locked cells. The rounding issue with percents appears to be a floating point edge case issue as discussed on IRC (the results are different for perls configured with and without -ld
), so I'll leave this alone unless some evidence comes up that this is actually a bug. I'll get a release out shortly.
I wanted to replace Spreadsheet:XLSX with Spreadsheet:ParseXLSX in Spreadsheet::Read
I derive 2. from this note:
DESCRIPTION This module is an adaptor for Spreadsheet::ParseExcel that reads XLSX files.
It is however not a drop-in replacement
my $oFmt = Spreadsheet::ParseXLSX::FmtDefault->new;
is not available