Open laurielounge opened 6 years ago
More info - my first thought seems to have been wrong. Error looks like:
no subfile named /stylesheet.xml at ../perl5/Spreadsheet/ParseXLSX.pm line 967
Version
$Spreadsheet::ParseXLSX::VERSION = '0.27';
ParseExcel
our $VERSION = '0.65';
after unzipping the xlsx file into its own directory
find . -type f
looks like:
./workbook.xml
./_rels/.rels
./_rels/workbook.xml.rels
./[Content_Types].xml
./stylesheet.xml
./xl/worksheets/data.xml
cat _rels/.rels
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="workbook.xml" Id="rw_1"/></Relationships>
cat _rels/workbook.xml.rels
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="/xl/worksheets/data.xml" Id="r_1"/>
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="stylesheet.xml" Id="r_2"/></Relationships>
Just for emphasis:
cat _rels/workbook.xml.rels
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="/xl/worksheets/data.xml" Id="r_1"/>
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="stylesheet.xml" Id="r_2"/></Relationships>
and
cat workbook.xml
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<sheets>
<sheet name="Data" sheetId="1" r:id="r_1">
<bookViews>
<workbookView/></bookViews></sheet></sheets></workbook>
Looking at _rels/workbook.xml.rels, there is no actual Target="stylesheet.xml", at least, not in that folder.
The only other reference to stylesheet:
grep -r stylesheet *
comes from [Content_type].xml:
<Override PartName="/stylesheet.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/>
Further...
If I edit _rels/workbook.xml.rels, and change
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="stylesheet.xml" Id="r_2"/></Relationships>
to
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="/stylesheet.xml" Id="r_2"/></Relationships>
The file parses.I'm guessing here, but I suspect when Excel opens the file, it has a harder look for stylesheet.xml than ParseXLSX is doing. Another possibilty is that it just throws up it's hand and says open the file anyway, without the stylesheet info. Will experiment. Looking at the info in stylesheet, and the way Excel displays the file, I suspect it's the former (stylesheet seems to have been applied).
Further still...
If we define "valid spreadsheet file" as one successfully read by Excel, then this is a valid spreadsheet file.
Once again
find . -type f
reveals
./workbook.xml
./_rels/.rels
./_rels/workbook.xml.rels
./[Content_Types].xml
./stylesheet.xml
./xl/worksheets/data.xml
grep -r Target *
reveals
_rels/.rels: <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="workbook.xml" Id="rw_1"/></Relationships>
_rels/workbook.xml.rels: <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="/xl/worksheets/data.xml" Id="r_1"/>
_rels/workbook.xml.rels: <Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="stylesheet.xml" Id="r_2"/></Relationships>
We see that, in _rels/, we have a workbook.xml.rels file. _rels/.rels defines a target of workbook.xml (no ".rels"). _rels/workbook.xml.rels defines a "stylesheet.xml". Not "/stylesheet.xml"
I'm not currently sure which target is looked for first, but one assumes it's 'stlyesheet.xml', as that's where it's erroring. One assumes that fixing the location of stylesheet (which I will do manually do next) will cause the error to become "not found workbook.xml".
Right. I can confirm that if I manually edit _rels/workbook.xml.rels and adjust
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="stylesheet.xml" Id="r_2"/></Relationships>
to read
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="/stylesheet.xml" Id="r_2"/></Relationships>
ParseXLSX can parse the sheet. I can print the value in $ws->get_cell(0,0) correctly.
Excel can still successfully read the file (it is still a "valid workbook")
I see a possible approaches here. Pre-check to see if all of the targets exist and are in the appropriate places, adjust if necessary. Is the xml read once and then passed from routine to routine in ParseXLSX? If so, it shouldn't be too hard to see if the stylesheet is where the xml says it is.
I'll continue playing with it.
Hi gang,
test_sheet_new.xlsx
The attached file is a valid spreadsheet (can open in Excel) but stylesheet.xml and worksheet.xml, rather than being in /xl/, are in / and consequently, the sheet cannot be opened.
To test, run parse on the file. Error message indicates stylesheet.xml cannot be found.