ABRAM -> OLD value sheetfile = self._filehandle("xl/worksheets/sheet%i.xml" % sheetid)
ABRAM -> new sheetfile where RIGHT xmlfile is set
sheetfile = self._filehandle(xmlfile)
ABRAM -> print xmlfile name with ID which was passed to this function
print "id passed to _convert: ", sheetid, ", xml file: ", xmlfile
sheetfile = self._filehandle("xl/worksheets/sheet%i.xml" % sheetid)
if not sheetfile and sheetid == 1:
sheetfile = self._filehandle("xl/worksheets/sheet.xml")
if not sheetfile:
@@ -317,25 +303,17 @@
for sheetNode in sheetNodes:
attrs = sheetNode._attrs
name = attrs["name"].value
ABRAM -> prrint line with SheetName, sheetId, relationID
ABRAM -> there variable id with value is generated
ABRAM -> but it could be some kind of missmatch on them
if self.appName == 'xl' and len(attrs["r:id"].value) > 2:
if 'r:id' in attrs: id = int(attrs["r:id"].value[3:])
else: id = int(attrs['sheetId'].value)
else:
if 'sheetId' in attrs: id = int(attrs["sheetId"].value)
else: id = int(attrs['r:id'].value[3:])
ABRAM -> there id is from "r:id' -> rId[123] or from sheetId -> sheetId="123"
# ABRAM -> which has to be interpreted in _convert !!!
self.sheets.append({'name': name, 'id': id})
class Relationships:
def init(self):
self.relationships = {}
ABRAM -> dictionary to hold names of xml files with related rId value
Problem with oryginal script is how to get ID of sheet, for sheet name.
Example of use:
./orig_xlsx2csv.py -a -d ";" GENEROWANY.xlsx /tmp/XXXXX
Sheet 15 not found
Which means that 'id' 15 goes to "def _convert(self, sheetid, outfile):" where
name is created like that: "sheetfile = self._filehandle("xl/worksheets/sheet%i.xml" % sheetid)"
What kind of ID is prepared?
Please refer to Class Workbook, there is lines:
What I know this is like, appName = XL, then lenght of value > 2, so it will always passed to: id = int(attrs["r:id"].value[3:])
I do not know what to think about it, but this check will always go to one direction to made id = int(attrs["r:id"].value[3:])
if self.appName == 'xl' and len(attrs["r:id"].value) > 2:
if 'r:id' in attrs: id = int(attrs["r:id"].value[3:])
else: id = int(attrs['sheetId'].value)
else:
if 'sheetId' in attrs: id = int(attrs["sheetId"].value)
else: id = int(attrs['r:id'].value[3:])
self.sheets.append({'name': name, 'id': id})
This part of script can be used only when DIGITS on sheetId=XX, rIdXX and Target="worksheets/sheetXX.xml are same.
I mean sheetId=8 -> r:id=rId8 -> Target="worksheets/sheet8.xml" only then it works like exepcted.
When XLSX was generated by PHPExcel libs situation is different then expected.
Because sheetId can has other NUMBER then r:id or sheet name.
It looks like name of xml file should be taken by rId from file "xl/_rels/workbook.xml.rels"
Which means value 15 was passed to: def _convert(self, sheetid, outfile):
Then it failed because: sheetfile = self._filehandle("xl/worksheets/sheet%i.xml" % sheetid)
-> sheetfile = self._filehandle("xl/worksheets/sheet15.xml") but in my example file it is:
and there is no "xl/worksheets/sheet15.xml" in XLSX arch (zip file ;) )
So even it could works somehow it will match wrong sheet name to his content ...
Anyway, with my changes:
1) file generated by PHPExcel
You can see that there id on LEFT side is the same digit in sheet%{X}.xml name ...
./DDD_xlsx2csv.py -a -d ":" GENEROWANY1.xlsx /tmp/adasd
sheetid: 4 , name of xml: xl/worksheets/sheet1.xml
sheetid: 5 , name of xml: xl/worksheets/sheet2.xml
sheetid: 6 , name of xml: xl/worksheets/sheet3.xml
sheetid: 7 , name of xml: xl/worksheets/sheet4.xml
sheetid: 8 , name of xml: xl/worksheets/sheet5.xml
sheetid: 9 , name of xml: xl/worksheets/sheet6.xml
sheetid: 10 , name of xml: xl/worksheets/sheet7.xml
sheetid: 11 , name of xml: xl/worksheets/sheet8.xml
sheetid: 12 , name of xml: xl/worksheets/sheet9.xml
sheetid: 13 , name of xml: xl/worksheets/sheet10.xml
sheetid: 14 , name of xml: xl/worksheets/sheet11.xml
sheetid: 15 , name of xml: xl/worksheets/sheet12.xml
sheetid: 16 , name of xml: xl/worksheets/sheet13.xml
sheetid: 17 , name of xml: xl/worksheets/sheet14.xml
2) XLSM file from Excel:
./DDD_xlsx2csv.py -a -d ":" SAK.xlsm /tmp/adasd
sheetid: 1 , name of xml: xl/worksheets/sheet1.xml
sheetid: 2 , name of xml: xl/worksheets/sheet2.xml
sheetid: 3 , name of xml: xl/worksheets/sheet3.xml
sheetid: 4 , name of xml: xl/worksheets/sheet4.xml
sheetid: 5 , name of xml: xl/worksheets/sheet5.xml
sheetid: 6 , name of xml: xl/worksheets/sheet6.xml
sheetid: 7 , name of xml: xl/worksheets/sheet7.xml
sheetid: 8 , name of xml: xl/worksheets/sheet8.xml
sheetid: 9 , name of xml: xl/worksheets/sheet9.xml
sheetid: 10 , name of xml: xl/worksheets/sheet10.xml
sheetid: 11 , name of xml: xl/worksheets/sheet11.xml
sheetid: 12 , name of xml: xl/worksheets/sheet12.xml
sheetid: 13 , name of xml: xl/worksheets/sheet13.xml
3) file generated by JAVA software from some application (is not xl style)
./DDD_xlsx2csv.py -a -d ":" gsw_CMDB_1465822227.xlsx /tmp/adasd
sheetid: 21 , name of xml: xl/worksheets/sheet21.xml
sheetid: 51 , name of xml: xl/worksheets/sheet51.xml
4) generated file from Excel 2013
./DDD_xlsx2csv.py -a -d ";" Book1.xlsx /tmp/Book1
sheetid: 1 , name of xml: xl/worksheets/sheet1.xml
sheetid: 2 , name of xml: xl/worksheets/sheet2.xml
sheetid: 3 , name of xml: xl/worksheets/sheet3.xml
sheetid: 4 , name of xml: xl/worksheets/sheet4.xml
sheetid: 5 , name of xml: xl/worksheets/sheet5.xml
5) file generated from phpexcel
./DDD_xlsx2csv.py -a -d ";" phpexcel.xlsx /tmp/php
self.workbook.appName: xl
sheetid: 4 , name of xml: xl/worksheets/sheet1.xml
self.workbook.appName: xl
sheetid: 5 , name of xml: xl/worksheets/sheet2.xml
Last check ...
./DDD_xlsx2csv.py -a -d ";" Book1.xlsx /tmp/php
SheetName: Test1 , sheetId: 1 , relationID/r:id: rId1 , short rId: 1
SheetName: What3 , sheetId: 2 , relationID/r:id: rId2 , short rId: 2
SheetName: Noo123 , sheetId: 3 , relationID/r:id: rId3 , short rId: 3
SheetName: Rename21 , sheetId: 4 , relationID/r:id: rId4 , short rId: 4
SheetName: Names01 , sheetId: 5 , relationID/r:id: rId5 , short rId: 5
id passed to _convert: 1 , xml file: xl/worksheets/sheet1.xml
id passed to _convert: 2 , xml file: xl/worksheets/sheet2.xml
id passed to _convert: 3 , xml file: xl/worksheets/sheet3.xml
id passed to _convert: 4 , xml file: xl/worksheets/sheet4.xml
id passed to _convert: 5 , xml file: xl/worksheets/sheet5.xml
[mabramowicz@oc8656482817 DIR.8]$ ./DDD_xlsx2csv.py -a -d ";" phpexcel.xlsx /tmp/php
SheetName: Worksheet , sheetId: 1 , relationID/r:id: rId4 , short rId: 4
SheetName: Worksheet 1 , sheetId: 2 , relationID/r:id: rId5 , short rId: 5
id passed to _convert: 4 , xml file: xl/worksheets/sheet1.xml
id passed to _convert: 5 , xml file: xl/worksheets/sheet2.xml
It looks like it works like I want.
Prepare dictionary with xml name and rId value, then use it...
Hi!
--- DDD_xlsx2csv.py 2016-06-13 17:01:01.676067849 +0200 +++ orig_xlsx2csv.py 2016-06-13 17:01:01.721067828 +0200 @@ -185,8 +185,7 @@ raise OutFileAlreadyExistsException("File " + str(outfile) + " already exists!") for s in self.workbook.sheets: sheetname = s['name']
ABRAM -> print sheetname and coresponding sheetId
+
filter sheets by include pattern
@@ -222,26 +221,13 @@ self._convert(s['id'], of)
def _convert(self, sheetid, outfile):
ABRAM -> name of xml file form XLSX arch
ABRAM -> check if appANme = xl then do something other then oryginal script
ABRAM -> there is generated correct name of xml file with rId%i value
ABRAM -> std way of this script
ABRAM -> OLD value sheetfile = self._filehandle("xl/worksheets/sheet%i.xml" % sheetid)
ABRAM -> new sheetfile where RIGHT xmlfile is set
ABRAM -> print xmlfile name with ID which was passed to this function
ABRAM -> prrint line with SheetName, sheetId, relationID
ABRAM -> there variable id with value is generated
ABRAM -> but it could be some kind of missmatch on them
ABRAM -> there id is from "r:id' -> rId[123] or from sheetId -> sheetId="123"
class Relationships: def init(self): self.relationships = {}
ABRAM -> dictionary to hold names of xml files with related rId value
def parse(self, filehandle): doc = minidom.parseString(filehandle.read()) @@ -355,10 +333,6 @@ if rId: vtype = attrs.get('Type') target = attrs.get('Target')
ABRAM -> let store rId value and name of xml file in dictionary
ABRAM -> disabled print
print 'rId: ', rId.value, ', XML: ', target.value
Email was sent. Now some info:
XLSX file
Content of xl/workbook.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Content of xl/_rels/workbook.xml.rels
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Problem with oryginal script is how to get ID of sheet, for sheet name. Example of use:
./orig_xlsx2csv.py -a -d ";" GENEROWANY.xlsx /tmp/XXXXX Sheet 15 not found
Which means that 'id' 15 goes to "def _convert(self, sheetid, outfile):" where name is created like that: "sheetfile = self._filehandle("xl/worksheets/sheet%i.xml" % sheetid)"
What kind of ID is prepared? Please refer to Class Workbook, there is lines:
What I know this is like, appName = XL, then lenght of value > 2, so it will always passed to: id = int(attrs["r:id"].value[3:]) I do not know what to think about it, but this check will always go to one direction to made id = int(attrs["r:id"].value[3:])
This part of script can be used only when DIGITS on sheetId=XX, rIdXX and Target="worksheets/sheetXX.xml are same. I mean sheetId=8 -> r:id=rId8 -> Target="worksheets/sheet8.xml" only then it works like exepcted.
When XLSX was generated by PHPExcel libs situation is different then expected. Because sheetId can has other NUMBER then r:id or sheet name.
It looks like name of xml file should be taken by rId from file "xl/_rels/workbook.xml.rels"
When XLSX file is fine (accepted by Openoffice or Excel2013): ./DDD_xlsx2csv.py -a -d ":" GENEROWANY1.xlsx /tmp/adasd rId: rId1 , XML: styles.xml rId: rId2 , XML: theme/theme1.xml rId: rId3 , XML: sharedStrings.xml rId: rId4 , XML: worksheets/sheet1.xml rId: rId5 , XML: worksheets/sheet2.xml rId: rId6 , XML: worksheets/sheet3.xml rId: rId7 , XML: worksheets/sheet4.xml rId: rId8 , XML: worksheets/sheet5.xml rId: rId9 , XML: worksheets/sheet6.xml rId: rId10 , XML: worksheets/sheet7.xml rId: rId11 , XML: worksheets/sheet8.xml rId: rId12 , XML: worksheets/sheet9.xml rId: rId13 , XML: worksheets/sheet10.xml rId: rId14 , XML: worksheets/sheet11.xml rId: rId15 , XML: worksheets/sheet12.xml rId: rId16 , XML: worksheets/sheet13.xml rId: rId17 , XML: worksheets/sheet14.xml Sheet 15 not found
Which means value 15 was passed to: def _convert(self, sheetid, outfile): Then it failed because: sheetfile = self._filehandle("xl/worksheets/sheet%i.xml" % sheetid) -> sheetfile = self._filehandle("xl/worksheets/sheet15.xml") but in my example file it is:
and there is no "xl/worksheets/sheet15.xml" in XLSX arch (zip file ;) )
So even it could works somehow it will match wrong sheet name to his content ...
Anyway, with my changes: 1) file generated by PHPExcel You can see that there id on LEFT side is the same digit in sheet%{X}.xml name ...
./DDD_xlsx2csv.py -a -d ":" GENEROWANY1.xlsx /tmp/adasd sheetid: 4 , name of xml: xl/worksheets/sheet1.xml sheetid: 5 , name of xml: xl/worksheets/sheet2.xml sheetid: 6 , name of xml: xl/worksheets/sheet3.xml sheetid: 7 , name of xml: xl/worksheets/sheet4.xml sheetid: 8 , name of xml: xl/worksheets/sheet5.xml sheetid: 9 , name of xml: xl/worksheets/sheet6.xml sheetid: 10 , name of xml: xl/worksheets/sheet7.xml sheetid: 11 , name of xml: xl/worksheets/sheet8.xml sheetid: 12 , name of xml: xl/worksheets/sheet9.xml sheetid: 13 , name of xml: xl/worksheets/sheet10.xml sheetid: 14 , name of xml: xl/worksheets/sheet11.xml sheetid: 15 , name of xml: xl/worksheets/sheet12.xml sheetid: 16 , name of xml: xl/worksheets/sheet13.xml sheetid: 17 , name of xml: xl/worksheets/sheet14.xml
2) XLSM file from Excel: ./DDD_xlsx2csv.py -a -d ":" SAK.xlsm /tmp/adasd sheetid: 1 , name of xml: xl/worksheets/sheet1.xml sheetid: 2 , name of xml: xl/worksheets/sheet2.xml sheetid: 3 , name of xml: xl/worksheets/sheet3.xml sheetid: 4 , name of xml: xl/worksheets/sheet4.xml sheetid: 5 , name of xml: xl/worksheets/sheet5.xml sheetid: 6 , name of xml: xl/worksheets/sheet6.xml sheetid: 7 , name of xml: xl/worksheets/sheet7.xml sheetid: 8 , name of xml: xl/worksheets/sheet8.xml sheetid: 9 , name of xml: xl/worksheets/sheet9.xml sheetid: 10 , name of xml: xl/worksheets/sheet10.xml sheetid: 11 , name of xml: xl/worksheets/sheet11.xml sheetid: 12 , name of xml: xl/worksheets/sheet12.xml sheetid: 13 , name of xml: xl/worksheets/sheet13.xml
3) file generated by JAVA software from some application (is not xl style) ./DDD_xlsx2csv.py -a -d ":" gsw_CMDB_1465822227.xlsx /tmp/adasd sheetid: 21 , name of xml: xl/worksheets/sheet21.xml sheetid: 51 , name of xml: xl/worksheets/sheet51.xml
4) generated file from Excel 2013 ./DDD_xlsx2csv.py -a -d ";" Book1.xlsx /tmp/Book1 sheetid: 1 , name of xml: xl/worksheets/sheet1.xml sheetid: 2 , name of xml: xl/worksheets/sheet2.xml sheetid: 3 , name of xml: xl/worksheets/sheet3.xml sheetid: 4 , name of xml: xl/worksheets/sheet4.xml sheetid: 5 , name of xml: xl/worksheets/sheet5.xml
5) file generated from phpexcel ./DDD_xlsx2csv.py -a -d ";" phpexcel.xlsx /tmp/php self.workbook.appName: xl sheetid: 4 , name of xml: xl/worksheets/sheet1.xml self.workbook.appName: xl sheetid: 5 , name of xml: xl/worksheets/sheet2.xml
Last check ... ./DDD_xlsx2csv.py -a -d ";" Book1.xlsx /tmp/php SheetName: Test1 , sheetId: 1 , relationID/r:id: rId1 , short rId: 1 SheetName: What3 , sheetId: 2 , relationID/r:id: rId2 , short rId: 2 SheetName: Noo123 , sheetId: 3 , relationID/r:id: rId3 , short rId: 3 SheetName: Rename21 , sheetId: 4 , relationID/r:id: rId4 , short rId: 4 SheetName: Names01 , sheetId: 5 , relationID/r:id: rId5 , short rId: 5 id passed to _convert: 1 , xml file: xl/worksheets/sheet1.xml id passed to _convert: 2 , xml file: xl/worksheets/sheet2.xml id passed to _convert: 3 , xml file: xl/worksheets/sheet3.xml id passed to _convert: 4 , xml file: xl/worksheets/sheet4.xml id passed to _convert: 5 , xml file: xl/worksheets/sheet5.xml [mabramowicz@oc8656482817 DIR.8]$ ./DDD_xlsx2csv.py -a -d ";" phpexcel.xlsx /tmp/php SheetName: Worksheet , sheetId: 1 , relationID/r:id: rId4 , short rId: 4 SheetName: Worksheet 1 , sheetId: 2 , relationID/r:id: rId5 , short rId: 5 id passed to _convert: 4 , xml file: xl/worksheets/sheet1.xml id passed to _convert: 5 , xml file: xl/worksheets/sheet2.xml
It looks like it works like I want.
Prepare dictionary with xml name and rId value, then use it...
I know, I am not pro in python, but it works ...
Maybe you can check yours point of view???