Closed siddhant2408 closed 4 years ago
The paths you provided as an example do not match, I'm also not quite sure on what you are driving at? Can you provide some replication steps/ improved context please?
Sample File test3.xlsx
The library is unable to open this file and the error is
copy excel to csv: open file: File not found: xl//xl/worksheets/sheet1.xml
Can you please take a look
I think there is an issue here https://github.com/TheDataShed/xlsxreader/blob/f13424b56c2d13078ef168ff38fbc0ff53d2fa60/sheets.go#L33
return "xl/" + rel.Target, nil
Maybe rel.Target is already giving us the the file name with xl/
appended. WDYT
Any ideas?
Could you add a sample file that replicates the issue? I'm not seeing it when playing around, but it could be something specific to the file you have?
From your original post:
We're using the relationship file to make the name of the spreadsheet file. Can we not just get the names of the spreadsheets that are present in the root directory of the xml
That's because listing the files that are the contents kind of breaks the concept of treating an xlsx file as a 'black box'. The relationships file is used to translate the user facing sheet name, into the physical file name on disk. i.e. from these.
Hope that makes sense?
I added the sample file in my comment above https://github.com/TheDataShed/xlsxreader/issues/22#issuecomment-684306714
I can't get the same issue with that file you've attached, so I'm not sure how you're causing it?
I'm running the following code:
e, err := xlsxreader.OpenFile("./test3.xlsx")
if err != nil {
fmt.Printf("error: %s \n", err)
return
}
defer e.Close()
fmt.Printf("Worksheets: %s \n", e.Sheets)
for row := range e.ReadRows("Aziende") {
if row.Error != nil {
fmt.Printf("error on row %d: %s \n", row.Index, row.Error)
return
}
}
Produces the following output:
$ go run main.go
Worksheets: [Aziende]
error on row 1: Unable to get cell value for cell A1 - no value element found
I'm also not sure how you've produced that spreadsheet? Because the contents appear to be completely invalid. i.e.
<row r="1">
<c r="A1" s="1" t="abcdefGhi">
<is>
<t>JKL Mnopq</t>
</is>
</c>
<c r="B1" s="1" t="rstuvwXyz">
<is>
<t>Abcdef ghijklm</t>
</is>
</c>
The t
field on a c
element (type of a cell) appears to be complete garbage.
Yes, we removed the unncessary things from the file. But it got corrupted in the process. Can you please try this file?
So again, following code works:
func main() {
e, err := xlsxreader.OpenFile("./test2_onlyheader.xlsx")
if err != nil {
fmt.Printf("error: %s \n", err)
return
}
defer e.Close()
fmt.Printf("Worksheets: %s \n", e.Sheets)
for row := range e.ReadRows("Aziende") {
if row.Error != nil {
fmt.Printf("error on row %d: %s \n", row.Index, row.Error)
return
}
if row.Index < 10 {
fmt.Printf("%+v \n", row.Cells)
}
}
}
output:
Worksheets: [Aziende]
[{Column:A Row:1 Value:URL Atoka} {Column:B Row:1 Value:Codice fiscale} {Column:C Row:1 Value:Partita IVA} {Column:D Row:1 Value:Numero REA} {Column:E Row:1 Value:Provincia REA} {Column:F Row:1 Value:Ragione sociale} {Column:G Row:1 Value:Natura giuridica} {Column:H Row:1 Value:Sul mercato da} {Column:I Row:1 Value:Startup innovativa} {Column:J Row:1 Value:Operativa} {Column:K Row:1 Value:Codice ATECO 2007} {Column:L Row:1 Value:Descrizione ATECO 2007} {Column:M Row:1 Value:Codice NACE} {Column:N Row:1 Value:Descrizione NACE} {Column:O Row:1 Value:Indirizzo completo (sede legale)} {Column:P Row:1 Value:C.A.P. (sede legale)} {Column:Q Row:1 Value:Comune (sede legale)} {Column:R Row:1 Value:Provincia (sede legale)} {Column:S Row:1 Value:Regione (sede legale)} {Column:T Row:1 Value:Area geografica (sede legale)} {Column:U Row:1 Value:Numero unità locali (oltre alla sede legale)} {Column:V Row:1 Value:Altre unità locali} {Column:W Row:1 Value:Telefoni verificati} {Column:X Row:1 Value:Telefoni} {Column:Y Row:1 Value:Email verificate (info)} {Column:Z Row:1 Value:Email (info)} {Column:AA Row:1 Value:Email verificate (vendite)} {Column:AB Row:1 Value:Email (vendite)} {Column:AC Row:1 Value:Email verificate (marketing)} {Column:AD Row:1 Value:Email (marketing)} {Column:AE Row:1 Value:Email verificate (acquisti)} {Column:AF Row:1 Value:Email (acquisti)} {Column:AG Row:1 Value:Email verificate (direzione)} {Column:AH Row:1 Value:Email (direzione)} {Column:AI Row:1 Value:Email verificate (assistenza)} {Column:AJ Row:1 Value:Email (assistenza)} {Column:AK Row:1 Value:Email verificate (tecnico)} {Column:AL Row:1 Value:Email (tecnico)} {Column:AM Row:1 Value:Email verificate (amministrazione)} {Column:AN Row:1 Value:Email (amministrazione)} {Column:AO Row:1 Value:Email verificate (altre)} {Column:AP Row:1 Value:Email (altre)} {Column:AQ Row:1 Value:Website} {Column:AR Row:1 Value:Numero dipendenti} {Column:AS Row:1 Value:Ricavi (€)} {Column:AT Row:1 Value:Data chiusura bilancio} {Column:AU Row:1 Value:Quotata} {Column:AV Row:1 Value:Dettagli persone} {Column:AW Row:1 Value:Primo esponente} {Column:AX Row:1 Value:Ruolo primo esponente} {Column:AY Row:1 Value:Secondo esponente} {Column:AZ Row:1 Value:Ruolo secondo esponente} {Column:BA Row:1 Value:Terzo esponente} {Column:BB Row:1 Value:Ruolo terzo esponente} {Column:BC Row:1 Value:Faxes} {Column:BD Row:1 Value:Pec}]
Are you sure you tested the exact file without opening and saving the file again. That fixes the file
Hello,
I made this test on a fork here https://github.com/emilien-puget/xlsxreader/blob/b9ae9374ebce9752f70bc248f51c5f7074951024/file_test.go#L36, and this test results in a panic
Considering opening and saving the file fixes the issue, and i don't think having xl/ as a prefix in the relationships is covered in the xlsx specification -> https://www.ecma-international.org/publications/standards/Ecma-376.htm
It looks like the xlsx file is simply corrupt? How has this file been created?
Maybe the other solution would just be that the path provided in the relationships file could be ambiguous, either as a partial (starting after xl/
), or starting with xl/
, and the library should simply decide whether to add the prefix or not. It's currently unclear to me if it's just that one file that isn't formed properly or a wider problem though.
This isn't an issue just with that one file, it's happening at scale (i have no idea how these xlsx are created), your idea about checking whether the file already contains the start of the path may be a solution
Okay, i'll look at preparing a fix tomorrow.
This should be resolved in the latest release now (#23 closes). Let us know if the issue persists :).
@AndyThurgood can you close this issue?
Closing...
Error:
We're using the relationship file to make the name of the spreadsheet file. Can we not just get the names of the spreadsheets that are present in the root directory of the xml
For the above file, the file listing is:
As we can see,
xl/worksheets/sheet1.xml
is already present in the listing