TheDataShed / xlsxreader

A low-memory high performance library for reading data from an xlsx file
https://godoc.org/github.com/TheDataShed/xlsxreader
MIT License
74 stars 21 forks source link

cannot read some spreadsheet cells #40

Closed jsd3 closed 5 months ago

jsd3 commented 3 years ago

I have a spreadsheet (attached) that has 81 columns, yet some of the rows read only 79 or 80 columns . I have tried Excelize and it also has the same trouble with two of the spreadsheet columns when the cells are empty (see the columns "Customer request date" and "Target date" and the last two rows in the attached spreadsheet).

Here is some code for reading the file and seeing the problem:

xl, err := xlsxreader.OpenFile("")

if err != nil {
    panic(err)
}
// Ensure the file reader is closed once utilised
defer xl.Close()

rowcount := 0
cellcount := 0
columncount := 0
brokencount := 0

// Iterate on the rows of data
for row := range xl.ReadRows("Sheet1") {
    rowcount++

    record := make([]string, 0, len(row.Cells))

    for _, cell := range row.Cells {
        cellcount++
        record = append(record, cell.Value)
        if rowcount == 1 {
            columncount++
            fmt.Println("found cell header value:", cell.Value)
        }
    }
    if len(record) != columncount {
        brokencount++
        fmt.Println(rowcount, ": found row that has count different from columncount (", len(record), "vs", columncount, ")")
     }
}

fmt.Println("The input file", *infile, "is", size, "bytes long")
fmt.Println("found", cellcount, "cells in", rowcount, "rows. There are", columncount, "columns")
fmt.Println("There are ", brokencount, "broken rows!")

The xml in the file appears ok from manual examination. As an aside, I also tried parsing with NodeJS and one library fails similarly, but another works just fine.

It is a bit of a puzzle and I'd love to see a fix or explanation for the problem! If it can be solved that would make this module absolutely amazing and a critical piece of functionality I have to deliver.

sample_parse_failure.xlsx

Thanks so much for any help!

dglsparsons commented 2 years ago

Hi @jsd3. Thanks for opening this issue.

It's a while since I've worked on this, but I think the issue could be that the cells you get back in each row don't necessarily include empty cells - they instead rely on you working out which Column it belongs to.

type Cell struct {
    Column string // E.G   A, B, C
    Row    int
    Value  string
    Type   CellType
}

There is a ColumnIndex helper if you're looking for the column number though:

// ColumnIndex gives a number, representing the column the cell lies beneath.
func (c Cell) ColumnIndex() int {
    return asIndex(c.Column)
}

Hope this helps.