catamphetamine / read-excel-file

Read *.xlsx files in a browser or Node.js. Parse to JSON with a strict schema.
https://catamphetamine.gitlab.io/read-excel-file/
MIT License
301 stars 52 forks source link

Skips empty columns to the left #102

Closed Gengisca closed 3 years ago

Gengisca commented 3 years ago

I've an xlsx EMPTY_COLUMN.xlsx with two columns, first is always empty, and the second is not empty. The response is this one: 0: ['Rome'] 1: ['Madrid'] 2: ['London'] 3: ['Istanbul'] 4: ['Paris'] 5: ['Amsterdam']

This behavior is incorrect as information from the first column is lost. I would like to have an answer like this: 0: (2) [null, 'Rome'] 1: (2) [null, 'Madrid'] 2: (2) [null, 'London'] 3: (2) [null, 'Istanbul'] 4: (2) [null, 'Paris'] 5: (2) [null, 'Amsterdam']

catamphetamine commented 3 years ago

See if you're using the latest version.

Gengisca commented 3 years ago

Hello, I'm using this CDN file: https://unpkg.com/read-excel-file@5.x/bundle/read-excel-file.min.js

Giancarlo

catamphetamine commented 3 years ago

Then provide a demo somewhere on codesandbox.io and an example file

Gengisca commented 3 years ago

You can use https://catamphetamine.github.io/read-excel-file/ with this file: EMPTY_COLUMN.xlsx

catamphetamine commented 3 years ago

I see.

[
  [
    "Rome"
  ],
  [
    "Madrid"
  ],
  [
    "London"
  ],
  [
    "Istanbul"
  ],
  [
    "Paris"
  ],
  [
    "Amsterdam"
  ]
]
catamphetamine commented 3 years ago

I've unzipped the Excel file and checked out the xl/worksheets/sheet1.xml file.

<?xml version="1.0" encoding="UTF-8"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" mc:Ignorable="x14ac xr xr2 xr3" xr:uid="{00000000-0001-0000-0000-000000000000}">
   <dimension ref="B1:B16" />
   <sheetViews>
      <sheetView tabSelected="1" workbookViewId="0" />
   </sheetViews>
   <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25" />
   <cols>
      <col min="1" max="1" width="54.42578125" customWidth="1" />
      <col min="2" max="2" width="42.140625" customWidth="1" />
   </cols>
   <sheetData>
      <row r="1" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B1" t="s">
            <v>0</v>
         </c>
      </row>
      <row r="2" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B2" t="s">
            <v>1</v>
         </c>
      </row>
      <row r="3" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B3" t="s">
            <v>2</v>
         </c>
      </row>
      <row r="4" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B4" t="s">
            <v>3</v>
         </c>
      </row>
      <row r="5" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B5" t="s">
            <v>4</v>
         </c>
      </row>
      <row r="6" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B6" t="s">
            <v>5</v>
         </c>
      </row>
      <row r="16" spans="2:2" ht="19.5" customHeight="1" x14ac:dyDescent="0.25" />
   </sheetData>
   <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3" />
</worksheet>

There, one can see it has <dimension ref="B1:B16"/>. What does it mean? Most likely that the sheet spans from column B1 to column B16.

What should the library do in that case? It could "pad" the data with the missing columns A1...dimensions[0]. But would that be what the creators of the file format intended? Perhaps that <dimension ref="B1:B16"/> element is there to instruct the program that all other columns should be ignored.

So I won't consider this as a bug. It could be a feature-flag, something like ignoreEmptyColumns: false, but I don't want to add such a feature in the library for now. Anyway, the ticket can stay open.

Gengisca commented 3 years ago

Hi, I've found this doc page: https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.sheetdimension?view=openxml-2.8.1 In effect Dimension specifies the used range of the worksheet. But... it is optional, and for sure Dimension doesn't instruct the program that all other columns should be ignored.

In my opinion it is really important to have the worksheet in the browser exactly as the Excel one. Adding 'ignoreEmptyColumns: false' is the solution!

In the case you haven't time or don't want to add this feature, can you tell me which modules I have to change?

Thank you Giancarlo

PS I have already forked, but it was an error...

catamphetamine commented 3 years ago

@Gengisca Hmm, nice digging. So, it's really something of an advice for the spreadsheet viewing software. They don't provide any incentives on what problems would it solve. Maybe it's an optimization thing or something. Perhaps this way, the spreadsheet viewer could scroll to the right when opening such file so that B is the first visible column. I'll fix then.

catamphetamine commented 3 years ago

published read-excel-file@5.2.25

Gengisca commented 3 years ago

Hello, Without words ... very fast !!! Thank you very much! A big greeting from Italy Giancarlo