catamphetamine / read-excel-file

Read *.xlsx files in a browser or Node.js. Parse to JSON with a strict schema.
https://catamphetamine.gitlab.io/read-excel-file/
MIT License
301 stars 52 forks source link

If the Excel file has a sheet with name Overview then it throws error #92

Closed Prashant-Kan closed 3 years ago

Prashant-Kan commented 3 years ago

If the execel file has a sheet (try with Only one sheet) with name Overview and when we try to call

const readExcelFile = require('read-excel-file/node');
readExcelFile("filename");

It throws d error

cy.task('ReadExcelFile') failed with the following error:

>  TypeError: Cannot read property 'trim' of undefined
      at Cell (XXXXXXXXX\node_modules\read-excel-file\source\readXlsx.js:207:40)
      at map (XXXXXXXXX\node_modules\read-excel-file\source\readXlsx.js:310:104)
    at Array.map (<anonymous>)
      at parseSheet (XXXXXXXXX\node_modules\read-excel-file\source\readXlsx.js:310:92)
      at readXlsx (XXXXXXXXX\node_modules\read-excel-file\source\readXlsx.js:68:18)
      at readXlsxFileContents (XXXXXXXXX\node_modules\read-excel-file\source\readXlsxFileContents.js:9:18)
      at <unknown> (XXXXXXXXX\node_modules\read-excel-file\source\readXlsxFileNode.js:14:23)

If we rename the sheet then its working fine with the same xlsx file. But, we are looking for the code which also accepts the Overview Sheet

catamphetamine commented 3 years ago

The line it refers to is:

    // If the cell contains a "shared" string.
    case 's':
      // If a cell has no value then there's no `<c/>` element for it.
      // If a `<c/>` element exists then it's not empty.
      // The `<v/>`alue is a key in the "shared strings" dictionary of the
      // XLSX file, so look it up in the `values` dictionary by the numeric key.
      value = values[parseInt(value)].trim()
      if (value === '') {
        value = undefined
      }
      break

Looks like it doesn't find the string value in the index of string values.

It's unclear what the issue is. You can debug it yourself by modifying the code in node_modules/read-excel-file and then running the app.

catamphetamine commented 3 years ago

Or, provide the excel file that is not working (test on the demo whether it works first).

Prashant-Kan commented 3 years ago

I want to debug but the file path given was not found in my node_modules. May be it was generated runtime?

Could you guide how to debug in this sceanrio?

catamphetamine commented 3 years ago

Could you guide how to debug in this sceanrio?

Open node_modules/read-excel-file/modules/readXlsx.js or node_modules/read-excel-file/commonjs/readXlsx.js, and add console.log('@@@@@@@@@@ test') to that file. Reload the app (maybe even stop it and re-run it). If it prints to the console, then you can debug that file.

Alternatively, provide the Excel file here.

Prashant-Kan commented 3 years ago

PhraseEntityList.xlsx This is the file.

I have already tried keeping the console.log and alert in the /read-excel-file/commonjs/readXlsx.js but its not working.

In the error the path is different and when I tried to find the file from the given path in error. It doesn't exist

Prashant-Kan commented 3 years ago

What I found is that If i do some changes in the file and save and then tried to read it works.

But the file is dynamic, its been downloaded at runtime and then read the content. So, need to find some different approach to verify

Prashant-Kan commented 3 years ago

Hi @catamphetamine

Thx for the suggestion to change and restart the app Now, the console.log is working. I have starting debugging. Will check what can be the problem here.

If you find anything on the above attached excel then do let me know.

Thanks again, appreciate the prompt reply of the issue

catamphetamine commented 3 years ago

Well, I did a brief debug of the program.

In readXlsxFile.js:


function Cell(cellNode, sheet, xml, values, styles, properties, options) {
  const coords = CellCoords(cellNode.getAttribute('r'))

  console.log('#### sheet', sheet)
  console.log('#### cellNode', cellNode)

  let value = xml.select(sheet, cellNode, 'a:v', namespaces)[0]

  console.log('#### value', value, typeof value)

  // For `xpath` `value` can be `undefined` while for native `DOMParser` it's `null`.
  // So using `value && ...` instead of `if (value !== undefined) { ... }` here.
  value = value && value.textContent

  switch (cellNode.getAttribute('t')) {
    // If the cell contains a "shared" string.
    case 's':
      console.log('values', values)
      console.log('value', value)
      // If a cell has no value then there's no `<c/>` element for it.
      // If a `<c/>` element exists then it's not empty.
      // The `<v/>`alue is a key in the "shared strings" dictionary of the
      // XLSX file, so look it up in the `values` dictionary by the numeric key.
      value = values[parseInt(value)].trim()
      if (value === '') {
        value = undefined
      }
      break

The value was undefined for some reason, because cellNode didn't have any childNodes.

Why was that? No one knows. Perhaps, it's a corrupted file.

If you want, you can debug it further: clone the project, run npm install, then npm build, then place the attached index.html to bundle folder and open bundle/index.html in the browser.

index.zip

But, perhaps it would be simpler for you to look for another library.

catamphetamine commented 3 years ago

I'm closing this issue as "won't fix" becase it won't be fixed.

Prashant-Kan commented 3 years ago

Hi, @catamphetamine

I found the problem. The file is not corrupt.

The problem i think from my debugging (investigation). The value of Description column is blank and there the cellnode would be undefined and also there are two "False" as a column value in the sheet

I console.log(values) and found that the description value and False both are missing in the array.

THis Index Value
0
THis is  testing array 
[
  'Phrase',
  'Name',
  'Description',
  'SystemPhrase',
  'ShowValue',
  'TimeStamp',
  'OQ-0136-28-Phrase',
  'False',
  'AAAAAAADNLQ='
]
THis is Value from the index is
Phrase
THis Index Value
1
THis is  testing array 
[
  'Phrase',
  'Name',
  'Description',
  'SystemPhrase',
  'ShowValue',
  'TimeStamp',
  'OQ-0136-28-Phrase',
  'False',
  'AAAAAAADNLQ='
]
THis is Value from the index is
Name
THis Index Value
2
THis is  testing array 
[
  'Phrase',
  'Name',
  'Description',
  'SystemPhrase',
  'ShowValue',
  'TimeStamp',
  'OQ-0136-28-Phrase',
  'False',
  'AAAAAAADNLQ='
]
THis is Value from the index is
Description
THis Index Value
3
THis is  testing array 
[
  'Phrase',
  'Name',
  'Description',
  'SystemPhrase',
  'ShowValue',
  'TimeStamp',
  'OQ-0136-28-Phrase',
  'False',
  'AAAAAAADNLQ='
]
THis is Value from the index is
SystemPhrase
THis Index Value
4
THis is  testing array 
[
  'Phrase',
  'Name',
  'Description',
  'SystemPhrase',
  'ShowValue',
  'TimeStamp',
  'OQ-0136-28-Phrase',
  'False',
  'AAAAAAADNLQ='
]
THis is Value from the index is
ShowValue
THis Index Value
5
THis is  testing array 
[
  'Phrase',
  'Name',
  'Description',
  'SystemPhrase',
  'ShowValue',
  'TimeStamp',
  'OQ-0136-28-Phrase',
  'False',
  'AAAAAAADNLQ='
]
THis is Value from the index is
TimeStamp
THis Index Value
6
THis is  testing array 
[
  'Phrase',
  'Name',
  'Description',
  'SystemPhrase',
  'ShowValue',
  'TimeStamp',
  'OQ-0136-28-Phrase',
  'False',
  'AAAAAAADNLQ='
]
THis is Value from the index is
OQ-0136-28-Phrase
THis Index Value
undefined
THis is  testing array 
[
  'Phrase',
  'Name',
  'Description',
  'SystemPhrase',
  'ShowValue',
  'TimeStamp',
  'OQ-0136-28-Phrase',
  'False',
  'AAAAAAADNLQ='
]
THis is Value from the index is
undefined
catamphetamine commented 3 years ago

Yes, this is the same output I got. For some reason, the "value from the index" is undefined. This is the first time I've seen something like this. That's why I said that the file may be corrupt.

I could add a special case for this, but it is possible that this is not valid for a cell to have undefined textContent in this case, and we shouldn't handle invalid cases in this library.

If you want, you can patch the code to handle this case via something like if (undefined) then .... You can publish the patched package as your own, or you could import it in an application directly as the source files.

Prashant-Kan commented 3 years ago

Ok I am looking for something

but one strange thing we found is that there is only False instead of 2. that means the values (arrays) has distinct value. If there multiple cells having same value it will saw only one time.

catamphetamine commented 3 years ago

If there're "False" values then maybe Excel didn't recognize it as a BOOL type and instead wrote it in the file as string type.

Prashant-Kan commented 3 years ago

image

here, not talking about bool or string. Here the value is seperate but when reading the sheet.values variable has only one 'False'

catamphetamine commented 3 years ago

This is called "shared strings". https://stackoverflow.com/questions/17337073/shared-strings-in-excel-2010 They compress *.xlsx files like this

catamphetamine commented 3 years ago

https://docs.microsoft.com/en-us/office/open-xml/working-with-the-shared-string-table

Shared strings optimize space requirements when the spreadsheet contains multiple instances of the same string. Spreadsheets that contain business or analytical data often contain repeating strings. If these strings were stored using inline string markup, the same markup would appear over and over in the worksheet. While this is a valid approach, there are several downsides. First, the file requires more space on disk because of the redundant content. Moreover, loading and saving also takes longer.

Prashant-Kan commented 3 years ago

Hi, found the exact issue here. There's nothing related to the sheet name. It was my wrong conclusion.

The problem with error of trim of undefined is related to the empty cell value for particular column.

Say, For Eg we have 5 columns in the table/grid viz Name,Description,Status,Date,Salary. which has 5-10 rows or sometime may be 1 row. but the description field is blank which has no value. in that scenario its breaking with the above mentioned error. if I remove that description column then it will work fine.

But in our application we have implemented in various tables/grid and in most of them number of columns are more than 10-15 of which 50% can have blank or no (null) values in that scenario its breaking

Can we get a work around for that?

NOTE: If we open the same file and without even changing anything save the file and then try to read that file then it will work with the blank value.

It would be really helpful for us if that can be fixed

catamphetamine commented 3 years ago

but the description field is blank which has no value. in that scenario its breaking with the above mentioned error.

This library is designed to work fine with empty cells. In your particular case, it looks like an incorrectly generated *.xlsx file.

Download the file you've provided above. Unzip it. Go to ./xl/worksheets, open sheet.xml.

<?xml version="1.0" encoding="utf-8"?>
<x:worksheet
    xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <x:sheetData>
        <x:row r="1">
            <x:c r="A1" t="s">
                <x:v>0</x:v>
            </x:c>
        </x:row>
        <x:row r="2">
            <x:c r="A2" t="s">
                <x:v>1</x:v>
            </x:c>
            <x:c r="B2" t="s">
                <x:v>2</x:v>
            </x:c>
            <x:c r="C2" t="s">
                <x:v>3</x:v>
            </x:c>
            <x:c r="D2" t="s">
                <x:v>4</x:v>
            </x:c>
            <x:c r="E2" t="s">
                <x:v>5</x:v>
            </x:c>
        </x:row>
        <x:row r="3">
            <x:c r="A3" t="s">
                <x:v>6</x:v>
            </x:c>
            <x:c r="B3" t="s" />
            <x:c r="C3" t="s">
                <x:v>7</x:v>
            </x:c>
            <x:c r="D3" t="s">
                <x:v>7</x:v>
            </x:c>
            <x:c r="E3" t="s">
                <x:v>8</x:v>
            </x:c>
        </x:row>
    </x:sheetData>
</x:worksheet>

See <x:c r="B3" t="s" />. t="s" means the cell's value is a "shared string". But, it doesn't specify the index of the shared string. That's why the file is not generated correctly.

catamphetamine commented 3 years ago

NOTE: If we open the same file and without even changing anything save the file and then try to read that file then it will work with the blank value.

That's because the file wasn't generated correctly.