dtjohnson / xlsx-populate

Excel XLSX parser/generator written in JavaScript with Node.js and browser support, jQuery/d3-style method chaining, encryption, and a focus on keeping existing workbook features and styles in tact.
MIT License
947 stars 180 forks source link

Cannot read property 'children' of null when read .xlsx file #314

Open KarmitoHerry opened 3 years ago

KarmitoHerry commented 3 years ago

Hi all,

I have a issue when my code read file with filetype .xlsx but the error code : TypeError: Cannot read property 'children' of undefined. I'm using version on my package.json "xlsx-populate": "^1.21.0"

async function readFileExcel(file_name, pathFile, outlet) {

    var file = pathFile + file_name;
    console.log("File :" + file);

    await XlsxPopulate.fromFileAsync(file).then(workbook => {
        console.log(workbook);
    });
}

with error detail

TypeError: Cannot read property 'children' of undefined
    at Cell._parseNode (/node_modules/xlsx-populate/lib/Cell.js:608:24)
    at Cell._init (/node_modules/xlsx-populate/lib/Cell.js:541:18)
    at new Cell (/node_modules/xlsx-populate/lib/Cell.js:24:14)
    at /node_modules/xlsx-populate/lib/Row.js:352:26
    at Array.forEach (<anonymous>)
    at Row._init (/node_modules/xlsx-populate/lib/Row.js:351:29)
    at new Row (/node_modules/xlsx-populate/lib/Row.js:20:14)
    at /node_modules/xlsx-populate/lib/Sheet.js:1381:25
    at Array.forEach (<anonymous>)
    at Sheet._init (/node_modules/xlsx-populate/lib/Sheet.js:1380:38)

Any solution of this issue ?

mestachs commented 1 year ago

I'm hitting the same issue

I've looked at the code and added a conditional break-point

   } else if (type === "inlineStr") {
            // Inline string value: can be simple text or rich text.
            const isNode = xmlq.findChild(node, 'is');
            if (isNode == undefined) {
                debugger;
            }
            if (isNode.children[0].name === "t") {
                const tNode = isNode.children[0];
                this._value = tNode.children[0];
            } else {
                this._value = isNode.children;
            }

the isNode is undefined and the current node looks like this in the browser console

{
    "name": "c",
    "attributes": {
        "r": "C2",
        "t": "inlineStr"
    },
    "children": []
}

Note that the excel has been generated via pandas

 with pd.ExcelWriter(output_folder + file_name, engine='openpyxl') as writer:
            df_hd.to_excel(writer, sheet_name='hd', index=False)
            df_settings.to_excel(writer, sheet_name='settings', index=False)
            df_survey.to_excel(writer, sheet_name='survey', index=False)
            df_choices.to_excel(writer, sheet_name='choices', index=False)

when unzipping and looking in the xml file this is what I found


<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <sheetPr>
        <outlinePr summaryBelow="1" summaryRight="1"/>
        <pageSetUpPr/>
    </sheetPr>
    <dimension ref="A1:G22"/>
    <sheetViews>
        <sheetView workbookViewId="0">
            <selection activeCell="A1" sqref="A1"/>
        </sheetView>
    </sheetViews>
    <sheetFormatPr baseColWidth="8" defaultRowHeight="15"/>
    <sheetData>
        <row r="1">
            <c r="A1" s="1" t="inlineStr">
                <is>
                    <t>key</t>
                </is>
            </c>
            <c r="B1" s="1" t="inlineStr">
                <is>
                    <t>value</t>
                </is>
            </c>
            <c r="C1" s="1" t="inlineStr">
                <is>
                    <t>label::English</t>
                </is>
            </c>
            <c r="D1" s="1" t="inlineStr">
                <is>
                    <t>label::French</t>
                </is>
            </c>
            <c r="E1" s="1" t="inlineStr">
                <is>
                    <t>label::Dutch</t>
                </is>
            </c>
            <c r="F1" s="1" t="inlineStr">
                <is>
                    <t>label::German</t>
                </is>
            </c>
            <c r="G1" s="1" t="inlineStr">
                <is>
                    <t>name::technical</t>
                </is>
            </c>
        </row>
        <row r="2">
            <c r="A2" t="inlineStr">
                <is>
                    <t>hdbp_code</t>
                </is>
            </c>
            <c r="B2" t="inlineStr">
                <is>
                    <t>10032</t>
                </is>
            </c>
            <c r="C2" t="inlineStr"></c>
            <c r="D2" t="inlineStr"></c>
            <c r="E2" t="inlineStr"></c>
            <c r="F2" t="inlineStr"></c>
            <c r="G2" t="inlineStr"></c>
        </row>

I suspect it generates an inlineStr for an empty "content"

image

Note that my only solution is to touch and save the excel via libre office/excel/google sheet. So it's probably related to the python library/generation.

We tried

 with pd.ExcelWriter(output_folder + file_name, engine='auto') as writer:
    ....

and it seem to solve the issue and related to openpyxl

I don't know what does the "spec" says for xlsx and this kind of empty inlineStr. but that would be cool to support it.