felipenoris / XLSX.jl

Excel file reader and writer for the Julia language.
https://felipenoris.github.io/XLSX.jl/stable
Other
271 stars 56 forks source link

AssertionError when opening some (but not all) files in "rw" mode in openxlsx (no formulas present in any file) #210

Open Boxylmer opened 2 years ago

Boxylmer commented 2 years ago

When opening a file with

 XLSX.openxlsx(results_path, mode="rw") do xf
        sheet = xf[sheet_name]
        # more reading functions
end

I get

ERROR: AssertionError: Some internal files were not loaded into memory. Did you use `XLSX.open_xlsx_template` to open this file?
Stacktrace:
 [1] writexlsx(output_filepath::String, xf::XLSX.XLSXFile; overwrite::Bool)
   @ XLSX C:\Users\Will Laptop\.julia\packages\XLSX\FFzH0\src\write.jl:64
...

Where the call is occurring during writexlsx(filepath, xf, overwrite=true)

Originally thought to be a ram issue, I have this calling on hundreds of other excel sheets (some larger, some smaller) with no real telltale sign as to why one would give this error. What's going on here and how can I narrow down why this would happen in a particular file? Alternatively, what's another design pattern I can apply to large files I need to open and edit? I've tried enable_cache=false, but on rw it appears to be a requirement to also have it enabled. (https://github.com/felipenoris/XLSX.jl/blob/a9738321c945ad4f1a44c1c6406b260f04d411bb/src/read.jl#L184-187)

Boxylmer commented 2 years ago

Updates:

I rolled back to 0.7.10 and faced no issues. I'll have to stay here until there's some more knowledge about what's going on here. Could this be due to new versions in ZipFile? I didn't see any code that would cause breaking changes like this when moving from 0.7.x -> 0.8.x, but this transition is what causes the issue.

Boxylmer commented 1 year ago

This is still present in 0.9.x unfortunately :(

TimG1964 commented 6 months ago

I have this issue, too. Is there a fix, or do I have to roll back to 0.7.10 (and how do I do that, I wonder...) ERROR: LoadError: AssertionError: Some internal files were not loaded into memory. Did you useXLSX.open_xlsx_templateto open this file? Stacktrace: [1] writexlsx(output_source::String, xf::XLSX.XLSXFile; overwrite::Bool) @ XLSX C:\Users\TGebbels\.julia\packages\XLSX\U2Bcm\src\write.jl:64 [2] writexlsx(output_source::String, xf::XLSX.XLSXFile) @ XLSX C:\Users\TGebbels\.julia\packages\XLSX\U2Bcm\src\write.jl:60

Note: Can't roll back because I'm using Add keep_empty_rows kwarg in 0.10.0

TimG1964 commented 6 months ago

Did you use XLSX.open_xlsx_template to open this file? Also, if my answer to this question is "yes", what does that imply?

Boxylmer commented 6 months ago

I have this issue, too. Is there a fix, or do I have to roll back to 0.7.10 (and how do I do that, I wonder...) ERROR: LoadError: AssertionError: Some internal files were not loaded into memory. Did you useXLSX.open_xlsx_templateto open this file? Stacktrace: [1] writexlsx(output_source::String, xf::XLSX.XLSXFile; overwrite::Bool) @ XLSX C:\Users\TGebbels\.julia\packages\XLSX\U2Bcm\src\write.jl:64 [2] writexlsx(output_source::String, xf::XLSX.XLSXFile) @ XLSX C:\Users\TGebbels\.julia\packages\XLSX\U2Bcm\src\write.jl:60

Note: Can't roll back because I'm using Add keep_empty_rows kwarg in 0.10.0

Im still unsure. Personally I would roll back and edit your project to filter empty rows manually.

TimG1964 commented 2 months ago

I managed to work around this for a while, but now it's got me again and now it's a bigger problem for me. I've done a little digging. I don't understand much, but here is what I've found:

ERROR: LoadError: AssertionError: Some internal files were not loaded into memory. Did you use `XLSX.open_xlsx_template` to open this file?
Stacktrace:
 [1] writexlsx(output_source::String, xf::XLSX.XLSXFile; overwrite::Bool)
   @ XLSX C:\Users\TGebbels\.julia\packages\XLSX\U2Bcm\src\write.jl:64

Using some different .xlsx files works just fine.

Line 64 in write.jl says

@assert all(values(xf.files)) "Some internal files were not loaded into memory. Did you use `XLSX.open_xlsx_template` to open this file?"

So I tried to see what was going on with

println(keys(template.files))
println(values(template.files))

which showed:

["xl/sharedStrings.xml", "xl/drawings/drawing1.xml", "customXml/item2.xml", "customXml/itemProps1.xml", "customXml/itemProps2.xml", "xl/worksheets/sheet1.xml", "xl/ctrlProps/ctrlProp1.xml", "_rels/.rels", "xl/workbook.xml", "xl/styles.xml", "[Content_Types].xml", "docProps/core.xml", "xl/_rels/workbook.xml.rels", "xl/worksheets/sheet2.xml", "customXml/_rels/item1.xml.rels", "xl/worksheets/_rels/sheet1.xml.rels", "customXml/item1.xml", "xl/theme/theme1.xml", "customXml/_rels/item2.xml.rels", "docProps/app.xml"]
Bool[1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1]

for a file that fails whereas, for a working file, it shows

["xl/sharedStrings.xml", "xl/drawings/drawing1.xml", "xl/worksheets/sheet1.xml", "xl/ctrlProps/ctrlProp1.xml", "_rels/.rels", "xl/workbook.xml", "xl/styles.xml", "[Content_Types].xml", "docProps/core.xml", "xl/_rels/workbook.xml.rels", "xl/worksheets/sheet2.xml", "xl/worksheets/_rels/sheet1.xml.rels", "xl/theme/theme1.xml", "docProps/app.xml"]
Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

So it looks like XLSX.jl doesn't correctly handle the customXml content.

I can manually delete the customXml folder if I open the xlsx file as a zip. If I do that, the file will work again with XLSX.jl.

["xl/sharedStrings.xml", "xl/drawings/drawing1.xml", "xl/worksheets/sheet1.xml", "xl/ctrlProps/ctrlProp1.xml", "_rels/.rels", "xl/workbook.xml", "xl/styles.xml", "[Content_Types].xml", "docProps/core.xml", "xl/_rels/workbook.xml.rels", "xl/worksheets/sheet2.xml", "xl/worksheets/_rels/sheet1.xml.rels", "xl/theme/theme1.xml", "docProps/app.xml"]
Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

However, it will no longer work with Excel which offers to try to repair the file. Once repaired by Excel, the file fails again but now with even more customXml elements!

["customXml/_rels/item3.xml.rels", "xl/sharedStrings.xml", "xl/drawings/drawing1.xml", "customXml/item2.xml", "customXml/itemProps1.xml", "customXml/item3.xml", "customXml/itemProps2.xml", "xl/worksheets/sheet1.xml", "xl/ctrlProps/ctrlProp1.xml", "_rels/.rels", "xl/workbook.xml", "xl/styles.xml", "[Content_Types].xml", "docProps/core.xml", "xl/_rels/workbook.xml.rels", "customXml/itemProps3.xml", "xl/worksheets/sheet2.xml", "customXml/_rels/item1.xml.rels", "xl/worksheets/_rels/sheet1.xml.rels", "customXml/item1.xml", "xl/theme/theme1.xml", "docProps/custom.xml", "customXml/_rels/item2.xml.rels", "docProps/app.xml"]
Bool[0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1]

I don't understand the internals of Excel and I don't understand how XLSX.jl works. However, I found the lines:

                    # ignore custom XML internal files
                    if startswith(f.name, "customXml")
                        continue
                    end

in the code to read the template file. Are you ignoring these elements on read but requiring them on write? It would be really great to see resolution of this issue.

TimG1964 commented 2 months ago

Also see this thread on Discourse.

I’ve had a go at this with some success.

To start with, I edited read.jl to remove the exclusion of the customXml internal files from being processed by internal_xml_file_read by commenting out the lines:

# ignore custom XML internal files
#if startswith(f.name, "customXml")
#    continue
#end

This resulted in the following warnings:

┌ Warning: XMLError: xmlns: URI ed0bb2e3-83d5-46a8-9e95-92c1ca9cc599 is not absolute from XML Namespace module (code: 100, line: 1)
└ @ EzXML C:\Users\TGebbels\.julia\packages\EzXML\DL8na\src\error.jl:97
┌ Warning: XMLError: xmlns: URI 8fe8bdaf-491a-4d2e-89dd-756120b60898 is not absolute from XML Namespace module (code: 100, line: 1)
└ @ EzXML C:\Users\TGebbels\.julia\packages\EzXML\DL8na\src\error.jl:97

Besides these, the process seemed to work for my purposes. I don’t really know how significant these warnings are but I suppose Excel knows best what it puts in these elements of a .xlsx file.

Second, I changed the initial condition so that the customXml internal files were treated as binary files and simply “passed through” from read to write without being touched.

  if !startswith(f.name, "customXml") && (endswith(f.name, ".xml") || endswith(f.name, ".rels"))
  #if endswith(f.name, ".xml") || endswith(f.name, ".rels")

This also worked for my purpose and without producing any errors or warnings!

I ran tests on the revised code which gave the following results:

     Testing Running tests...
Test Summary:   | Pass  Total  Time
read test files |   23     23  3.2s
Test Summary: | Pass  Total  Time
Cell names    |  326    326  0.0s
Test Summary: | Pass  Total  Time
getindex      |   12     12  1.2s
Test Summary:     | Pass  Total  Time
Time and DateTime |    6      6  0.1s
Test Summary:  | Pass  Total  Time
number formats |   16     16  1.3s
Test Summary: | Pass  Total  Time
Defined Names |   24     24  0.5s
Test Summary: | Pass  Total  Time
Book1.xlsx    |   32     32  0.3s
Test Summary:       | Pass  Total  Time
book_1904_ptbr.xlsx |    9      9  0.3s
Test Summary: | Pass  Total  Time
numbers.xlsx  |   75     75  0.0s
Test Summary: | Pass  Total  Time
Column Range  |    9      9  0.1s
Test Summary:      | Pass  Total  Time
CellRange iterator |    1      1  0.1s
Test Summary: | Pass  Total  Time
Table         |  598    598  2.8s
Test Summary:    | Pass  Total  Time
Helper functions |   32     32  0.5s
Test Summary: | Pass  Total  Time
Write         |   73     73  0.3s
Test Summary: | Pass  Total  Time
Edit Template |    3      3  0.2s
Test Summary: | Pass  Total  Time
addsheet!     |   10     10  0.1s
Test Summary: | Pass  Total  Time
Edit          |   13     13  0.3s
Test Summary: | Pass  Total  Time
writetable    |  183    183  1.3s
Test Summary: | Pass  Total  Time
Styles        |   74     74  0.2s
Test Summary: | Pass  Total  Time
filemodes     |  200    200  2.8s
Test Summary: | Pass  Total  Time
escape        |   84     84  0.2s
Test Summary: | Pass  Total  Time
row_index     |    1      1  0.1s
Test Summary: |Time
show xlsx     | None  0.0s
Test Summary:  | Pass  Total  Time
relative paths |    5      5  0.2s
Test Summary:         | Pass  Total  Time
windows compatibility |    3      3  0.0s
Test Summary:    | Pass  Total  Time
whitespace nodes |    8      8  0.0s
Test Summary: | Pass  Total  Time
inlineStr     |   20     20  0.0s
Test Summary:         | Pass  Total  Time
Tables.jl integration |   27     27  3.4s
     Testing XLSX tests passed
TimG1964 commented 2 weeks ago

I'm hoping this PR will be merged soon. I'm still having to use the dev'd version... 😃