cfsimplicity / spreadsheet-cfml

Standalone library for working with spreadsheets and CSV in CFML
MIT License
126 stars 35 forks source link

Some spreadsheets won't parse #360

Open Daemach opened 7 months ago

Daemach commented 7 months ago

I am trying to read an xlsx file and get the following message, even though there is a valid sheet. I am getting sheet names with getInfo() then looping over them to import, but it doesn't pick up any data.

The operative line is spreadsheet.setReturnCachedFormulaValues( true ).read( argumentcollection = args );

image CB8188710A3B4338BE83A4D2FE9F65DF.xlsx

Daemach commented 7 months ago

It seems like this was brought up and fixed in the past, but I don't remember how, and I can't find the original post...

The way around this is to use the inspection tool in Excel and have it remove everything it finds. The spreadsheet loads normally at that point. If POI has such a function available, it would also be nice to have access to a chainable execution of that.

cfsimplicity commented 7 months ago

Reading the file you attached seems to work as expected using this code:

spreadsheet = New spreadsheet.Spreadsheet();
path = ExpandPath( "CB8188710A3B4338BE83A4D2FE9F65DF.xlsx" );
data = spreadsheet.setReturnCachedFormulaValues( true ).read( path, "query" );
WriteDump( data );

datadump

I've no idea where the message you saw comes from or what it refers to.

It seems like this was brought up and fixed in the past

You mean here with the spreadsheet library? It doesn't ring any bells I'm afraid.

the way around this is to use the inspection tool in Excel and have it remove everything it finds

Is that the Document Inspector? There's no such tool in POI as far as I know.

Daemach commented 7 months ago

I fear that I accidentally attached the file from which I had removed the offending data using the document inspector. https://app.screencast.com/ymcIgDAPSi9N1

Please try this one.

CB8188710A3B4338BE83A4D2FE9F65DF_broken.xlsx

cfsimplicity commented 6 months ago

Please try this one.

Same result. Reads into a query just fine using the code above.

From what you say it seems that the issue is caused by the presence of custom xml elements in your xlsx. As I say, I'm not aware of a POI method for removing those programmatically I'm afraid.