cfsimplicity / spreadsheet-cfml

Standalone library for working with spreadsheets and CSV in CFML
MIT License
126 stars 35 forks source link

Can't import spreadsheets with metadata #285

Closed Daemach closed 2 years ago

Daemach commented 2 years ago

I consistently have problems with spreadsheets that contain metadata. Attached is such a file.

maletich_ex.xlsx .

cfsimplicity commented 2 years ago

Certainly something odd going on here. To flesh things out, you basically have an xlsx file which when opened up with Excel looks normal, with 29 rows including the header and 24 columns:

20220301-1

But when you try to read it into a query with library

dump( spreadsheet.read( src: path, format: "query", headerRow: 1 ) )

all that shows up is this:

Screenshot 2022-03-01 at 12-21-27 https test

Where does that text "This sheet contains FactSet XML data for use with...." come from? You mention metadata, but it's not in the struct returned by spreadsheet.info() or in the file's Properties in Excel as far as I can see.

Daemach commented 2 years ago

If you open the file in excel 2019, go to the file menu, and choose check for issues you'll get something like what you see below. Once all of the info is removed, the file imports properly. But as I have non-technical users trying to upload, this is problematic to explain.

image

Daemach commented 2 years ago

The most likely culprit is the custom XML or hidden worksheet (scroll down)

cfsimplicity commented 2 years ago

OK yes, it looks like the "metadata" is being stored in a "hidden" first sheet. In other words even though it appears above as if your workbook has just one sheet, it actually has 2. You can get round this by specifying the sheetName or sheetNumber in your read() call.

dump( spreadsheet.read( src: path, format: "query", headerRow: 1, sheetNumber: 2 ) )

Screenshot 2022-03-01 at 13-50-35 https test

But I think it would be a reasonable expectation that hidden sheets are ignored if not specified in the sheetName or sheetNumber arguments so this probably merits a fix.

To be honest, I wasn't even aware there was such a thing as a hidden sheet!

Daemach commented 2 years ago

Interesting. I wonder if there could be multiple.... a fix would be great, as I wouldn't know which sheet number to specify. Is it possible the hidden sheet number could be higher than the visible sheet(s)? I would want them all ignored, as the person trying to upload probably doesn't know they exist either.

cfsimplicity commented 2 years ago

If you read() a file without specifying the sheet then the first one is used. So this is only an issue if the hidden sheet is the first.

I propose a fix so that if you don't specify the sheet, then read() would use the first visible sheet available.

Daemach commented 2 years ago

Perfect. It would also be good to pull the visible/hidden flag in sheet info.

cfsimplicity commented 2 years ago

info() gives you metadata about the entire workbook, not individual sheets.

cfsimplicity commented 2 years ago

Fix added to the Develop branch. Please check that it's working for you.

Daemach commented 2 years ago

I hate to ask this, but what is the best way to do that?

BTW, I tried to sponsor you yesterday, and GitHub connected to Paypal and acted like it was ready to go, but you're not showing up in my list. I opened a support case but no response yet. I wonder if others are having the same kinds of issues.

cfsimplicity commented 2 years ago

You mean "pull the visible/hidden flag in sheet info"? As I say there is no "sheet info", only workbook info. Are you saying you want the visibility of individual sheets added to that? It could have a new ContainsHiddenSheets flag I suppose. What's the use case you have in mind?

Thanks for trying to sponsor. I don't really know what the problem was for you, but it has worked for other people. When I set it up it seemed as if the payments were handled by Stripe behind the scenes.

Daemach commented 2 years ago

Actually, I meant how do I pull and use the develop branch easily? I use commandbox/forgebox normally.

cfsimplicity commented 2 years ago

Oh, sorry. I think the "proper" way of doing it would be to clone the repo, but the simplest method is probably to just download the branch as a zip by clicking the green "Code" button on the main repo page (making sure "develop" is the selected branch). There's no installation needed, just copy the code to somewhere you can call it from a test .cfm script.

cfsimplicity commented 2 years ago

It would also be good to pull the visible/hidden flag in sheet info.

https://github.com/cfsimplicity/spreadsheet-cfml/issues/286

cfsimplicity commented 2 years ago

@Daemach You can grab the fix via forgebox now. Version 3.4.0 has this fix, plus the one for your previous timezone issue.

Thanks very much for persevering with the sponsorship. Much appreciated!