SheetJS / sheetjs

📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs
https://sheetjs.com/
Apache License 2.0
34.98k stars 8k forks source link

Returning wrong sheet if the workbook contains a very hidden sheet #123

Closed rla-dev closed 7 years ago

rla-dev commented 9 years ago

We are processing .xlsx files generated by SAP which contain sheets marked as 'xlsSheetVeryHidden'. In that case, accessing a visible sheet by name returns the content of the non-visible sheet (which even has a different name). It seems that the index of a sheet is calculated based on the list of 'visible' and 'normally hidden' sheets instead of all available sheets.

This behavior can easily been verified on a simple workbook containing two sheets in which the first had been marked as 'very hidden' in the 'View Code' window.

veryhidden

SheetJSDev commented 9 years ago

@rla-dev very hidden sheets are odd. The list of sheets is stored in two locations:

1) in the subfile docProps/app.xml (the document properties) -- this list does not include very hidden sheets

2) in the subfile xl/workbook.xml (the workbook manifest) -- this list includes very hidden sheets

We currently use the former list when parsing, which is why the very hidden sheets are lost. That's an easy fix. What isn't completely clear is whether very hidden sheets should be included in the list of sheet names.

@elad @johnyesberg Picking up the conversation from #111 , should the sheet list include very hidden sheets (even though it isn't clickable) or does it make sense to store the complete list elsewhere?

elad commented 9 years ago

In my opinion it shouldn't include very hidden sheets unless an option explicitly asked for them.

rla-dev commented 9 years ago

@SheetJSDev - thanks for analyzing the issue.

I think, because we are accessing the workbook on an API level, the list should contain all sheets of the workbook. The visibility of sheets is meant for interactive use. If the very hidden sheets are included in the list, it should be possible to determine the level of visibility.

If this introduces an unacceptable API break, I can live with the solution mentioned by @elad. Either way - accessing a sheet by name must under any circumstances return the requested sheet (API consistency).

elad commented 9 years ago

I don't think it's an unacceptable API break, but I do think that since Excel is a GUI spreadsheet and not a database the API's default functionality should reflect what a human would get working with the UI directly. Cases where such a default might slow down parsing and/or is unnecessary unless explicitly required (for example, styles) are reasonable for a default-off. I don't see why "very hidden sheets" should be parsed without the user asking them to if the UI itself doesn't display them by default either.

SheetJSDev commented 9 years ago

@rla-dev @elad Correct me if I'm wrong, but AFAICT the only real functional difference between hidden and very hidden sheets is that very hidden sheets are not shown in the unhide popup (right-click a tab and select "unhide") Visible sheets can use formulae that reference very hidden sheets.

As for the intention, [MS-XLS] describes very hidden as:

the sheet is hidden and cannot be displayed using the user interface.

ECMA-376 describes very hidden as:

Indicates the sheet is hidden and cannot be shown in the user interface (UI). This state is only available programmatically.

The real question, then, is whether we should be considering SheetJS as a programmatic interface or a proxy for the GUI. Does it make sense to introduce another worksheet list, like TabNames, which reflects the visible tabs in order? (This would not include any hidden tabs, very hidden or otherwise) This order is well-defined and accessible in the workbook data.

elad commented 9 years ago

I think a property on the worksheet is a better approach than separate lists. As long as this is well documented I don't mind if by default js-xlsx also parses the hidden/very hidden sheets. I don't know what most users of js-xlsx would expect, so I can't make any arguments for or against a default value. :)

rla-dev commented 9 years ago

I think it's important to build js-xlsx as a programmatic interface, because the library is dealing with the persisted state. It should allow to read and manipulate underlying model no matter that the model also includes visual aspects (or aspects that could be interpreted in a visual way). This is in line with the concept of libraries like Apache POI (http://poi.apache.org/spreadsheet).

In my opinion it would make sense to provide a method to retrieve all the sheets within the workbook and put the state on the sheet (as @elad suggested). This unified access should also help to provide the correct implementation of the retrieval of sheets by name (indexed access is often critical and may lead to inconsistencies - as the current implement shows).

@SheetJSDev - thanks again for your effort and the great work

Mior commented 8 years ago

This seems to be still an issue. I have prepared fix so that in Sheets property are all available sheets but in SheetNames are only visible tabs. https://github.com/Mior/js-xlsx/commit/0160aaadbb601a9bcae0e9a2bed6e5c75c132d84 Users could iterate trough Sheets to get all sheets or base on SheetNames for only visible once.

It may be kind a confusing to the user that in SheetNames are not all sheets, only visible but in the other hand what if he want to display only visible sheets. We probably need some property type but what could be values for that? Is there any such sheet type in specification for our purpose?

@SheetJSDev I can see 36 Pull requests pending, is it even worth contributing? This project is one of few with xls/xlsx js projects and I love to see it live.

JohnBicknell commented 8 years ago

@Mior Thanks! this lost issue lost me the best part of a day, just tried out your fix and works great.

darrinholst commented 8 years ago

Please consider pulling in @Mior's pr. It solved my problem that I lost a few hours on today.

SheetJSDev commented 7 years ago

We're going in the opposite direction: including very hidden sheets in the list. This also lets us put in chartsheets and other features.