Fix #4192. Although that issue can be dealt with by changing user code, it would be better to fix it within PhpSpreadsheet. A cloned worksheet may have a pointer to a spreadsheet to which it is not attached. Code can assume it does belong to the spreadsheet, and throw an exception when the spreadsheet cannot find the worksheet in question. It may also not throw an exception when it should.
In my comments to the issue, I was concerned that adding in the needed protection would add overhead to an extremely common situation (setting a cell's value) in order to avoid a pretty rare problem. However, there are problems with both the accuracy and efficiency of the existing code, and I think any performance losses caused by the additional checks will be offset by the performance gains and accuracy of the new code.
Spreadsheet getIndex attempts to find the index of a worksheet within its spreadsheet collection. It does so by comparing the hash codes of each sheet in its collection with the hash code of the sheet it is looking for. Its major problem problem is performance-related, namely that it recomputes the hash code of the target sheet with each iteration.
A more severe problem is the accuracy of the hash code. It generates this by hashing together the sheet title, the string range of its auto-filter, and a character representation of whether sheet protection is enabled. Title should definitely be part of the calculation (it must be unique for all sheets attached to a spreadsheet), but it is not clear why this subset of the other properties of Worksheet is used. It tries to save some cycles by using a dirty property to indicate whether re-hashing is necessary. It sets that property whenever the title changes, or when setProtection is called. So, it doesn't set it when auto-filter changes, and you can easily bypass setProtection when changing any of the Protection properties. Not to mention the many other properties of worksheet that can be changed. Additionally, if you clone a worksheet, the clone and the original will have the same hash code, which can lead to problems:
That code will change the value of A1 in the original, not the clone.
The hash property in Worksheet will now be calculated immediately when the object is constructed or cloned or unserialized. It will not be recalculated, and there is no longer a need for the dirty property, which is removed. Hash will be generated by spl_object_id, which was designed for this purpose. (So was spl_object_hash, but many online references suggest that _id performs much better than _hash.) Our problem example above will now throw an Exception, as it should, rather than changing the wrong cell. setValueExplicit, the problem in the original issue, will now test that the worksheet is attached to the spreadsheet before doing any style manipulation. In order that this not be a breaking change, getHashCode will continue to return string, but it is deprecated in favor of getHashInt, and Worksheet will no longer implement IComparable to facilitate the deprecation.
I had a vague hope that this change might help with issue #641. It doesn't.
[ ] CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
[ ] Documentation is updated as necessary
Why this change is needed?
Provide an explanation of why this change is needed, with links to any Issues (if appropriate).
If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.
Fix #4192. Although that issue can be dealt with by changing user code, it would be better to fix it within PhpSpreadsheet. A cloned worksheet may have a pointer to a spreadsheet to which it is not attached. Code can assume it does belong to the spreadsheet, and throw an exception when the spreadsheet cannot find the worksheet in question. It may also not throw an exception when it should.
In my comments to the issue, I was concerned that adding in the needed protection would add overhead to an extremely common situation (setting a cell's value) in order to avoid a pretty rare problem. However, there are problems with both the accuracy and efficiency of the existing code, and I think any performance losses caused by the additional checks will be offset by the performance gains and accuracy of the new code.
Spreadsheet
getIndex
attempts to find the index of a worksheet within its spreadsheet collection. It does so by comparing the hash codes of each sheet in its collection with the hash code of the sheet it is looking for. Its major problem problem is performance-related, namely that it recomputes the hash code of the target sheet with each iteration.A more severe problem is the accuracy of the hash code. It generates this by hashing together the sheet title, the string range of its auto-filter, and a character representation of whether sheet protection is enabled. Title should definitely be part of the calculation (it must be unique for all sheets attached to a spreadsheet), but it is not clear why this subset of the other properties of Worksheet is used. It tries to save some cycles by using a
dirty
property to indicate whether re-hashing is necessary. It sets that property whenever the title changes, or whensetProtection
is called. So, it doesn't set it when auto-filter changes, and you can easily bypasssetProtection
when changing any of theProtection
properties. Not to mention the many other properties of worksheet that can be changed. Additionally, if you clone a worksheet, the clone and the original will have the same hash code, which can lead to problems:That code will change the value of A1 in the original, not the clone.
The
hash
property in Worksheet will now be calculated immediately when the object is constructed or cloned or unserialized. It will not be recalculated, and there is no longer a need for thedirty
property, which is removed. Hash will be generated by spl_object_id, which was designed for this purpose. (So was spl_object_hash, but many online references suggest that _id performs much better than _hash.) Our problem example above will now throw an Exception, as it should, rather than changing the wrong cell.setValueExplicit
, the problem in the original issue, will now test that the worksheet is attached to the spreadsheet before doing any style manipulation. In order that this not be a breaking change,getHashCode
will continue to return string, but it is deprecated in favor ofgetHashInt
, and Worksheet will no longer implement IComparable to facilitate the deprecation.I had a vague hope that this change might help with issue #641. It doesn't.
This is:
Checklist:
Why this change is needed?
Provide an explanation of why this change is needed, with links to any Issues (if appropriate). If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.