MarkPflug / Sylvan.Data.Excel

The fastest .NET library for reading Excel data files.
MIT License
243 stars 30 forks source link

fix issue when ReadStringBuffer returns without populating the strBuffer #182

Closed Ctrl-Alt-Delight closed 3 months ago

Ctrl-Alt-Delight commented 3 months ago

ReadStringBuffer can return early "if the string sits entirely within the current record" without populating the strBuffer. ReadStringBuffer returns the string needed anyway so we can just return ReadStringBuffer.

MarkPflug commented 3 months ago

Do you have a file that demonstrates this bug?

Ctrl-Alt-Delight commented 3 months ago

Sorry I have a file but it is full of private and confidential information. Opening the file and removing such information results in a file that no longer has the issue. The file is from a third party and unfortunately I have no control over its creation.

This pull request fixes my first issue that was throwing an exception of "Index was out of range. Must be non-negative and less than the size of the collection.\r\nParameter name: startIndex" when attempting to create a string from a zero length buffer (as it was never resized and populated)

I don't understand MS-XLS enough to provide the crucial information, it is happening in a record type of Label (516 / 0x0204) if that helps.

Is there some meta data I can pull from the file that would assist?

There is another issue I am facing which might be related in which the first character is a null character inside the same Label record type cause the string returned to include the null character but loose the final character of the string. I am not sure at this point with my limited knowledge if this particular issue is just a poorly formed xls file or not

MarkPflug commented 3 months ago

Would it be possible to email it to me privately? mark dot pflug at live dot com. I have a set of "external" tests that I run against files that I don't check into the repository: https://github.com/MarkPflug/Sylvan.Data.Excel/blob/main/source/Sylvan.Data.Excel.Tests/ExternalDataTests.cs

Ctrl-Alt-Delight commented 3 months ago

Would it be possible to email it to me privately? ....

I have redacted what I can without altering the behaviour of the file and sent an email just now.

MarkPflug commented 3 months ago

Got the file and am able to reproduce the issue. Thanks.

MarkPflug commented 3 months ago

This issue ended up being a bit more complicated. The null character you were seeing was actually due to a discrepancy with the .xls specification I was using for reference. The label records seem to be different between excel 95 and excel 97. The spec doesn't mention this, but it is pretty obvious what's going on when looking at the actual data.

I've implemented a fix in #183 and have published a pre-release package containing the fix. If you could test that package and verify that it works with your actual, non-redacted files. If all looks good on your end, I'll publish a final 0.4.25 build.

https://www.nuget.org/packages/Sylvan.Data.Excel/0.4.25-b0001

Ctrl-Alt-Delight commented 3 months ago

Yeah I was coming to a similar conclusion just now as well, but more just fumbling around after reading up about XLUnicodeString

Had literally just committed to a branch that was passing for my use case but would likely break non excel 95: https://github.com/MarkPflug/Sylvan.Data.Excel/commit/908fc34eb7cb255d395c075a40bc96387a6028f4

Glad you got to the bottom of it, will get this version and test.

Thanks.

Ctrl-Alt-Delight commented 3 months ago

Have tested with a couple of files directly from the third party, all is working as expected.

MarkPflug commented 3 months ago

Okay, final 0.4.25 version is now published.