dotnet / Open-XML-SDK

Open XML SDK by Microsoft
https://www.nuget.org/packages/DocumentFormat.OpenXml/
MIT License
4.04k stars 547 forks source link

Getting cell values is returning different format - OpenXML #1413

Closed kev-ops closed 3 weeks ago

kev-ops commented 1 year ago

Describe the bug Getting cell value returns many decimal places.

Screenshots I have this "4.9" value in excel image

When retrieving the cell value in OpenXML. 4.9 was stored as 4.90000000000000000000004. It should return value as it was inputted. image

To Reproduce // Please add a self-contained, minimum viable repro of the issue. // If you require external resources, please provide a gist or GitHub repro // An Xunit style test is preferred, but a console application would work too.

Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Observed behavior A clear and concise description of what you expected to happen.

Expected behavior A clear and concise description of what you expected to happen.

Desktop (please complete the following information):

Additional context Add any other context about the problem here.

twsouthwick commented 1 year ago

Can you share how you're doing this? Sounds like this is just an artifact of using a floating point number.

kev-ops commented 1 year ago

I am reading the content of excel file(.XLSX), I am having issue in reading cell with a value "4.9" (please refer to the screenshot below, highlighted in yellow) image

In my code it gets "4.90000000xxx4" as the cell value (please refer to screenshot below) image

The cell format is General & has no special formatting (see screenshot) image

I am expecting that what is inputted in excel which in my case it should be "4.9" is what should be retrieve on code level not "4.9000000xxx4"

Checked also the XML version of the excel spreadsheet (see screenshot below) image

I can't apply the try parsing or converting the cell values in code since we are dealing with different cell values.

Having said that, what could be the workaround so we can get the cell value the same way it was inputted on excel.

Currently using OpenXML SDK 2.20.0 OS: Windows 10 .NET Target 6.0

twsouthwick commented 1 year ago

This looks like something excel is adding and the SDK is surfacing that correctly. @tomjebo @mikeebowen please correct me, but I'm assuming this is because you mark it as number and general number must be using a float which are not able to accurately represent numbers. Could the s="2" value be the significant digits?

kev-ops commented 1 year ago

Yes, the number in my case "4.9" which is a general number resulted to accurately represent numbers. which is unwanted.

Could the s="2" value be the significant digits?

Answer: I tried changing the value of AK3 cell on excel from 4.9 to string like "SAMPLE" and is in General format, the value of s remains "2". Refer to screenshot below:

image

So it is not a significant basis. t="s" refers to SharedString value I presume.

Please let me know if there will be some workaround on this issue. Thanks

twsouthwick commented 1 year ago

@tomjebo @mikeebowen thoughts as to what excel does here and if we can replicate it in the sdk?

mikeebowen commented 1 year ago

The s attribute is the index of the cell's style.

mikeebowen commented 1 year ago

The SDK is doing the correct thing here. Excel stores numbers using double-precision, so the value the SDK retrieves is correct. If the values stored by Excel in the underlying xml need to match what is displayed in the UI, users can set the precision as displayed option in Excel. With that option set Excel will store the exact decimal places showing in the UI. Note, this isn't retroactive, so if you set the "precision as displayed" option on a spreadsheet, new numbers will match the decimal places in the UI, but values entered before setting the option will remain double-precision.

gartenkralle commented 1 year ago

How does excel know when to show 4.9 or 4.90000000000000000000004?

tomjebo commented 1 year ago

@gartenkralle By the number formatting.

More:

Excel uses the s attribute in the cell element in the sheet.xml part:

        <row r="7" spans="1:7" x14ac:dyDescent="0.25">
            <c r="A7" s="1">
                <v>3.05</v>
            </c>
        </row>

In this case the following s attribute is "1":

 <c r="A7" s="1">

That corresponds to the second (zero based 1) cellXfs entry in the styles.xml part:

    <cellXfs count="2">
        <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
        <xf numFmtId="164" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
    </cellXfs>

which specifies the attribute values numFmtId="164" and applyNumberFormat="1" (meaning to apply the number format "164"). That identifies the format in numFmts:

    <numFmts count="1">
        <numFmt numFmtId="164" formatCode="0.000"/>
    </numFmts>

Therefore displaying the value:

image

kev-ops commented 11 months ago

I also have relevant concern in date formatting

Observed behavior Custom Date format and other date format returns inconsistent NumberFormatId

Please refer to the attached file 1.xlsx image

I have two custom date formats mm/dd/yyyy and mm-dd-yyyy. Checking it on XML it has numFmtId "164" and numFmtID "165" respectively. image image

image

I have also this "2.xlsx" attached file which returns a different number format Ids for the same custom date format in "1.xlsx" file image image

Checking in on XML , it has numFmts "165" and "166" for mm/dd/yyyy & mm-dd-yyyy respectively image

I have also this "3.xlsx" attached file which returns a different range of number format ids as well. image image

Checking it on XML , it has numFmtId = 171 & numFmtId = 172 for mm/dd/yyyy & mm-dd-yyyy respectively image

This also happens to other date formats as well not limited to custom date formats. We still don't know what is the basis of what numFmtId is assigned to particular date format.

How can we identify if a cell value in excel has date format (mm-dd-yyyy) or (mm/dd/yyyy) etc. ??

What we are trying to accomplish here is to check whether a cell value has format (mm-dd-yyyy) or (mm/dd/yyyy) . Since NumFmtId assignment is not consistent.

Please see all files for your reference. Thank you! Hope to hear from you soon

1.xlsx 2.xlsx 3.xlsx

kev-ops commented 11 months ago

In addition, Is it possible to convert a date in c# using these formatCodes? We are struggling converting the date values retrieved from excel in c# using the format codes in the screenshot.

image

mikeebowen commented 9 months ago

Hi @kev-ops,

"We still don't know what is the basis of what numFmtId is assigned to particular date format. / How can we identify if a cell value in excel has date format (mm-dd-yyyy) or (mm/dd/yyyy) etc. ?"

I hope that helps.

greuelpirat commented 2 months ago

I encountered the same problem but without style formatting. I just created a new sheet and entered 147.61 to the first cell. I changed no style.

This was saved: <c r="A1"><v>146.52000000000001</v></c>

And this is also what I get from CellValue.Text.

What is the recommended way to get the original value?

AlfredHellstern commented 3 weeks ago

Can you please confirm that you used Excel when entering 147.61 initially? And what's your code when using CellValue.Text using the SDK?

greuelpirat commented 3 weeks ago

Can you please confirm that you used Excel when entering 147.61 initially?

yes, this is easy to reproduce:

Image

here the created sheet: Mappe1.xlsx

And what's your code when using CellValue.Text using the SDK?

Here the reduced code:

string cellValue = cell.CellValue?.Text; // is '147.61000000000001'
if (cellValue == null)
    return null;
if (cell.DataType is { Value: var value }) // is null!
{ ... }
if (cell.StyleIndex != null) // is null!
{ ... }
return cellValue

cell.OuterXML is <x:c r="A1" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><x:v>147.61000000000001</x:v></x:c> and that matches exactly the Mappe1.xlsx.

mikeebowen commented 3 weeks ago

@greuelpirat, this is the expected behavior, it is a result of how Excel stores numbers not the SDK. See this post above: https://github.com/dotnet/Open-XML-SDK/issues/1413#issuecomment-1520954222

greuelpirat commented 3 weeks ago

@greuelpirat, this is the expected behavior, it is a result of how Excel stores numbers not the SDK. See this post above: #1413 (comment)

The option Precision as displayed does not work here. The value is still saved with the appendix. For me it looks like the option only applies to calculated cells. but not for inputs by the user since the option is the section When calculating this workbook

Anyway, I personally do not agree that this is expected behaviour (I definitely did not expect this). Anyone that comes across this issue will have to implement their own solution for this.

hakito commented 3 weeks ago

I agree with @greuelpirat. If it is expected behavior and you set the option precision as displayed. Then it's a bug!

mikeebowen commented 3 weeks ago

Hi @hakito and @greuelpirat, the extra decimals you see are a result of how Excel stores and calculates floating-point numbers, they can be confusing, but they are by design. More details on that how Excel handles floating-point numbers here: Floating-point arithmetic may give inaccurate results in Excel.

There is no bug with the OpenXML SDK here. The SDK is correctly reading the values that Excel has stored. If you believe this is a bug in Excel, you can submit feedback on the Excel Feedback Portal or submit it to the Microsoft Feedback Hub, instructions on how to submit here.