PHPOffice / PhpSpreadsheet

A pure PHP library for reading and writing spreadsheet files
https://phpspreadsheet.readthedocs.io
MIT License
13.34k stars 3.46k forks source link

[Bug] When a cell with an empty string is used in a formula, #VALUE! is returned, while in Excel and other calc software, the empty string is considered as 0 and the formula works. #1977

Closed JulioPablo closed 3 years ago

JulioPablo commented 3 years ago

This is:

- [x] a bug report

What is the expected behavior?

The cell holding an empty string value '' \ "" is treated as a 0

What is the current behavior?

The cell holding the empty string is considered invalid and when the formula cell value is attempted to be calculated #VALUE! is returned.

What are the steps to reproduce?

If you want to reproduce this, try to get the calculated values of the cells by calling getCalculatedValue(). You can also open it on your favorite calc program and check the behavior there and how it does not match. Formula_Test.xlsx

For clarity:

image

The first row does not work in PHPSpreadsheet The second row does work in PHPSpreadsheet

Both rows work in Excel/Calc

Proposed solution

The culprit of this behavior is this function

https://github.com/PHPOffice/PhpSpreadsheet/blob/master/src/PhpSpreadsheet/Calculation/Calculation.php#L4767-L4803

My proposal is modifying the following if statement

https://github.com/PHPOffice/PhpSpreadsheet/blob/master/src/PhpSpreadsheet/Calculation/Calculation.php#L4777-L4799

To look as follows

        //    Numbers, matrices and booleans can pass straight through, as they're already valid
        if (is_string($operand)) {
            //    We only need special validations for the operand if it is a string
            //    Start by stripping off the quotation marks we use to identify true excel string values internally
            if ($operand > '' && $operand[0] == self::FORMULA_STRING_QUOTE) {
                $operand = self::unwrapResult($operand);
            }

            //    If the string is empty, treat it as a 0, as per Excel's behavior
            $operand = $operand > '' ? $operand : 0; 

            //    If the string is a numeric value, we treat it as a numeric, so no further testing
            if (!is_numeric($operand)) {
                //    If not a numeric, test to see if the value is an Excel error, and so can't be used in normal binary operations
                if ($operand[0] == '#') {
                    $stack->push('Value', $operand);
                    $this->debugLog->writeDebugLog('Evaluation Result is ', $this->showTypeDetails($operand));

                    return false;
                } elseif (!Shared\StringHelper::convertToNumberIfFraction($operand)) {
                    //    If not a numeric or a fraction, then it's a text string, and so can't be used in mathematical binary operations
                    $stack->push('Error', '#VALUE!');
                    $this->debugLog->writeDebugLog('Evaluation Result is a ', $this->showTypeDetails('#VALUE!'));

                    return false;
                }
            }
        }

More in detail, It would check if the string value is empty and if it is, set 0 as the operand value, this is done in the following line

            //    If the string is empty, treat it as a 0, as per Excel's behavior
            $operand = $operand > '' ? $operand : 0; 

Furthermore, the following empty string check wouldn't be necessary.

https://github.com/PHPOffice/PhpSpreadsheet/blob/17af13281b7f10294a5480704b684c5a793a5a0c/src/PhpSpreadsheet/Calculation/Calculation.php#L4786

and would simply be replaced for

                //    If not a numeric, test to see if the value is an Excel error, and so can't be used in normal binary operations
                if ($operand[0] == '#') {

@MarkBaker pinging you again, I saw you're mentioned in the blame of this file as well and wanted your input on this proposed solution. I'm also not sure if this would suffice for the behavior to match Excel's on all cases, or if this only cover Numeric Binary Operations and other situations would have to also be taken into account, so just wanted to raise that point as well.

Which versions of PhpSpreadsheet and PHP are affected? I tested this on 1.16 (via Laravel Excel) but the problematic code, which I linked here is clearly still present on the latest version.

oleibman commented 3 years ago

I will take a look. The handling of empty cells and boolean values by Excel is maddening, wildly inconsistent from function to function. I have been concentrating on parameter values of null and booleans in my refactoring of MathTrig and DateTime, but have not looked at null strings. Based on this report, I imagine that those might currently be rejected by PhpSpreadsheet but accepted by Excel. If so, I could get to that after the MathTrig refactoring is complete - the code changes wouldn't be that hard, but adding the test cases ...

MarkBaker commented 3 years ago

I've just verified and created a series of failing unit tests to confirm whether the fix will work or not

oleibman commented 3 years ago

I think Excel has some more inconsistencies to worry about. Fill a cell with '=15+""', and you get #VALUE!, not 15. This seems like it ought to give the same result as the reported problem, but it doesn't. At least the result is consistent for '=15+FALSE' (15).

JulioPablo commented 3 years ago

Fill a cell with '=15+""', and you get #VALUE!

I have just confirmed this as well. Guess it only considers the empty string a 0 if it comes from another cell and not literally defined in the formula.

oleibman commented 3 years ago

As if this weren't already enough of a pain - LibreOffice and Gnumeric yield #VALUE! whether the null string is used from a cell or directly in a formula (hooray for sensible results!). OpenOffice treats it as 0 in both cases (lesser hooray for consistent results!). I am not aware of other cases where LibreOffice and OpenOffice disagree.

oleibman commented 3 years ago

How exactly did you get null string into your cell? At least some of the following ways on an Excel spreadsheet seem to work as I would have expected, even when referenced as cells (i.e. VALUE):

For all of these cells except your original, using them in a formula causes VALUE in Excel. Only your original defies expectations. Gnumeric actually seems to agree with Excel for all of these, except possibly the last, which it appears not to interpret correctly. LibreOffice agrees with Excel, except that it treats the copy-paste special exactly as the original.

JulioPablo commented 3 years ago

I'm actually not sure, this comes from a report from another department on the organization I work at. I'm supposed to import it. But I'm not sure how they managed to add that empty string, I do know the report is made manually using MS Excel, it isn't generated by another application.

Sorry I can't provide any more useful details on the origin of that null string.

oleibman commented 3 years ago

When I use PhpSpreadsheet to read your spreadsheet, it shows the value of cell B1 as null, not a null-string. That would certainly explain some of the results I'm seeing. It also shows the correct calculated value for C1, namely 13.46. So I'm not convinced that this is a PhpSpreadsheet calculation problem. There definitely could be a problem with how it's reading in the value in B1. I am still trying to figure out what it's XML value really says about it.

JulioPablo commented 3 years ago

That's interesting, when I read it with PhpSpreadsheet I get "" for B1

image

oleibman commented 3 years ago

Okay, let's compare code:

        $reader = new ReaderXlsx();
        $file = 'issue1977.xlsx';
        $spreadsheet = $reader->load($file);
        echo "Loaded $file\n";
        $sheet = $spreadsheet->getActiveSheet();
        $val = $sheet->getCell('B1')->getValue();
        var_dump($val);
        $val2 = $sheet->getCell('C1')->getCalculatedValue();
        var_dump($val2);
Loaded issue1977.xlsx
C:\git\issue1977.php:16:
NULL
C:\git\issue1977.php:18:
double(13.46)
oleibman commented 3 years ago

The XML says cell B1 has a style (s="1"), but it doesn't have a value. That is consistent with its being null, not a null string. Is it possible that this has changed between the real file you're trying to import and the truncated version you posted?

JulioPablo commented 3 years ago

I'm pretty sure the cell value is the same between the original and truncated file.

But you're right about your behavior, I managed to replicate it on my end

image

I was using the PhpOffice\PhpSpreadsheet\Cell\StringValueBinder value binder :/

JulioPablo commented 3 years ago

I do believe the behavior is still erroneous, because B2 was returning null and not a string.

But definitely would've been ideal if I had mentioned the value binder to being with...

oleibman commented 3 years ago

Thanks for the update. It is a relief to know that the problem isn't quite so widespread as it appeared to be at first. It's late here, and I'm not particularly familiar with the value binder. I may have a chance to look again tomorrow with this fresh information in mind, but it may not happen till the weekend.

oleibman commented 3 years ago

Summarizing the discussion that has gone before, this problem seems not to be with Calculation, but rather with StringValueBinder. I can see at least 3 possible changes that might be made to PhpSpreadsheet to address this:

I will have to defer to @MarkBaker to decide which, if any, of those possibilities should be implemented. I do not believe that any other changes to PhpSpreadsheet are warranted based on this report. In the meantime, you have 2 possible workarounds:

This investigation also seems to have uncovered bugs in Gnumeric (handling copied-and-pasted-special cell), and OpenOffice (which handles null string as 0 unlike any of the others, and, in particular, unlike LibreOffice). If I can figure out how to file bug reports for those, I will do so. (I suppose PhpSpreadsheet could emulate the OpenOffice behavior by adding a new compatibility mode for Calculation; that seems like it would require a lot of effort for very little gain.)

oleibman commented 3 years ago

Is it okay to close this ticket now?

JulioPablo commented 3 years ago

Well, up to you guys, you're the collaborators.

I guess it is technically not fixed yet? As far as I know, anyway. So up to you how you'd like to handle it.

oleibman commented 3 years ago

I have made a couple of suggestions on how you might get your program working (default binder or roll-your-own) whether or not we make any changes. If either of those suggestions works for you, then I will close the ticket. If neither works, tell me why, and I'll try to find an approach that does work for you.

JulioPablo commented 3 years ago

Hey, yeah I actually had already got it working on my project by cloning some of PHPSpreadsheet's code. So on that end I'm good and I deeply appreciate the willingness to help me find a approach that works 👍

I just opened the bug report because I thought it might be of interest for you folks, but again, I leave the actual issue handling up to you, as far as I'm concerned I've got this up and running on my end, not sure if you guys are interested in just documenting the work arounds & findings discussed in this thread or applying a fix at the library level and so on.

oleibman commented 3 years ago

PR #2138 was just merged. It documents the workings of StringValueBinder, and also adds some new settable properties which make it easy to meet your expectations, e.g. by specifying that nulls should be left as-is.