PHPOffice / PhpSpreadsheet

A pure PHP library for reading and writing spreadsheet files
https://phpspreadsheet.readthedocs.io
MIT License
13.36k stars 3.47k forks source link

Non-backwards compatible rounding issue in 2.0.0 - DateTime #3899

Open BGehrels opened 9 months ago

BGehrels commented 9 months ago

This is:

- [x] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

When i persist and reread an excel sheet, the data within the read version should be the same as the data persisted. This has been the case with 1.x, but it is not with 2.0.0 if Date::excelToDateTimeObject / Date::dateTimeToExcel is used

What is the current behavior?

When i use Date::dateTimeToExcel to write a DateTime in to a Cell, and then immediately reread it from the in-memory spreadsheet using Date::excelToDateTimeObject, then it works fine.

If i persist it to a file and reread it from there it will be slightly off:

-2020-10-21T14:55:31.000000+0000 +2020-10-21T14:55:30.999994+0000

If you throw away milliseconds afterwards, for example during database persistence, as you end up a second off (which can land you in another hour of the day, another aggregation bucket, and so on)

What are the steps to reproduce?

The test posted below works fine in 1.29.0x, but not in 2.0.0

Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:

<?php

use PHPUnit\Framework\TestCase;
use DateTime;
use PhpOffice\PhpSpreadsheet\Shared\Date;
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

class ExcelDateTimeOfByOneErrorTest extends TestCase
{
    public function testThatDateTimesCanBePersistedAndReread(): void
    {
        $originalDateTime = new DateTime('2020-10-21T14:55:31');

        $dateTimeFromSpreadsheet = $this->getDateTimeFrom($this->excelSheetWithDateTime($originalDateTime));
        $dateTimeFromSpreadsheetAfterPersistAndReread = $this->getDateTimeFrom($this->persistAndReread($this->excelSheetWithDateTime($originalDateTime)));

        static::assertEquals($originalDateTime, $dateTimeFromSpreadsheet);
        static::assertEquals($originalDateTime, $dateTimeFromSpreadsheetAfterPersistAndReread);
    }

    private function excelSheetWithDateTime(DateTime $dateTime): Spreadsheet
    {
        $spreadsheet = new Spreadsheet();
        $spreadsheet->getActiveSheet()->setCellValue('A1', Date::dateTimeToExcel($dateTime));
        return $spreadsheet;
    }

    public function getDateTimeFrom(Spreadsheet $spreadsheet): DateTime
    {
        return Date::excelToDateTimeObject($spreadsheet->getSheet(0)->getCell('A1')->getCalculatedValue());
    }

    private function persistAndReread(Spreadsheet $spreadsheet): Spreadsheet
    {
        $temp_pointer = tmpfile();
        $tempFileName = stream_get_meta_data($temp_pointer)['uri'];
        (new Xlsx($spreadsheet))->save($tempFileName);
        return (new \PhpOffice\PhpSpreadsheet\Reader\Xlsx())->load($tempFileName);
    }

}

What features do you think are causing the issue

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

Only tested Xlsx

Which versions of PhpSpreadsheet and PHP are affected?

2.0.0 is affected, 1.29.0 is not affected.

oleibman commented 9 months ago

Caused by PR #3677. Non-backwards-compatible because, prior to that change, PhpSpreadsheet ignored fractional seconds, but Excel does not. The effect within PhpSpreadsheet/Excel is, I believe, minimal - the same data is written to Excel in either case, and, if a date/time numeric format is set, the result will be as expected.

        $sheet->getStyle('A1')->getNumberFormat()->setFormatCode('yyyy-mm-dd\\Thh:mm:ss');
        var_dump($sheet->getCell('A1')->getFormattedValue());

If you truly need to convert back to a Php DateTime, you are likely to run into rounding errors. To eliminate those, you might add a function like the following:

    public static function roundSeconds(DateTime $dt): void
    {
        $microSeconds = (int) $dt->format('u');
        if ($microSeconds >= 500000) {
            $add = 1000000 - $microSeconds;
            $dt->modify("+ $add microseconds");
        } else {
            $dt->modify("- $microSeconds microseconds");
        }
    }

It might be worthwhile to add a "rounding" parameter to excelToDateTimeObject. I'll think about that.

BGehrels commented 9 months ago

Thank you for your explanation. If i understand it correctly, the data written to the .xlsx has always been a few microseconds off due to float being unable to exactly represent every fraction of an hour. The difference between 1.29 and 2.0 is, that 1.29 automatically rounded the value to the next second, while 2.0.0 automatically rounds to the nearest microsecond, correct?

I like the idea of having something like Date::excelToDateTimeObject(value, DatePrecision::MINUTES), giving the user of the library controll over this in a way that matches their business logic.

But no matter if this is implemented or not, i think you can safe people a lot of time if this change of behaviour is clearly noted in the "Breaking changes" section of the release notes.