aVadim483 / fast-excel-reader

Lightwight and very fast XLSX Excel Spreadsheet Reader in PHP
MIT License
63 stars 16 forks source link

Enhancement to readCellStyles #12

Closed AzzaAzza69 closed 5 months ago

AzzaAzza69 commented 1 year ago

You currently have the option to get the raw timestamp value for a date value OR the programmer-formatted value (via $excel->setDateFormat) but not being able to get BOTH or the value formatted as per the users' selected formatCode:

i.e.

$excel->setDateFormat('Y-m-d');
rawValue: 6614697600
value: 2179-08-12
userFormattedValue: 12/08/2179

This will be helpful to get the value formatted as the user would see it!

If would also be useful to add a 'category' to the readCellStyles to also be able to tell the "data type" for the cell, eg:

so as to get:

  'format-num-id' => 0,
  'format-pattern' => 'dd/mm/yyyy',
  'format-category' => 'DateTime'
AzzaAzza69 commented 1 year ago

PS. Does your recommendation of "But we do not recommend using these methods with large files" apply to just:

$sheet->readRowsWithStyles();
$sheet->readColumnsWithStyles();
$sheet->readCellsWithStyles();

or does it also apply to:

$sheet->readCellStyles
aVadim483 commented 1 year ago

or does it also apply to:

Any function that returns complete style information can cause performance issues on large XLSX files:

$sheet->readRowsWithStyles();
$sheet->readColumnsWithStyles();
$sheet->readCellsWithStyles();
$sheet->readCellStyles();

But you need to test on specific files in a specific environment, perhaps you will get acceptable results.

aVadim483 commented 1 year ago

I have improved date and time handling in new version https://github.com/aVadim483/fast-excel-reader#date-formatter

AzzaAzza69 commented 1 year ago

That's great. So if I use $excel->dateFormatter(true), how can I determine if a cell contains just a date, just a time or both a date and a time? Do I have to have a switch statement for all of the various patterns?

aVadim483 commented 1 year ago

For date/time you can use code like this:

$excel->dateFormatter(function($value, $format, $styleIdx) use($excel) {
    // get Excel format of the cell
    $excelFormat = $excel->getFormatPattern($styleIdx);

    // get format converted for use in php functions date(), gmdate(), etc
    $phpFormat = $excel->getDateFormatPattern($styleIdx);

    // do something and write to $result
    $result = gmdate($phpFormat, $value);

    return $result;
});
$cells = $sheet->readCells( );

You can define a handler for any date and time values, get the original excel format or php format string for the cell, parse it and return the result you need

aVadim483 commented 1 year ago

Excel does not mark just date or time, or datetime, you need to determine this yourself according to the patterns. And the patterns can change depending on the locale

AzzaAzza69 commented 1 year ago

Most excellent, can I suggest that the categories for formats: 18-21 should be "time" 22 : "datetime" 45-47 : "time" and a tweak the _isDatePattern accordingly to recognise the three different types. ...It would make it easier to parse spreadsheet values into the correct database table types...

aVadim483 commented 1 year ago

The task is not as simple as you think. Excel has 164 predefined formats, of which numFmtId codes 0-49 are used for all languages, and another 100+ for some languages. There are 42 codes used to indicate date and time.

Additionally, spreadsheets can have custom date and time patterns with numFmtId values greater than 164, but these will also format the date and time, and the patterns will differ between languages. So I don't think the way you suggested is correct.

Because The library must handle XLSX-files in different languages, it does not separate date and time, but uses a common date definition. But if you want, you can do it yourself in your code in dateFormatter(), like this:

$excel->dateFormatter(function($value, $format, $styleIdx) use($excel) {
    // get Excel format of the cell, e.g. '[$-F400]h:mm:ss\ AM/PM'
    $excelFormat = $excel->getFormatPattern($styleIdx);

    // get format converted for use in php functions date(), gmdate(), etc
    // for example the Excel pattern above would be converted to 'g:i:s A'
    $phpFormat = $excel->getDateFormatPattern($styleIdx);

    // and if you need you can get value of numFmtId for this cell
    $style = $excel->getCompleteStyleByIdx($styleIdx, true);
    $numFmtId = $style['format-num-id'];

    // do something and write to $result
    $result = gmdate($phpFormat, $value);

    return $result;
});

This way will give you complete freedom to handle the values, and you will be able to format the date and time separately

AzzaAzza69 commented 1 year ago

how about?

// I guess you could not bother with the valid "number" of each letter, in which case a simple strpos for character $bHasTime=(preg_match('/h{1,2}|m{1,2}|s{1,2}|/(AM\/PM)|(A\/P)', $sPattern)===1); $bHasDate=(preg_match('d{1,4}|m{1,5}|y{2}|y{4}', $sPattern)===1);

if($bHasDate && $bHasTime){ $sCategory='datetime'; }else if($bHasDate){ $sCategory='date'; }else if($bHasTime){ $sCategory='time'; }

aVadim483 commented 1 year ago

Fine! You can use this code in the dateFormatter() and format the values according to your own rules