brick / date-time

Date and time library for PHP
MIT License
334 stars 30 forks source link

Improve performances for __toString() methods. #85

Closed gnutix closed 11 months ago

gnutix commented 1 year ago

Hello @BenMorel,

Another performance PR. __toString() can be called a lot in our code, especially on LocalDate and LocalTime (through LocalDateTime and our own LocalDateInterval/LocalDateTimeInterval classes). Like, this much :

image

So I've spend a few hours doing lots of experimentation and profiling to find the fastest way to produce the ISO-8601 representation of these objects. The highlights of which are :

  1. concatenation is almost 2 times faster than sprintf :rocket:
  2. when padding is needed, str_pad is about 5% faster than sprintf
  3. optimizing for the most common path (putting more expensive code behind edge-case conditions helps, for example $this->year < 1000)
  4. conditions like $this->month < 10 ? '0' . $this->month : $this->month are about 30% faster than substr('0' . $this->month, -2) (another idea we've tried with @BastienClement to replace sprintf calls)

I also checked other classes' __toString() methods, and optimized those of MonthDay, YearMonth and YearWeek (which have the same needs as LocalDate).

~Finally, I've fixed some bugs in YearMonth::__toString(), which was doing sprintf('%02u', $this->year) (which seems wrong).~ fixed in #87

And added some more tests where I felt it was needed.

If you merge it, it would be great if you could make a minor release directly. :pray:

BenMorel commented 1 year ago

Hi @gnutix, thanks a lot for the time you've spent trying new optimizations!

Your tricks are clever but they hurt readability quite a lot :confused:

Before we move forward, one question: are you casting to string many different objects? Could caching __toString()'s output provide more benefit that these micro-optimizations?

gnutix commented 1 year ago

Your tricks are clever but they hurt readability quite a lot 😕

They are definitely not how anyone would write this kind of code at first, that's true. I thought about adding a comment "This code is optimized for performances" above it. And yet using str_pad is quite intuitive, and even though the conditions < 10 ? '0' . are a bit head-scratching, I don't think they are that complicated to grasp.

are you casting to string many different objects? Could caching __toString()'s output provide more benefit that these micro-optimizations?

At the scale we're at, we're creating/casting both new objects and the same objects multiple times a fair share, so surely it would be a great benefit too (I don't think both optimizations are exclusive though :wink: ).

However, I've already given this idea a lot of thoughts, and it's the implementation that's a problem. The obvious way would be to have a private iso property that would be set once when __toString is first called and then act as a cache. The obscure downside is that it makes comparing objects fragile :

$date1 = LocalDate::of(2023, 9, 29);
$date2 = LocalDate::of(2023, 9, 29);
assertEquals($date1, $date2); // true
echo (string) $date2; // calling __toString() sets the `iso` property
assertEquals($date1, $date2); // false : the `iso` property is not set on $date1 but is on $date2...

I've discovered this because I wanted to do exactly that for our LocalDateTimeInterval::isEmpty() (which is called a lot too), and our PHPUnit test suite broke in the way described... I've asked Sebastian Bergmann about it, and the only option is to change the assertion method to assertObjectEquals, which requires to have equals() functions on all objects, and when you have nested objects and arrays of objects, it gets nasty real fast... So not a viable solution, even less for a library.

A way to circumvent this downside would be to always set the iso property, in the constructor. But then the cost of creating that ISO representation is always there, even when not needed, and I'm not sure that's an acceptable solution either.

A second approach would be to use a WeakMap (so it's garbage collectable), store the object as key, and the ISO representation as value. But this makes code very complicated also, and the map needs to be initialized somehow (we've done it in our project when converting a LocalDateInterval to its LocalDateTimeInterval counterpart, and we had to put the initialization at the end of LocalDateInterval's file, after the end of the class :disappointed_relieved: ).

Of course, if you have another idea how we could implement this kind of cache, I'd be very happy to explore it. Meanwhile, I've opened a PR for something I knew how to fix. :man_shrugging:

I hope the readability decrease is still acceptable, considering it's an implementation detail hidden in the methods. If not, I can always try to rewrite it in a style closer to Duration::__toString(), using variables, conditions and $string .= '...'; type of concatenations (especially for LocalTime I guess).

BenMorel commented 1 year ago

I thought about adding a comment "This code is optimized for performances" above it.

I would do that, too.

At the scale we're at, we're creating/casting both new objects and the same objects multiple times a fair share, so surely it would be a great benefit too (I don't think both optimizations are exclusive though 😉 ).

Definitely not mutually exclusive, but I was wondering which one would affect you the most!

The obvious way would be to have a private iso property that would be set once when __toString is first called and then act as a cache.

That's how I would have implemented it, too.

The obscure downside is that it makes comparing objects fragile :

This is why I always advocate to not use loose comparison (== or assertEquals) on objects, as you're relying on implementation details instead of the public API. assertTrue($a->isEqualTo($b), "$a != $b") is the way to go.

A way to circumvent this downside would be to always set the iso property, in the constructor. But then the cost of creating that ISO representation is always there, even when not needed, and I'm not sure that's an acceptable solution either.

I don't think that's an acceptable solution either.

A second approach would be to use a WeakMap (so it's garbage collectable), store the object as key, and the ISO representation as value.

That's an interesting solution, where would you put the WeakMap? In an external singleton service? As a static property? I'd be interested to see this solution benchmarked.

Both the iso property and the static property would make marking the classes as @psalm-immutable funny I think, though.

I hope the readability decrease is still acceptable, considering it's an implementation detail hidden in the methods. If not, I can always try to rewrite it in a style closer to Duration::__toString(), using variables, conditions and $string .= '...'; type of concatenations.

I guess it's acceptable indeed. Let me get back to it over the weekend.

gnutix commented 1 year ago

This is why I always advocate to not use loose comparison (== or assertEquals) on objects, as you're relying on implementation details instead of the public API. assertTrue($a->isEqualTo($b), "$a != $b") is the way to go.

Agree, though that can get hard to avoid when you have more "integration tests" going on, and you might compare arrays of Timeline objects, containing arrays of LocalDateTimeInterval, made of LocalDateTime, themselves with LocalDate, etc... :see_no_evil: Comparing each property one by one can become very tedious and complicated. Or comparing dispatched events, or stuff like that.

In a sense, these objects are Value Objects, and we shouldn't have to care about the object's identities, only about their values.

If I add these private properties on LocalDate, LocalTime and LocalDateTime in our vendor's copy of brick/date-time, here's what happens to our project's test suite :

.........................FFFFF............E.E.....EEEE.......   61 / 6069 (  1%)
........................EEE..................................  122 / 6069 (  2%)
.............................................................  183 / 6069 (  3%)
.............................................................  244 / 6069 (  4%)
.............................................................  305 / 6069 (  5%)
.............................................................  366 / 6069 (  6%)
.............................................................  427 / 6069 (  7%)
.............................................................  488 / 6069 (  8%)
.............................................................  549 / 6069 (  9%)
.............................................................  610 / 6069 ( 10%)
............................................................E  671 / 6069 ( 11%)
....F........................................................  732 / 6069 ( 12%)
.EEEEE.......................................................  793 / 6069 ( 13%)
.............................................................  854 / 6069 ( 14%)
.............................................................  915 / 6069 ( 15%)
.............................................................  976 / 6069 ( 16%)
..............................EEEEE..EEEEEE.................. 1037 / 6069 ( 17%)
............................................................. 1098 / 6069 ( 18%)
................................EEEEEEEEEFFFFFFFFFFFFFFFFF... 1159 / 6069 ( 19%)
FFFFFF....................................................... 1220 / 6069 ( 20%)
.....EEEEEEEEEEEEEEE.E.EEEEEEEEEE............................ 1281 / 6069 ( 21%)
............................................................. 1342 / 6069 ( 22%)
............................................................. 1403 / 6069 ( 23%)
............................................................. 1464 / 6069 ( 24%)
...............................................EEEEEEE.EEEEEE 1525 / 6069 ( 25%)
EEEEEE........................EEEEEEEEEEEEEE..EE............. 1586 / 6069 ( 26%)
............................................................. 1647 / 6069 ( 27%)
..........EEEEEEEEE.................EE....EE................. 1708 / 6069 ( 28%)
............................................................. 1769 / 6069 ( 29%)
............................................................. 1830 / 6069 ( 30%)
............................................................. 1891 / 6069 ( 31%)
............................................................. 1952 / 6069 ( 32%)
..................EE......................................... 2013 / 6069 ( 33%)
............................................................. 2074 / 6069 ( 34%)
............................................................. 2135 / 6069 ( 35%)
............EEEEEEEEEEEEEE................................... 2196 / 6069 ( 36%)
...........................E.....................E........... 2257 / 6069 ( 37%)
........EEEEEEEEEEE.......................................... 2318 / 6069 ( 38%)
......................EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 2379 / 6069 ( 39%)
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 2440 / 6069 ( 40%)
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 2501 / 6069 ( 41%)
........................................EEEEE.EE............. 2562 / 6069 ( 42%)
............................................................. 2623 / 6069 ( 43%)
...............F............................................. 2684 / 6069 ( 44%)
.................EEEEEE.................................EEEEE 2745 / 6069 ( 45%)
EEEE..............F.F........................................ 2806 / 6069 ( 46%)
............................................................. 2867 / 6069 ( 47%)
............................................................. 2928 / 6069 ( 48%)
............................................................. 2989 / 6069 ( 49%)
.FFFF..................................FF..F................. 3050 / 6069 ( 50%)
............................................................. 3111 / 6069 ( 51%)
............................................................. 3172 / 6069 ( 52%)
............................................................. 3233 / 6069 ( 53%)
............................................................. 3294 / 6069 ( 54%)
............................................................. 3355 / 6069 ( 55%)
............F..............F...F...EEE....................... 3416 / 6069 ( 56%)
.....F.....FF.EFF.F.FFFFFF.F.......EE..FF.FF.E.F.......EEE... 3477 / 6069 ( 57%)
.......EE...F.EEEEEEE.EEEEEFE................................ 3538 / 6069 ( 58%)
...................F...EEEEEEE....FF......................... 3599 / 6069 ( 59%)
FE.F.....FF.F.FF.F....E.............EEFFFFF.F.F.F..FFF..FFFFF 3660 / 6069 ( 60%)
F...FF.F.FFFFFFF.........F.FFF....F.F.....EF..E....E...FFFFFF 3721 / 6069 ( 61%)
FFF.EFEEEFFF......FFFF.EE.FFE.E.EE.E..E.EEEREEE.............. 3782 / 6069 ( 62%)
F..EE.FFEFE.FFF..........F......................EE.......F... 3843 / 6069 ( 63%)
..FF..E................FFFFFFF.FFFE.FEEEE...................F 3904 / 6069 ( 64%)
............................................................. 3965 / 6069 ( 65%)
............................................................. 4026 / 6069 ( 66%)
............................................................. 4087 / 6069 ( 67%)
............................................................. 4148 / 6069 ( 68%)
............................................................. 4209 / 6069 ( 69%)
............................................................. 4270 / 6069 ( 70%)
............................................................. 4331 / 6069 ( 71%)
.........................EEEEEEEEEEEEEE...................... 4392 / 6069 ( 72%)
............................................................. 4453 / 6069 ( 73%)
............................................................. 4514 / 6069 ( 74%)
............................................................. 4575 / 6069 ( 75%)
............................................................. 4636 / 6069 ( 76%)
............................................................. 4697 / 6069 ( 77%)
.................................................F.F......... 4758 / 6069 ( 78%)
............................................................. 4819 / 6069 ( 79%)
............................................................. 4880 / 6069 ( 80%)
............................................................. 4941 / 6069 ( 81%)
............................................................. 5002 / 6069 ( 82%)
............................................................. 5063 / 6069 ( 83%)
............................................................. 5124 / 6069 ( 84%)
............................................................. 5185 / 6069 ( 85%)
............................................................. 5246 / 6069 ( 86%)
............................................................. 5307 / 6069 ( 87%)
............................................................. 5368 / 6069 ( 88%)
............................................................. 5429 / 6069 ( 89%)
............................................................. 5490 / 6069 ( 90%)
............................................................. 5551 / 6069 ( 91%)
............................................................. 5612 / 6069 ( 92%)
............................................................. 5673 / 6069 ( 93%)
............................................................. 5734 / 6069 ( 94%)
............................................................. 5795 / 6069 ( 95%)
............................................................. 5856 / 6069 ( 96%)
............................................................. 5917 / 6069 ( 97%)
............................................................. 5978 / 6069 ( 98%)
........................

Not that it should dictate your decision as a library maintainer, for sure.. but to be taken into consideration ? :sweat_smile:

That's an interesting solution, where would you put the WeakMap? In an external singleton service? As a static property? I'd be interested to see this solution benchmarked.

We did it on a static property in our case. I'll see what I can do over the weekend too. :)

gnutix commented 1 year ago

WeakMap is only available on PHP >= 8.0, and the library still supports 7.4... But to satisfy your curiosity, adding one would look like this :

diff --git a/src/LocalDate.php b/src/LocalDate.php
index 49da41d..a3165fc 100644
--- a/src/LocalDate.php
+++ b/src/LocalDate.php
@@ -13,6 +13,7 @@ use DateTime;
 use DateTimeImmutable;
 use DateTimeInterface;
 use JsonSerializable;
+use WeakMap;

 use function abs;
 use function intdiv;
@@ -63,6 +64,14 @@ final class LocalDate implements JsonSerializable
      */
     private int $day;

+    /**
+     * @var WeakMap<self, string>
+     * @internal
+     *
+     * This property needs to be public, because it is initialized at the end of this file (outside the class)
+     */
+    public static $isoCache;
+
     /**
      * Private constructor. Use of() to obtain an instance.
      *
@@ -762,7 +771,8 @@ final class LocalDate implements JsonSerializable
      */
     public function __toString(): string
     {
-        return ($this->year < 1000 ? ($this->year < 0 ? '-' : '') . str_pad((string) abs($this->year), 4, '0', STR_PAD_LEFT) : $this->year)
+        return self::$isoCache[$this] ??=
+            ($this->year < 1000 ? ($this->year < 0 ? '-' : '') . str_pad((string) abs($this->year), 4, '0', STR_PAD_LEFT) : $this->year)
             . '-'
             . ($this->month < 10 ? '0' . $this->month : $this->month)
             . '-'
@@ -790,3 +800,5 @@ final class LocalDate implements JsonSerializable
         return $this->year * 12 + $this->month - 1;
     }
 }
+
+LocalDate::$isoCache = new WeakMap();

Then a benchmark PHP script calling __toString() 100 millions times on the same LocalDate takes (average of 5 runs) :

On a more "realistic" scenario (closer to our case at least), calling 5 million times takes (average of 5 runs) :

<?php

use Brick\DateTime\LocalDate;
use Brick\DateTime\Stopwatch;

require __DIR__ . '/../vendor/autoload.php';

$stopwatch = new Stopwatch();
$date = LocalDate::of(2023, 9, 8);

$stopwatch->start();
for ($i = 0; $i < 100_000_000; ++$i) {
    (string) $date;
}
$stopwatch->stop();

var_dump((string) $stopwatch->getElapsedTime());

However, I don't think that's a very representative "real world" scenario.. because we're using exactly the same instance of the LocalDate object, which would not always (or even rarely ?) be the case in a real world app.

If I change the benchmark to do (string) LocalDate::of(2023, 9, 8); in the loop, then the WeakMap changes nothing to the results. In order for the WeakMap to have an impact, one would have to do a singleton-style cache, like :

diff --git a/src/LocalDate.php b/src/LocalDate.php
index 907469a..e3e9bdf 100644
--- a/src/LocalDate.php
+++ b/src/LocalDate.php
@@ -47,6 +47,7 @@ final class LocalDate implements JsonSerializable
     public const DAYS_PER_CYCLE = 146097;

     public static $isoCache;
+    public static $instances = [];

     /**
      * The year.
@@ -92,7 +93,7 @@ final class LocalDate implements JsonSerializable
         Field\MonthOfYear::check($month);
         Field\DayOfMonth::check($day, $month, $year);

-        return new LocalDate($year, $month, $day);
+        return self::$instances[$year . '-' . $month . '-' . $day] ??= new LocalDate($year, $month, $day);
     }

     /**

Then the benchmarks to calling 5 million times (string) LocalDate::of(...); takes (average of 5 runs) :

So the gain is not that great, and there's memory issues because there's no garbage collection on the instances array.

gnutix commented 1 year ago

I've also tried adding a static array as cache for the ISO values only :

diff --git a/src/LocalDate.php b/src/LocalDate.php
index 49da41d..1e66486 100644
--- a/src/LocalDate.php
+++ b/src/LocalDate.php
@@ -63,6 +63,11 @@ final class LocalDate implements JsonSerializable
      */
     private int $day;

+    /**
+     * @var array<string, string>
+     */
+    private static $isoCache = [];
+
     /**
      * Private constructor. Use of() to obtain an instance.
      *
@@ -762,7 +767,8 @@ final class LocalDate implements JsonSerializable
      */
     public function __toString(): string
     {
-        return ($this->year < 1000 ? ($this->year < 0 ? '-' : '') . str_pad((string) abs($this->year), 4, '0', STR_PAD_LEFT) : $this->year)
+        return self::$isoCache[$this->year . '-' . $this->month . '-' . $this->day] ??=
+            ($this->year < 1000 ? ($this->year < 0 ? '-' : '') . str_pad((string) abs($this->year), 4, '0', STR_PAD_LEFT) : $this->year)
             . '-'
             . ($this->month < 10 ? '0' . $this->month : $this->month)
             . '-'

5 million calls to __toString() takes (average of 5 runs) :

It seems the differences between concatenating the properties to make the array key and concatenating the properties to make the ISO string is not great enough. So I tried again with 100 million calls, which takes (average of 5 runs) :

The difference starts being noticeable, but it's not amazing either. And this solution might not be optimal regarding memory, as there's no garbage collection of such strings in a static array.

gnutix commented 1 year ago

Conclusion : an hour and a half later, I still don't know how to add a cache to __toString() that would really bring good performance improvements and not have risks / issues / drawbacks. :sweat_smile:

Me: I thought about adding a comment "This code is optimized for performances" above it. You: I would do that, too.

Done :+1: Let met know if there's anything else I can do.

BenMorel commented 1 year ago

If I add these private properties on LocalDate, LocalTime and LocalDateTime in our vendor's copy of brick/date-time, here's what happens to our project's test suite :

Wow, you're indeed relying a lot on this. Funnily enough, I once advised a coworker against using assertEquals() with two LocalDate objects, due to the risk of me introducing a __toString() cache at some point. You've been lucky so far!

Conclusion : an hour and a half later, I still don't know how to add a cache to __toString() that would really bring good performance improvements and not have risks / issues / drawbacks. 😅

OK, let's forget about this for now. I'm still not convinced about whether or not a local cache of computed values is acceptable (I would not consider breaking loose equality a breaking change, but maybe I can be convinced otherwise), but the lack of evidence of a real-world performance improvement (in addition to the extra memory consumed) makes it a no-go for now. your sentence is suspended :slightly_smiling_face:

gnutix commented 1 year ago

Our test suite is not directly relying on assertEquals(LocalDate $a, LocalDate $b), but indirectly by comparing objects that have private properties that are themselves LocalDate (or other similar) Value objects.

Glad the bullet is dodged for now. ;)

BenMorel commented 1 year ago

Our test suite is not directly relying on assertEquals(LocalDate $a, LocalDate $b), but indirectly by comparing objects that have private properties that are themselves LocalDate (or other similar) Value objects.

Shouldn't these objects have an isEqualTo() method as well?

gnutix commented 1 year ago

They probably should. Never took the time to implement a complex method we did not needed, I guess. :man_shrugging: Checking that two collections are exactly the same is not easy, for example. And adding equals methods to events would duplicate a lot of code, which means more code to maintain...

And then there's the matter of the inexistant standard for equal methods. equals? isEqualTo? null-friendly or not? ...

gnutix commented 1 year ago

I pushed 2 commits adding toISO() methods (see #73) here, as there's a bug fixed for YearMonth::__toString() that I needed.

As usual, I've improved the test suite by adding data providers where there were none, and by using them between testJsonSerialize, testToString and the newly added testToISO.

BenMorel commented 1 year ago

I pushed 2 commits adding toISO() methods (see https://github.com/brick/date-time/issues/73) here, as there's a bug fixed for YearMonth::__toString() that I needed.

Could you please move this to a separate PR? This change is self-contained and bloats the current PR. We'll try to get it merged quickly so that you can rebase the current PR.

Also, I think I'm not sold on toISO(). I understand that adding String to a method that returns a string feels redundant, but:

For these reasons, I think I would call it toISOString() (or maybe, toIsoString()) as suggested by @tigitz.

But we can continue this discussion in the other PR!

gnutix commented 1 year ago

Could you please move this to a separate PR?

Done, cf. #87. PR is ready for merge IMHO.

gnutix commented 1 year ago

@BenMorel Rebased :) Also fixed a variable wrongly named $week (instead of $month) in YearMonthTest::testJsonSerialize().

tigitz commented 12 months ago

@BenMorel LGTM 👍

I tend to preserve acronym case in naming, helps readability IMO.

gnutix commented 11 months ago

Hey @BenMorel ! Any chance we could get this merged and released someday soon ? :pray:

gnutix commented 11 months ago

@BenMorel I've pushed all the requested changes (nice catches in there, thanks for the review! :pray:), except the code regarding the year's calculation, for which I'm not convinced (see my comment above).

gnutix commented 11 months ago

@BenMorel There you go, I've applied the last change for the year's code, and squashed the commits.

BenMorel commented 11 months ago

Thank you, @gnutix! 0.5.4 just released.