Closed gnutix closed 11 months ago
Hi @gnutix, thanks a lot for the time you've spent trying new optimizations!
Your tricks are clever but they hurt readability quite a lot :confused:
Before we move forward, one question: are you casting to string many different objects? Could caching __toString()
's output provide more benefit that these micro-optimizations?
Your tricks are clever but they hurt readability quite a lot 😕
They are definitely not how anyone would write this kind of code at first, that's true. I thought about adding a comment "This code is optimized for performances" above it. And yet using str_pad is quite intuitive, and even though the conditions < 10 ? '0' .
are a bit head-scratching, I don't think they are that complicated to grasp.
are you casting to string many different objects? Could caching __toString()'s output provide more benefit that these micro-optimizations?
At the scale we're at, we're creating/casting both new objects and the same objects multiple times a fair share, so surely it would be a great benefit too (I don't think both optimizations are exclusive though :wink: ).
However, I've already given this idea a lot of thoughts, and it's the implementation that's a problem. The obvious way would be to have a private iso
property that would be set once when __toString
is first called and then act as a cache. The obscure downside is that it makes comparing objects fragile :
$date1 = LocalDate::of(2023, 9, 29);
$date2 = LocalDate::of(2023, 9, 29);
assertEquals($date1, $date2); // true
echo (string) $date2; // calling __toString() sets the `iso` property
assertEquals($date1, $date2); // false : the `iso` property is not set on $date1 but is on $date2...
I've discovered this because I wanted to do exactly that for our LocalDateTimeInterval::isEmpty()
(which is called a lot too), and our PHPUnit test suite broke in the way described... I've asked Sebastian Bergmann about it, and the only option is to change the assertion method to assertObjectEquals
, which requires to have equals()
functions on all objects, and when you have nested objects and arrays of objects, it gets nasty real fast... So not a viable solution, even less for a library.
A way to circumvent this downside would be to always set the iso
property, in the constructor. But then the cost of creating that ISO representation is always there, even when not needed, and I'm not sure that's an acceptable solution either.
A second approach would be to use a WeakMap
(so it's garbage collectable), store the object as key, and the ISO representation as value. But this makes code very complicated also, and the map needs to be initialized somehow (we've done it in our project when converting a LocalDateInterval
to its LocalDateTimeInterval
counterpart, and we had to put the initialization at the end of LocalDateInterval
's file, after the end of the class :disappointed_relieved: ).
Of course, if you have another idea how we could implement this kind of cache, I'd be very happy to explore it. Meanwhile, I've opened a PR for something I knew how to fix. :man_shrugging:
I hope the readability decrease is still acceptable, considering it's an implementation detail hidden in the methods. If not, I can always try to rewrite it in a style closer to Duration::__toString()
, using variables, conditions and $string .= '...';
type of concatenations (especially for LocalTime
I guess).
I thought about adding a comment "This code is optimized for performances" above it.
I would do that, too.
At the scale we're at, we're creating/casting both new objects and the same objects multiple times a fair share, so surely it would be a great benefit too (I don't think both optimizations are exclusive though 😉 ).
Definitely not mutually exclusive, but I was wondering which one would affect you the most!
The obvious way would be to have a private
iso
property that would be set once when__toString
is first called and then act as a cache.
That's how I would have implemented it, too.
The obscure downside is that it makes comparing objects fragile :
This is why I always advocate to not use loose comparison (==
or assertEquals
) on objects, as you're relying on implementation details instead of the public API. assertTrue($a->isEqualTo($b), "$a != $b")
is the way to go.
A way to circumvent this downside would be to always set the iso property, in the constructor. But then the cost of creating that ISO representation is always there, even when not needed, and I'm not sure that's an acceptable solution either.
I don't think that's an acceptable solution either.
A second approach would be to use a WeakMap (so it's garbage collectable), store the object as key, and the ISO representation as value.
That's an interesting solution, where would you put the WeakMap
? In an external singleton service? As a static property? I'd be interested to see this solution benchmarked.
Both the iso
property and the static property would make marking the classes as @psalm-immutable
funny I think, though.
I hope the readability decrease is still acceptable, considering it's an implementation detail hidden in the methods. If not, I can always try to rewrite it in a style closer to
Duration::__toString()
, using variables, conditions and$string .= '...';
type of concatenations.
I guess it's acceptable indeed. Let me get back to it over the weekend.
This is why I always advocate to not use loose comparison (== or assertEquals) on objects, as you're relying on implementation details instead of the public API. assertTrue($a->isEqualTo($b), "$a != $b") is the way to go.
Agree, though that can get hard to avoid when you have more "integration tests" going on, and you might compare arrays of Timeline
objects, containing arrays of LocalDateTimeInterval
, made of LocalDateTime
, themselves with LocalDate
, etc... :see_no_evil: Comparing each property one by one can become very tedious and complicated. Or comparing dispatched events, or stuff like that.
In a sense, these objects are Value Objects, and we shouldn't have to care about the object's identities, only about their values.
If I add these private properties on LocalDate
, LocalTime
and LocalDateTime
in our vendor's copy of brick/date-time
, here's what happens to our project's test suite :
.........................FFFFF............E.E.....EEEE....... 61 / 6069 ( 1%)
........................EEE.................................. 122 / 6069 ( 2%)
............................................................. 183 / 6069 ( 3%)
............................................................. 244 / 6069 ( 4%)
............................................................. 305 / 6069 ( 5%)
............................................................. 366 / 6069 ( 6%)
............................................................. 427 / 6069 ( 7%)
............................................................. 488 / 6069 ( 8%)
............................................................. 549 / 6069 ( 9%)
............................................................. 610 / 6069 ( 10%)
............................................................E 671 / 6069 ( 11%)
....F........................................................ 732 / 6069 ( 12%)
.EEEEE....................................................... 793 / 6069 ( 13%)
............................................................. 854 / 6069 ( 14%)
............................................................. 915 / 6069 ( 15%)
............................................................. 976 / 6069 ( 16%)
..............................EEEEE..EEEEEE.................. 1037 / 6069 ( 17%)
............................................................. 1098 / 6069 ( 18%)
................................EEEEEEEEEFFFFFFFFFFFFFFFFF... 1159 / 6069 ( 19%)
FFFFFF....................................................... 1220 / 6069 ( 20%)
.....EEEEEEEEEEEEEEE.E.EEEEEEEEEE............................ 1281 / 6069 ( 21%)
............................................................. 1342 / 6069 ( 22%)
............................................................. 1403 / 6069 ( 23%)
............................................................. 1464 / 6069 ( 24%)
...............................................EEEEEEE.EEEEEE 1525 / 6069 ( 25%)
EEEEEE........................EEEEEEEEEEEEEE..EE............. 1586 / 6069 ( 26%)
............................................................. 1647 / 6069 ( 27%)
..........EEEEEEEEE.................EE....EE................. 1708 / 6069 ( 28%)
............................................................. 1769 / 6069 ( 29%)
............................................................. 1830 / 6069 ( 30%)
............................................................. 1891 / 6069 ( 31%)
............................................................. 1952 / 6069 ( 32%)
..................EE......................................... 2013 / 6069 ( 33%)
............................................................. 2074 / 6069 ( 34%)
............................................................. 2135 / 6069 ( 35%)
............EEEEEEEEEEEEEE................................... 2196 / 6069 ( 36%)
...........................E.....................E........... 2257 / 6069 ( 37%)
........EEEEEEEEEEE.......................................... 2318 / 6069 ( 38%)
......................EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 2379 / 6069 ( 39%)
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 2440 / 6069 ( 40%)
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 2501 / 6069 ( 41%)
........................................EEEEE.EE............. 2562 / 6069 ( 42%)
............................................................. 2623 / 6069 ( 43%)
...............F............................................. 2684 / 6069 ( 44%)
.................EEEEEE.................................EEEEE 2745 / 6069 ( 45%)
EEEE..............F.F........................................ 2806 / 6069 ( 46%)
............................................................. 2867 / 6069 ( 47%)
............................................................. 2928 / 6069 ( 48%)
............................................................. 2989 / 6069 ( 49%)
.FFFF..................................FF..F................. 3050 / 6069 ( 50%)
............................................................. 3111 / 6069 ( 51%)
............................................................. 3172 / 6069 ( 52%)
............................................................. 3233 / 6069 ( 53%)
............................................................. 3294 / 6069 ( 54%)
............................................................. 3355 / 6069 ( 55%)
............F..............F...F...EEE....................... 3416 / 6069 ( 56%)
.....F.....FF.EFF.F.FFFFFF.F.......EE..FF.FF.E.F.......EEE... 3477 / 6069 ( 57%)
.......EE...F.EEEEEEE.EEEEEFE................................ 3538 / 6069 ( 58%)
...................F...EEEEEEE....FF......................... 3599 / 6069 ( 59%)
FE.F.....FF.F.FF.F....E.............EEFFFFF.F.F.F..FFF..FFFFF 3660 / 6069 ( 60%)
F...FF.F.FFFFFFF.........F.FFF....F.F.....EF..E....E...FFFFFF 3721 / 6069 ( 61%)
FFF.EFEEEFFF......FFFF.EE.FFE.E.EE.E..E.EEEREEE.............. 3782 / 6069 ( 62%)
F..EE.FFEFE.FFF..........F......................EE.......F... 3843 / 6069 ( 63%)
..FF..E................FFFFFFF.FFFE.FEEEE...................F 3904 / 6069 ( 64%)
............................................................. 3965 / 6069 ( 65%)
............................................................. 4026 / 6069 ( 66%)
............................................................. 4087 / 6069 ( 67%)
............................................................. 4148 / 6069 ( 68%)
............................................................. 4209 / 6069 ( 69%)
............................................................. 4270 / 6069 ( 70%)
............................................................. 4331 / 6069 ( 71%)
.........................EEEEEEEEEEEEEE...................... 4392 / 6069 ( 72%)
............................................................. 4453 / 6069 ( 73%)
............................................................. 4514 / 6069 ( 74%)
............................................................. 4575 / 6069 ( 75%)
............................................................. 4636 / 6069 ( 76%)
............................................................. 4697 / 6069 ( 77%)
.................................................F.F......... 4758 / 6069 ( 78%)
............................................................. 4819 / 6069 ( 79%)
............................................................. 4880 / 6069 ( 80%)
............................................................. 4941 / 6069 ( 81%)
............................................................. 5002 / 6069 ( 82%)
............................................................. 5063 / 6069 ( 83%)
............................................................. 5124 / 6069 ( 84%)
............................................................. 5185 / 6069 ( 85%)
............................................................. 5246 / 6069 ( 86%)
............................................................. 5307 / 6069 ( 87%)
............................................................. 5368 / 6069 ( 88%)
............................................................. 5429 / 6069 ( 89%)
............................................................. 5490 / 6069 ( 90%)
............................................................. 5551 / 6069 ( 91%)
............................................................. 5612 / 6069 ( 92%)
............................................................. 5673 / 6069 ( 93%)
............................................................. 5734 / 6069 ( 94%)
............................................................. 5795 / 6069 ( 95%)
............................................................. 5856 / 6069 ( 96%)
............................................................. 5917 / 6069 ( 97%)
............................................................. 5978 / 6069 ( 98%)
........................
Not that it should dictate your decision as a library maintainer, for sure.. but to be taken into consideration ? :sweat_smile:
That's an interesting solution, where would you put the WeakMap? In an external singleton service? As a static property? I'd be interested to see this solution benchmarked.
We did it on a static property in our case. I'll see what I can do over the weekend too. :)
WeakMap
is only available on PHP >= 8.0, and the library still supports 7.4... But to satisfy your curiosity, adding one would look like this :
diff --git a/src/LocalDate.php b/src/LocalDate.php
index 49da41d..a3165fc 100644
--- a/src/LocalDate.php
+++ b/src/LocalDate.php
@@ -13,6 +13,7 @@ use DateTime;
use DateTimeImmutable;
use DateTimeInterface;
use JsonSerializable;
+use WeakMap;
use function abs;
use function intdiv;
@@ -63,6 +64,14 @@ final class LocalDate implements JsonSerializable
*/
private int $day;
+ /**
+ * @var WeakMap<self, string>
+ * @internal
+ *
+ * This property needs to be public, because it is initialized at the end of this file (outside the class)
+ */
+ public static $isoCache;
+
/**
* Private constructor. Use of() to obtain an instance.
*
@@ -762,7 +771,8 @@ final class LocalDate implements JsonSerializable
*/
public function __toString(): string
{
- return ($this->year < 1000 ? ($this->year < 0 ? '-' : '') . str_pad((string) abs($this->year), 4, '0', STR_PAD_LEFT) : $this->year)
+ return self::$isoCache[$this] ??=
+ ($this->year < 1000 ? ($this->year < 0 ? '-' : '') . str_pad((string) abs($this->year), 4, '0', STR_PAD_LEFT) : $this->year)
. '-'
. ($this->month < 10 ? '0' . $this->month : $this->month)
. '-'
@@ -790,3 +800,5 @@ final class LocalDate implements JsonSerializable
return $this->year * 12 + $this->month - 1;
}
}
+
+LocalDate::$isoCache = new WeakMap();
Then a benchmark PHP script calling __toString()
100 millions times on the same LocalDate
takes (average of 5 runs) :
On a more "realistic" scenario (closer to our case at least), calling 5 million times takes (average of 5 runs) :
<?php
use Brick\DateTime\LocalDate;
use Brick\DateTime\Stopwatch;
require __DIR__ . '/../vendor/autoload.php';
$stopwatch = new Stopwatch();
$date = LocalDate::of(2023, 9, 8);
$stopwatch->start();
for ($i = 0; $i < 100_000_000; ++$i) {
(string) $date;
}
$stopwatch->stop();
var_dump((string) $stopwatch->getElapsedTime());
However, I don't think that's a very representative "real world" scenario.. because we're using exactly the same instance of the LocalDate
object, which would not always (or even rarely ?) be the case in a real world app.
If I change the benchmark to do (string) LocalDate::of(2023, 9, 8);
in the loop, then the WeakMap changes nothing to the results. In order for the WeakMap to have an impact, one would have to do a singleton-style cache, like :
diff --git a/src/LocalDate.php b/src/LocalDate.php
index 907469a..e3e9bdf 100644
--- a/src/LocalDate.php
+++ b/src/LocalDate.php
@@ -47,6 +47,7 @@ final class LocalDate implements JsonSerializable
public const DAYS_PER_CYCLE = 146097;
public static $isoCache;
+ public static $instances = [];
/**
* The year.
@@ -92,7 +93,7 @@ final class LocalDate implements JsonSerializable
Field\MonthOfYear::check($month);
Field\DayOfMonth::check($day, $month, $year);
- return new LocalDate($year, $month, $day);
+ return self::$instances[$year . '-' . $month . '-' . $day] ??= new LocalDate($year, $month, $day);
}
/**
Then the benchmarks to calling 5 million times (string) LocalDate::of(...);
takes (average of 5 runs) :
So the gain is not that great, and there's memory issues because there's no garbage collection on the instances array.
I've also tried adding a static array as cache for the ISO values only :
diff --git a/src/LocalDate.php b/src/LocalDate.php
index 49da41d..1e66486 100644
--- a/src/LocalDate.php
+++ b/src/LocalDate.php
@@ -63,6 +63,11 @@ final class LocalDate implements JsonSerializable
*/
private int $day;
+ /**
+ * @var array<string, string>
+ */
+ private static $isoCache = [];
+
/**
* Private constructor. Use of() to obtain an instance.
*
@@ -762,7 +767,8 @@ final class LocalDate implements JsonSerializable
*/
public function __toString(): string
{
- return ($this->year < 1000 ? ($this->year < 0 ? '-' : '') . str_pad((string) abs($this->year), 4, '0', STR_PAD_LEFT) : $this->year)
+ return self::$isoCache[$this->year . '-' . $this->month . '-' . $this->day] ??=
+ ($this->year < 1000 ? ($this->year < 0 ? '-' : '') . str_pad((string) abs($this->year), 4, '0', STR_PAD_LEFT) : $this->year)
. '-'
. ($this->month < 10 ? '0' . $this->month : $this->month)
. '-'
5 million calls to __toString()
takes (average of 5 runs) :
It seems the differences between concatenating the properties to make the array key and concatenating the properties to make the ISO string is not great enough. So I tried again with 100 million calls, which takes (average of 5 runs) :
The difference starts being noticeable, but it's not amazing either. And this solution might not be optimal regarding memory, as there's no garbage collection of such strings in a static array.
Conclusion : an hour and a half later, I still don't know how to add a cache to __toString()
that would really bring good performance improvements and not have risks / issues / drawbacks. :sweat_smile:
Me: I thought about adding a comment "This code is optimized for performances" above it. You: I would do that, too.
Done :+1: Let met know if there's anything else I can do.
If I add these private properties on LocalDate, LocalTime and LocalDateTime in our vendor's copy of brick/date-time, here's what happens to our project's test suite :
Wow, you're indeed relying a lot on this. Funnily enough, I once advised a coworker against using assertEquals()
with two LocalDate
objects, due to the risk of me introducing a __toString()
cache at some point. You've been lucky so far!
Conclusion : an hour and a half later, I still don't know how to add a cache to __toString() that would really bring good performance improvements and not have risks / issues / drawbacks. 😅
OK, let's forget about this for now. I'm still not convinced about whether or not a local cache of computed values is acceptable (I would not consider breaking loose equality a breaking change, but maybe I can be convinced otherwise), but the lack of evidence of a real-world performance improvement (in addition to the extra memory consumed) makes it a no-go for now. your sentence is suspended :slightly_smiling_face:
Our test suite is not directly relying on assertEquals(LocalDate $a, LocalDate $b), but indirectly by comparing objects that have private properties that are themselves LocalDate (or other similar) Value objects.
Glad the bullet is dodged for now. ;)
Our test suite is not directly relying on assertEquals(LocalDate $a, LocalDate $b), but indirectly by comparing objects that have private properties that are themselves LocalDate (or other similar) Value objects.
Shouldn't these objects have an isEqualTo()
method as well?
They probably should. Never took the time to implement a complex method we did not needed, I guess. :man_shrugging:
Checking that two collections are exactly the same is not easy, for example. And adding equals
methods to events would duplicate a lot of code, which means more code to maintain...
And then there's the matter of the inexistant standard for equal methods. equals
? isEqualTo
? null
-friendly or not? ...
I pushed 2 commits adding toISO()
methods (see #73) here, as there's a bug fixed for YearMonth::__toString()
that I needed.
TimeZoneRegion
and TimeZoneOffset
, as they already have a getId(): string
method.Month
, I named the method getName()
, as toISO()
felt wrong (in a separate commit, in case you don't want the method to be added at all) As usual, I've improved the test suite by adding data providers where there were none, and by using them between testJsonSerialize
, testToString
and the newly added testToISO
.
I pushed 2 commits adding toISO() methods (see https://github.com/brick/date-time/issues/73) here, as there's a bug fixed for YearMonth::__toString() that I needed.
Could you please move this to a separate PR? This change is self-contained and bloats the current PR. We'll try to get it merged quickly so that you can rebase the current PR.
Also, I think I'm not sold on toISO()
. I understand that adding String
to a method that returns a string feels redundant, but:
__toString()
from PHP and toString()
from Java, and really is the canonical string representation for the object, so I quite like that we're including this in the method nametoISOString()
makes it searchable in the IDE if I start typing ->tostr
; people not knowing that the string representation is called "ISO" will still be able to find the methodFor these reasons, I think I would call it toISOString()
(or maybe, toIsoString()
) as suggested by @tigitz.
But we can continue this discussion in the other PR!
Could you please move this to a separate PR?
Done, cf. #87. PR is ready for merge IMHO.
@BenMorel Rebased :) Also fixed a variable wrongly named $week
(instead of $month
) in YearMonthTest::testJsonSerialize()
.
@BenMorel LGTM 👍
I tend to preserve acronym case in naming, helps readability IMO.
Hey @BenMorel ! Any chance we could get this merged and released someday soon ? :pray:
@BenMorel I've pushed all the requested changes (nice catches in there, thanks for the review! :pray:), except the code regarding the year's calculation, for which I'm not convinced (see my comment above).
@BenMorel There you go, I've applied the last change for the year's code, and squashed the commits.
Hello @BenMorel,
Another performance PR.
__toString()
can be called a lot in our code, especially onLocalDate
andLocalTime
(throughLocalDateTime
and our ownLocalDateInterval
/LocalDateTimeInterval
classes). Like, this much :So I've spend a few hours doing lots of experimentation and profiling to find the fastest way to produce the ISO-8601 representation of these objects. The highlights of which are :
$this->year < 1000
)$this->month < 10 ? '0' . $this->month : $this->month
are about 30% faster thansubstr('0' . $this->month, -2)
(another idea we've tried with @BastienClement to replacesprintf
calls)I also checked other classes'
__toString()
methods, and optimized those ofMonthDay
,YearMonth
andYearWeek
(which have the same needs asLocalDate
).~Finally, I've fixed some bugs in
YearMonth::__toString()
, which was doingsprintf('%02u', $this->year)
(which seems wrong).~ fixed in #87And added some more tests where I felt it was needed.
If you merge it, it would be great if you could make a minor release directly. :pray: