Ekman closed this 4 years ago
I had some colleagues point out that this would be a breaking change, since `in_array` can take an object while `isset` can't.
An alternative solution would be to do something like this:
```php
function distinct($collection)
{
    $generatorFactory = function () use ($collection) {
        $distinctValues = [];
        $distinctObjects = new \SplObjectStorage();

        foreach ($collection as $key => $value) {
            if (is_object($value)) {
                if (!$distinctObjects->contains($value)) {
                    $distinctObjects->attach($value);
                    yield $key => $value;
                }
            } else {
                if (!isset($distinctValues[$value])) {
                    $distinctValues[$value] = true;
                    yield $key => $value;
                }
            }
        }
    };

    return new Collection($generatorFactory);
}
```
Let me know what you think.
It's not that easy, because `in_array` can handle not only objects but even `array`s (see: https://www.php.net/manual/en/function.in-array.php#example-6324), etc. (probably everything that can be an array element), while `isset` can only handle elements that are valid as array keys (basically `int`s and `string`s)...
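To make that difference concrete, here is a minimal sketch. It is not taken from the library; `isSafeArrayKey` is a hypothetical helper, and strict comparison (the third argument to `in_array`) is an assumption made to avoid type juggling.

```php
<?php
// Hypothetical helper: int and string are the two native array key types.
// (Numeric strings such as "1" are stored as int keys.)
function isSafeArrayKey($value): bool
{
    return is_int($value) || is_string($value);
}

$object = new stdClass();
$haystack = [['a', 'b'], $object, 1.5];

// in_array() accepts any needle type, including arrays and objects.
$foundArray  = in_array(['a', 'b'], $haystack, true);
$foundObject = in_array($object, $haystack, true);
$foundFloat  = in_array(1.5, $haystack, true);

// Only the first two candidates can be used directly with isset($seen[$value]).
$keyable = array_map('isSafeArrayKey', [42, 'foo', 1.5, ['a', 'b'], $object]);
```

Anything for which `isSafeArrayKey` returns false would need a different lookup strategy than a plain array key.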
@jdreesen That's a fair point.
I propose one of these solutions:
- `isset` and introduce a backwards incompatible change
- `SplObjectStorage` for checking distinct objects. Use an `array` to check distinct primitive values
- `SplObjectStorage`, `in_array`
It would be a shame to keep the solution as-is. Decreasing the time complexity from `O(n^2)` to `O(n)` is a huge improvement, even if it requires some weird code. This is truly an example where the benefits of optimized code outweigh those of readable code. :)
What do you guys think?
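For illustration, one way to combine these ideas might look like the sketch below. It is only a sketch under assumptions: `distinctValues` is a hypothetical name, it returns a plain generator instead of the library's `Collection`, and it uses strict comparison, which may differ from what `distinct` does today. Objects go through `SplObjectStorage`, ints and strings through array keys, and everything else falls back to `in_array`.

```php
<?php
// Hypothetical helper, not the library's actual implementation.
function distinctValues(iterable $collection): \Generator
{
    $seenObjects = new \SplObjectStorage(); // objects, tracked by identity
    $seenInts = [];                         // int values used as array keys
    $seenStrings = [];                      // separate map so 1 and "1" stay distinct
    $seenOther = [];                        // floats, bools, null, arrays, ...

    foreach ($collection as $key => $value) {
        if (is_object($value)) {
            // ArrayAccess form; same effect as contains()/attach().
            if (isset($seenObjects[$value])) {
                continue;
            }
            $seenObjects[$value] = true;
        } elseif (is_int($value)) {
            if (isset($seenInts[$value])) {
                continue;
            }
            $seenInts[$value] = true;
        } elseif (is_string($value)) {
            if (isset($seenStrings[$value])) {
                continue;
            }
            $seenStrings[$value] = true;
        } else {
            // Linear fallback for values that are not valid array keys.
            if (in_array($value, $seenOther, true)) {
                continue;
            }
            $seenOther[] = $value;
        }

        yield $key => $value;
    }
}

$object = new stdClass();
$input = [1, '1', 1, 'a', 'a', [1], [1], $object, $object, 1.5, 1.5, null, null];
$result = iterator_to_array(distinctValues($input));
```

With this split, only the fallback branch stays linear per lookup, so inputs made up of ints, strings, and objects avoid the quadratic behaviour.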
The difference in performance on `distinct` vs `array_unique` is so large that we've decided to refactor some of our code to use the latter instead.
I wanted to see if I could improve the performance on `distinct`. I copied one of the benchmarks in `tests/performance` and modified it to compare `distinct` vs `array_unique` instead.
I ran the benchmark using the current solution and got this result:
I then changed the `distinct` implementation. I don't know anything about the underlying implementation of `array` or `in_array` in PHP; I just know PHP arrays are also key/value maps. Key/value maps tend to have very fast lookup in programming languages. According to this, `in_array` has a time complexity of `O(n)` while `isset` has a time complexity of "close to `O(1)`". By switching to `isset` we'll get a huge time complexity improvement.
I ran the benchmark again and got this result:
That's a huge improvement! Unfortunately, still not nearly as good as `array_unique`. Is there any other way this can be done? Let me know if you have any questions or comments.
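For anyone who wants to reproduce the comparison without the library, a rough standalone sketch might look like this. The function names and input sizes are made up for illustration, it only covers int values, and the timings it prints will depend on your machine and PHP version.

```php
<?php
// Hypothetical stand-in for the current approach: linear search per element.
function distinctWithInArray(array $values): array
{
    $distinct = [];
    foreach ($values as $value) {
        if (!in_array($value, $distinct, true)) {
            $distinct[] = $value;
        }
    }
    return $distinct;
}

// Hypothetical stand-in for the proposed approach: hash lookup per element.
function distinctWithIsset(array $values): array
{
    $seen = [];
    $distinct = [];
    foreach ($values as $value) {
        if (!isset($seen[$value])) {
            $seen[$value] = true;
            $distinct[] = $value;
        }
    }
    return $distinct;
}

// 10,000 ints with 2,000 distinct values.
$values = [];
for ($i = 0; $i < 10000; $i++) {
    $values[] = $i % 2000;
}

$start = microtime(true);
$viaInArray = distinctWithInArray($values);
$inArraySeconds = microtime(true) - $start;

$start = microtime(true);
$viaIsset = distinctWithIsset($values);
$issetSeconds = microtime(true) - $start;

printf("in_array: %.4fs, isset: %.4fs\n", $inArraySeconds, $issetSeconds);
```

Both functions return the same values in the same order for this input, so the only difference being measured is the lookup strategy.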