Closed oprypkhantc closed 4 months ago
Hi @oprypkhantc! Interesting use case, and I can see the problem in a development environment, but I don't think that separating the glob logic into an external finder is the best solution.
I see two problems here:
Probably what is missing here is a way to override the iterator(s) object (and maybe I should remove the final
modifier from the iterator classes to allow subclassing). WDYT?
it applies only to a subset of finders and will not be usable on classmap-registered files and in production environments where an authoritative classmap should be generated by composer
Yes, so I'd expect it to only be available for "glob based" finders. An interface to denote "glob based" finders perhaps?
Probably what is missing here is a way to override the iterator(s) object (and maybe I should remove the final modifier from the iterator classes to allow subclassing). WDYT?
Well, I generally prefer composition over inheritance. In this case, that'd be easier for me - to implement just the CachingFinder
, rather then extend both PSR0 and PSR4 finders. But that's just my preference; any solution will work for me :)
Yes, so I'd expect it to only be available for "glob based" finders. An interface to denote "glob based" finders perhaps?
Not applicable, unfortunately: composer finder is not always glob-based, just under some conditions.
Well, I generally prefer composition over inheritance. In this case, that'd be easier for me - to implement just the
CachingFinder
, rather then extend both PSR0 and PSR4 finders. But that's just my preference; any solution will work for me :)
Yeah, not really a great solution... I need to think about it a little bit more
A little story of how I was debugging the performance:
Our app is almost 14k files. Just the $this->search()
part (globing for all files) takes 2.5-3 seconds the first time. Even with additional caching, this initial scan will have to be done at least once per "boot". I'm thinking that's not good enough.
So I dig for a solution, a quicker glob()
. There's none; the solution implemented by you is already the fastest possible solution. Then I run time (find modules -type f | wc -l)
from the container to see if it's PHP being that inefficient - and it takes about 2 seconds to run.
I get suspicious and remember that we're running VirtioFS with Docker. I remove the mount, add a COPY
instruction to the Dockerfile
to make Docker use Linux-native filesystem and voila - the complete $this->search()
for all of the files now takes 90-130 milliseconds :)
I'd still want to cache the result of $this->search()
, as it may accumulate to almost a whole second if run multiple times. But it's still incredible how much performance is lost here.
Switching from VirtioFS is not easy, as there's no direct replacements that's free. So instead I was thinking of using a separate running binary to watch for file changes (thankfully these are fast) and writing a glob.json
with the list of files that can then be loaded by PHP, to avoid an incredibly slow glob()
.
But that would also require overriding the search
function of both finders. So I'm back to the custom FileFinder
interface :)
So I did a couple of things in an attempt to further improve performance of the "second" iteration over the finder:
$this->search()
, which cuts ~2.5 sec$info->isReadable()
from both iterators, which cuts ~1 secclass_exists()
, which cuts another ~50-70msAll this on 10k files with opcache enabled. The first iteration takes 6-7seconds (that is the glob() and autoloading/reflection), while all the consecutive calls only take ~20ms.
Wow! Thanks for the huge research work!
I presume you're using macos, linux should not have these performance issues on mounted volumes.
Anyway, based on your last comment:
- removed
$info->isReadable()
from both iterators, which cuts ~1 sec
We can probably get rid of this call, I don't think it is really useful anymore.
- added a static cache for the result of
$this->search()
, which cuts ~2.5 sec- added a global cache for the result of
class_exists()
, which cuts another ~50-70ms
I was already thinking about a glob static cache, but I've discarded it as it wasn't a really good idea. Instead, I'm writing a cache interface to possibly serve all the caching purposes in the finders and the iterators. Ideally the implementation could accept a PSR-cache object and store/load the information from that. This way, you could be able to persist some information between requests.
I'll push a PR with this as soon as possible.
I presume you're using macos, linux should not have these performance issues on mounted volumes.
Yes :) And it's a constant battle against filesystem performance in Docker 🙃
We can probably get rid of this call, I don't think it is really useful anymore.
I'm guessing that call was useful for include_once()
, so it can be moved inside the closure that calls include_once
if you wish.
I was already thinking about a glob static cache, but I've discarded it as it wasn't a really good idea. Instead, I'm writing a cache interface to possibly serve all the caching purposes in the finders and the iterators. Ideally the implementation could accept a PSR-cache object and store/load the information from that. This way, you could be able to persist some information between requests.
That'd be awesome! Thank you :)
Ok, I was working on the cache and found myself writing a class to abstract the glob
logic, which is (I think) almost identical to your FileFinder
class. So I began to think this is a pre-requisite to implement a viable cache solution.
Sorry that I understood this so late, after all this discussion. Do you mind to open a PR with your modifications?
Haha, no worries, it's your project in the end :)
I don't have modifications as a separate branch of class-finder
, rather I have a simplified version of ComposerFinder
, where I also combined PSR0/PSR4 iterators and some traits. It was fine for me to debug, as I didn't need most of the features that ComposerFinder
offers, but obviously this is no good for a PR :(
I can still work on a PR though if you wish, just thought it'd make sense to notify you first.
Hi @oprypkhantc,
Sorry for the delay. I've just merged a commit that introduced a FileFinderInterface
(with two implementations) that can be passed to the withFileFinder
method on the finders.
When you have some time, could you please test it for your use-case and give me some feedback on it? Thanks
Hi @oprypkhantc,
Sorry for the delay. I've just merged a commit that introduced a
FileFinderInterface
(with two implementations) that can be passed to thewithFileFinder
method on the finders.When you have some time, could you please test it for your use-case and give me some feedback on it?
Thanks
Thank you 🙏 Sorry for not participating in the making of the PR - I'm still busy with other tasks. I'll test it out.
0.5.1
has been released with FileFinder
support
Hey! Sorry for the long read: I wanted to provide some context for you to better understand the problem.
I have a package that depends on
class-finder
. It has multiple places where it iterates overFinderInterface
. The namespaces are always the same, but it has to be iterated separately in multiple places.For smaller projects, this is not a problem, as globbing and autoloading isn't that expensive for a couple hundred classes. In our case, the namespace we're scanning contains 10k+ classes, so the complete scan, along with autoloading and reflection, and done multiple times from different places, takes about 20sec.
Of course, the immediate solution is to simply cache the result of the
FinderInterface
. That works for a production environment, where files never change. For a development environment it's not that simple: usually only a few files change at a time, and doing a full scan every time is just not feasible. Which is why I'm instead implementingfilemtime
based caching: it usesclass-finder
'spathFilter
feature to avoid autoloading/reflecting files that are known to not have changed.The
pathFilter
solution works well if theFinderInteface
is only iterated once. In our case, it's may be iterated over 6 times from different places and with differentpathFilter
s. So a good optimization here would be to separateFileFinder
from the rest of the code and allow specifying a custom instance of it when needed:This way I would be able to cache the
glob()
that's happening inside theComposerFinder
.In practise, this would mean adding a
FileFinder
interface and refactoringRecursiveIteratorTrait
to use it. What do you think?