Respect / Loader

Hello. I love to load things.

Path Cache #5

Open alganet opened 12 years ago

alganet commented 12 years ago

Many autoloading mechanisms compile to PHP to gain performance.

The idea is simple: an array of class names and file locations is faster than parsing the class name on every lookup. On class-heavy frameworks like Symfony, that parsing overhead can be hell.
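As a rough illustration of the class-map idea, here is a minimal sketch. The map entries, the paths and the `lookupInClassMap` helper are hypothetical; a real setup would generate the array in a build step.

```php
<?php
// Hypothetical precompiled map: fully qualified class name => file path.
$classMap = array(
    'Acme\\Blog\\Post'   => __DIR__ . '/src/Acme/Blog/Post.php',
    'Acme_Legacy_Mailer' => __DIR__ . '/lib/Acme/Legacy/Mailer.php',
);

/**
 * Look a class up in the map: no string parsing, just one array access.
 * Returns the mapped path, or false when the class is not in the map.
 */
function lookupInClassMap(array $classMap, $className)
{
    return isset($classMap[$className]) ? $classMap[$className] : false;
}

spl_autoload_register(function ($className) use ($classMap) {
    $path = lookupInClassMap($classMap, $className);
    if (false !== $path && is_file($path)) {
        require $path;
    }
});
```

The trade-off is that the map has to be regenerated whenever classes move, which is what a warm-up cache would avoid.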

We could use a similar mechanism to get similar, maybe better, performance by caching the parsing results in APC (only APC, no adapters. Everyone should run APC). The APC path cache should warm up during the first requests to the application and persist until the server is restarted. Implementing it should be easy as well.

nickl- commented 12 years ago

We might have to consider some sort of namespace for the cache keys, I imagine. What would happen if I run the same classes in dev and prod environments and the cache looks up on class name only? That would make for some interesting fault finding, nevertheless. =)
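One way to picture the key namespacing, as a sketch only: the `pathCacheKey` function, the `respect_loader:` prefix and the use of an installation root are all assumptions, and any string that differs between installations would do.

```php
<?php
/**
 * Build a cache key that is unique per installation, so "dev" and "prod"
 * copies of the same class never share an entry.
 * $installRoot is an assumption: application root, include_path, or a
 * config value would all work, as long as it differs per installation.
 */
function pathCacheKey($installRoot, $className)
{
    return 'respect_loader:' . md5($installRoot) . ':' . $className;
}

$devKey  = pathCacheKey('/var/www/dev', 'Respect\\Loader');
$prodKey = pathCacheKey('/var/www/prod', 'Respect\\Loader');
var_dump($devKey !== $prodKey); // bool(true)
```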

nickl- commented 12 years ago

Turns out that class_exists would continually cause the autoloader to process those _do not exist_ checks, even though all the available modules have been cached. It therefore makes sense to cache all the results: not only the successful path locations but the null answers too.

I have this implemented already so watch this space...
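To make the negative-caching point concrete, a minimal sketch under assumptions: `NegativeAwareCache` and its `lookups` counter are hypothetical and are not the implementation mentioned above.

```php
<?php
/**
 * Hypothetical in-memory cache that remembers misses as well as hits.
 * A stored false means "we already looked and found nothing".
 */
class NegativeAwareCache
{
    private $paths = array();
    public $lookups = 0; // how often the expensive resolver actually ran

    public function resolve($className, $resolver)
    {
        // array_key_exists() rather than isset(), so that even a null
        // result from the resolver counts as cached.
        if (!array_key_exists($className, $this->paths)) {
            $this->lookups++;
            $this->paths[$className] = call_user_func($resolver, $className);
        }
        return $this->paths[$className];
    }
}

$cache = new NegativeAwareCache();
$resolver = function ($className) {
    return stream_resolve_include_path(str_replace('\\', '/', $className) . '.php');
};

$cache->resolve('No\\Such\\Thing', $resolver);
$cache->resolve('No\\Such\\Thing', $resolver);
echo $cache->lookups; // 1
```

The second call never reaches the resolver, which is the saving for repeated class_exists checks on classes that are not there.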

augustohp commented 12 years ago

Agreed on all of this, but just to be clear: we will only support this for people who have APC, right?

alganet commented 12 years ago

@augustohp yes!

I believe we should check for APC support in the constructor, and make this behavior optional by providing a param to override it.

$shouldCache = true;
new Loader($shouldCache);
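A sketch of how that constructor check might look. `CacheAwareLoader`, `apcAvailable()` and `isCaching()` are hypothetical names used for illustration, not the actual Respect\Loader API.

```php
<?php
/**
 * Hypothetical stand-in for the Loader constructor discussed above.
 * Caching is on by default, can be switched off by the caller, and is
 * silently disabled when the APC user-cache functions are missing.
 */
class CacheAwareLoader
{
    private $cacheEnabled;

    public function __construct($shouldCache = true)
    {
        $this->cacheEnabled = $shouldCache && self::apcAvailable();
    }

    public static function apcAvailable()
    {
        // apc_fetch() and apc_store() are provided by the APC extension.
        return function_exists('apc_fetch') && function_exists('apc_store');
    }

    public function isCaching()
    {
        return $this->cacheEnabled;
    }
}

$loader = new CacheAwareLoader(false);
var_dump($loader->isCaching()); // bool(false): the caller opted out
```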

nickl- commented 12 years ago

High five!

Already one step ahead of you =) think BIGGER like in Mexico she is BIG!

Some loaded questions I might ask :

I believe I have addressed these as well =) Excuse the mystery, it will all be revealed in due course. Glad to see the interest, keep at it!

augustohp commented 12 years ago

Simple solutions are not always the ones that solve all the problems. These kinds of discussions are important so we can focus on the real problems we are trying to solve.

@nickl- I am eager to see what you've done ;) As I always am =P

nickl- commented 12 years ago

@augustohp Simple does not have to be inadequate. What I refer to as simple is merely taking the shortest route to the solution by utilizing the least amount of resources with the smallest footprint (iow the least amount of code). Above that, though, it needs to be easily understandable to pandas. That disqualifies ASM, for example, because while being extremely efficient it is not simple.

This journey is taking me further down the rabbit hole: extensibility, caching, namespace prefixes and aliases, and now the biggest vulnerability since hydrogen airships (see #6 _Massive security vulnerability with Autoloaders_). What started out as a simple task is very quickly turning into a monster.

So you will understand why I haven't gotten to the point where I feel comfortable to release anything for scrutiny yet.

Although here is a little something to chew on, consider it a teaser =)

PSR-0 discovery:

<?php
    /** @return resolved path or false */
    public function PSR_0_Discover($className)
    {
        /** replace \ with / and _ in CLASS NAME with / = PSR-0 */
        $file = preg_replace('/\\\|_(?=\w+$)/', DIRECTORY_SEPARATOR, $className);

        return stream_resolve_include_path("$file.php");
    }

Simple, right? It may even end up being nothing more than the preg_replace while we decide whether stream_resolve_include_path is the best approach, and whether relative paths might perform better than absolute ones (see spl_autoload), although that makes no logical sense, right? Still a few stones left unturned, nevertheless...
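To see what the substitution does on its own, here is a small sketch. `psr0RelativePath` is a hypothetical helper that reuses the same pattern, with `'/'` in place of DIRECTORY_SEPARATOR purely so the printed output is the same on every platform.

```php
<?php
/**
 * Same substitution as PSR_0_Discover() above, minus the include_path
 * lookup. Every namespace separator becomes '/', and so does every
 * underscore in the class-name part (after the last backslash).
 */
function psr0RelativePath($className)
{
    return preg_replace('/\\\\|_(?=\w+$)/', '/', $className) . '.php';
}

echo psr0RelativePath('Respect\\Loader'), "\n";
// Respect/Loader.php
echo psr0RelativePath('Zend_Db_Table'), "\n";
// Zend/Db/Table.php
echo psr0RelativePath('My_Vendor\\Sub_Ns\\Class_Name'), "\n";
// My_Vendor/Sub_Ns/Class/Name.php
```

The lookahead `(?=\w+$)` is what keeps underscores inside namespace segments untouched, since a backslash later in the string stops `\w+` from reaching the end.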

@augustohp your insights, ideas and comments are invaluable, even if it just helps as a sanity check to verify that we are on track. We've been missing you!!! =)

augustohp commented 12 years ago

Couldn't agree more with you! It is lovely to see the include_path receive attention =D

PS: I haven't found a good-nice-awesome panda image to put here. From now on I will try to keep a database of them, since I really intend to spend more time here. =D

PPS: Awesome work in #6

alganet commented 12 years ago

I believe we should only support APC. Many applications don't rely on APC for data caching nowadays 'cause that data can't be shared among different installations. Autoloading is just the kind of data that doesn't need data sharing. It's even a bad idea to share autoloading cache.

APC is already a good practice. If implementors are looking for performance from a cached loader, they probably already have APC installed (since it's a great performance boost for the cost).

Perhaps WinCache (the Windows flavor of APC) could be included as well, but I really don't think many people use it.

I coded Respect\Loader to avoid those bloated autoloaders that do a lot of what PHP already does (SplClassLoader re-implements include_path in its own way!), so there isn't much sense in changing that. Unless we change the purpose of the component itself (which is something I'm not against, that would be fine with a clear reason). I just feel that there's no reason for that, so +1 for simplicity.

nickl- commented 12 years ago

Currently I have implemented caching as a design choice from the onset and utilize a single method to do so.

Even the basic (very quickly becoming not so basic anymore) implementation, which purely keeps a statically referenced collection that has to be rebuilt on every request, makes a remarkable difference. Negative class_exists calls especially benefit, since they no longer have to run through every lookup each time only to still not be found.

So the method signature is :

<?php

    Loader::seen($className, $path = false);

I know, I know, it looks way too complicated, let me explain =) heh

<?php

$this->seen($className, $path);  /** Sets the cache but also returns the value so you may go */

return $this->seen($className, $this->discover($className)); /** it's all good. */

$this->seen($className, true); /** is equivalent of "Hey I've seen you but I don't know you" */

if (false !== $path = $this->seen($className)) /** if you want to know what you already know */

// Wait we can do one more, what do you say? How about: 

$this->seen($classMap);  /** - Bang! =) */
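One possible reading of those call shapes, sketched as code. The class name `SeenSketch` and the static array store are hypothetical; only the way `seen()` is called comes from the examples above.

```php
<?php
/**
 * Hypothetical single-method cache. Return values under this reading:
 * a path string when known, true for "seen but unresolved", and false
 * for "never seen".
 */
class SeenSketch
{
    private static $seen = array();

    public static function seen($className, $path = false)
    {
        if (is_array($className)) {
            // seen($classMap): merge a whole map, new entries win.
            self::$seen = $className + self::$seen;
            return true;
        }
        if (false !== $path) {
            // seen($className, $path) or seen($className, true): remember it.
            self::$seen[$className] = $path;
        }
        return isset(self::$seen[$className]) ? self::$seen[$className] : false;
    }
}

SeenSketch::seen('Acme\\Found', '/lib/Acme/Found.php'); // set, returns the path
SeenSketch::seen('Acme\\Missing', true);                // negative entry
var_dump(SeenSketch::seen('Acme\\Never'));              // bool(false)
```

Overloading one method this way keeps the footprint small, at the cost of `false` and `true` carrying special meaning as values.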

I was thinking of making adapters, but that is again too much overhead (including the adapters, looking for them, creating them), so I decided to only extend the base functionality. I later discovered that this is how Symfony does their Apc flavour of UniversalClassLoader too, but they copied me, I am telling you =)

So... because the basic loader class is already "cache-conscious", implementing the temp file cache took adding a __destruct method to persist the $classmap and a few lines in __construct to deserialize it before passing control back to the parent, and that's that.
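Roughly what such a temp-file variant could look like, as a sketch only. `FileCachedMap`, the JSON format and the file handling are assumptions; the actual implementation is not shown in this thread.

```php
<?php
/**
 * Hypothetical temp-file variant: load the map in the constructor and
 * write it back in the destructor. JSON is an assumed storage format.
 */
class FileCachedMap
{
    private $file;
    private $classMap = array();

    public function __construct($file)
    {
        $this->file = $file;
        $raw = is_file($file) ? file_get_contents($file) : false;
        $data = is_string($raw) ? json_decode($raw, true) : null;
        if (is_array($data)) {
            $this->classMap = $data;
        }
    }

    public function seen($className, $path = false)
    {
        if (false !== $path) {
            $this->classMap[$className] = $path;
        }
        return isset($this->classMap[$className]) ? $this->classMap[$className] : false;
    }

    public function __destruct()
    {
        // LOCK_EX keeps two requests from interleaving their writes.
        file_put_contents($this->file, json_encode($this->classMap), LOCK_EX);
    }
}
```

Anything recorded through `seen()` during one request is written out when the object is destroyed and read back by the next instance pointed at the same file.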

APC, on the other hand, replaces the static store. Function seen handles both put and push, get and set, store and uhm retrieve? (this was actually the predicament I couldn't decide on, so I just made it simple). Back to APC then: it just overrides the seen method, and since APC already follows the return-false convention, you can imagine how complicated that method can get =] not.
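A sketch of what overriding only `seen()` for APC might look like. Both class names and the key prefix are hypothetical, and the code assumes the legacy APC extension functions `apc_fetch` and `apc_store`, falling back to the parent's array when they are missing.

```php
<?php
/** Hypothetical base loader with a per-request array store. */
class ArrayStoreLoader
{
    protected $store = array();

    public function seen($className, $path = false)
    {
        if (false !== $path) {
            $this->store[$className] = $path;
        }
        return isset($this->store[$className]) ? $this->store[$className] : false;
    }
}

/** APC flavour: only seen() is overridden, everything else is inherited. */
class ApcStoreLoader extends ArrayStoreLoader
{
    private $prefix;

    public function __construct($prefix = 'loader:')
    {
        $this->prefix = $prefix;
    }

    public function seen($className, $path = false)
    {
        if (!function_exists('apc_fetch') || !function_exists('apc_store')) {
            return parent::seen($className, $path);
        }
        if (false !== $path) {
            apc_store($this->prefix . $className, $path);
            return $path;
        }
        // apc_fetch() returns false on a miss, matching seen()'s convention.
        return apc_fetch($this->prefix . $className);
    }
}
```

Because a miss in APC already comes back as `false`, the override needs no translation layer between the cache and the loader's own convention.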

Extra cache: I'm thinking of the likes of Memcached and Redis. Not sure about the PHP implementation, but on Node it was AweSum!!!! Then don't forget elastic and the many new flavours of Lucene, which really do make a difference. I know, I know, you don't have to say it: talk about overkill, installing SenseiDB to be your classmap cache, but what if it is already available? Hmmmm, are you pondering what I'm pondering, Pinky?

The alias and prefixing part turned into a little mess though.. =/

@augustohp Yes please! Where do we add pandas, and is there somewhere reliable to link to/from which won't just disappear overnight? See, you're hardly back and we're dishing out work already =)