Problems with always injecting the previous entry

container-interop / service-provider

[EXPERIMENTAL] Promoting container/framework interoperability through standard service providers

72 stars 10 forks source link

Problems with always injecting the previous entry #9

Closed mnapoli closed 8 years ago

mnapoli commented 8 years ago

I've run into problems with the approach of always injecting the previous container entry:

    public static function createMyService(ContainerInterface $container, $previous = null)
    {
        ...
    }

Problems

with autowiring containers

When defining an entry with a class name the container will try to resolve the class: it's hard to know beforehand if the object can be created or not (is a constructor parameter not defined?). The solution I've temporarily adopted in PHP-DI is to try to get the previous entry and catch any exception. This is not really a proper solution. I think all autowiring containers will have this issue.

regarding performances

I don't have any data to back this up, maybe I should. I suspect always trying to find a previous entry, and if it exists resolving it, is not really good for performances. This may not be a big issue in most cases, but if overriding a complex object (e.g. overriding the Doctrine entity manager) it may result in the resolution of many container entries (dependencies of the previous entry) and the creation of many objects for nothing.

Another scenario where it could get much uglier is with autowired containers (again). When using class names for entry names, it means every container entry will first tried to be resolved using autowiring (whether it fails or not) to be thrown away later in most cases.

Solutions

We may want to not inject the "previous" entry by default. I have found the following solutions for now:

Separate how to register entries and how to extend another one

class MyServiceProvider implements ServiceProvider
{
    public static function getServices()
    {
        return [
            'my_service' => 'createMyService',
        ];
    }

    public static function createMyService(ContainerInterface $container)
    {
        ...
    }

    public static function getServiceExtensions()
    {
        return [
            'event_dispatcher' => 'extendEventDispatcher',
        ];
    }

    public static function extendEventDispatcher(EventDispatcher $previous, ContainerInterface $container)
    {
        $previous->registerListener(new MyListener());
        return $previous;
    }
}

This solution is explicit, but I don't like how we loose the simplicity of a single method.

Questions:

what you happen if you extend a non-existing entry? I guess it should fail, but do we have scenarios where we might want to override the previous one if it was set, else do something else?
Lazy "previous" entry

We can replace the $previous = null parameter with a callable that returns the previous entry (lazy loading).

That means if we don't call this function, the previous entry is never resolved. That also solves the problem with autowiring containers:

if not getting the previous entry (most of the time), no problem
if calling the previous entry and it works, no problem
if calling the previous entry and there's an exception: that's good, that means the container is misconfigured

Example:

class B implements ServiceProvider
{
    ...

    public static function getLogger(ContainerInterface $container, callable $getPrevious)
    {
        $previous = $getPrevious();
        $previous->addHandler(new SyslogHandler());
        return $previous;
    }
}

I like the simplicity of this method. I hope most people would understand how the getter-as-callable works…

Question:

what should $getPrevious() do if no previous entry exists? return null or throw?
- if throw, then how do we know beforehand to avoid the exception? Do we even need to do that?
- if return null, then how do we differentiate that with a null value that was explicitly set? Do we even need to do that?

moufmouf commented 8 years ago

Very interesting. I've never played too much with autowiring containers, so I would not have suspected there could be a problem here.

Both solutions seem pretty honest to me.

To test the 2 solutions, here is a test scenario that could be interesting:

Let's say we want to get an entry "controllers" that is an array of available controllers.

If we go for "solution 1" (Separate how to register entries and how to extend another one):

if when we try to extend a non existing entry, it fails, it means that an empty "controllers" array must be first created by the MVC framework (before being extended by the service providers bringing modules in).
if when we try to extend a non existing entry, it does not fail, then each service provider can create the array (but we are back at square one regarding the problem with autowiring containers).

If we go for "solution 2" (Lazy "previous" entry):

If a service provider is the first to write in the array, it will try to call the previous entry. It will fail, but we can catch the exception and create the empty array.

Solution 2 is slightly more flexible (but we rely on exceptions throwing / handling) to deal with the "previous" instance existence testing. It seems quite a reasonable assumption to me.

Regarding difficulty of implementation, in the Symfony bridge bundle, solution 2 might be slightly trickier to implement, but I'm pretty sure it can be done, so it's no show stopper.

All in all, solution 1 is easier to understand and more verbose, solution 2 is cleaner and more flexible. I have no real preference. My heart goes slightly towards solution 2, but I'm wondering if this will not cause issues with novice developers.

moufmouf commented 8 years ago

An alternative solution:

We could also look at the signature of the method to decide if we want to extend or not (using ReflectionFunctionAbstract::getNumberOfParameters)

    // This service is not extended, because the method takes only one parameter.
    public static function createMyService(ContainerInterface $container)
    {
        ...
    }

    // This service is extended, because the method takes two parameters.
    public static function extendEventDispatcher(ContainerInterface $container, EventDispatcher $previous)
    {
        ...
    }

I have no idea if calling the Reflection API would have a significant cost in terms of performance (though results of the arity of the functions can be cached). Stratigility is doing something similar with Middlewares vs ErrorMiddlewares

mnapoli commented 8 years ago

I'm afraid this 3rd solution might be a bit too magic for having it accepted by everyone (especially in a standard).

For the record I see a 4th solution: $container->get(the-requested-entry) (inside a factory) could return the previous entry.

That would also let modules use has(the-requested-entry) to check if the entry exists, which solves the problem of returning null VS throwing when there is no previous entry.

I wonder how easy it would be to implement it in containers though… Maybe with a decorator like:

class SubContainer implements ContainerInterface
{
    private $container;
    private $entryName;
    private $previous;

    ...

    public function get($name)
    {
        if ($name === $this->entryName) {
            return $this->previous;
        }
        return $this->container->get($name);
    }

    ...
}

In containers it could be used like this:

$factory = [$serviceProviderClass, $methodName];
$subContainer = new SubContainer($container, $entryName, $previous)
$factory($subContainer);

(edge cases ignored)

I wonder however if all these decorators created at runtime would have a significant performance impact…

And VS solution 1 and 2 it has a disadvantage: we have to treat all entries as potentially getting the previous entry, whereas with solution 1 and 2 the previous entry is a case handled only when actually needed.

Edit: What I mean in the paragraph above is that we would have to override all Container::get() and has() for all entries, because else has() would return true and get() would create an recursion loop.

mnapoli commented 8 years ago

if when we try to extend a non existing entry, it fails, it means that an empty "controllers" array must be first created by the MVC framework (before being extended by the service providers bringing modules in).

That is really a downside. The ability to extend or create an entry is really useful, especially with collections/arrays. Typical example with Twig extensions:

we don't want to have to create an empty array of extensions in the framework, the framework should not know about the modules (that's why there are modules)
the Twig module could create an empty extension list, but that would override any array defined before, so that makes the order of service providers important: we want to avoid that (not always avoidable, but with arrays most of the time we can since it's just an array_merge)

The ideal situations for collections is: get the previous value (array) and add items inside it OR if the previous doesn't exist create an array.

The only way to achieve this with solution 1 or 2 would be:

solution 1: you can extend an non-existing previous entry, so it means injecting null in that case
solution: $getPrevious() should return null if the previous entry doesn't exist

Possible with both. Only downside: impossible to know whether null means "not set" or "set to null" (like with caching, or isset, …). But I have no idea if this downside is actually a problem in real scenarios… Probably worth trying then (that's what this project is for).

moufmouf commented 8 years ago

I've been thinking about the 4th solution too. Seems a bit "hacky" to me (at least, it is not obvious to implement). Especially, I'm having a hard time realizing what this means for compiled container (like the Symfony bridge).

Also, before we decide to bury completely solution 3, I'd like to speak about it a bit.

As you stated (in your first post), we really have 2 different problems:

1- autowiring containers 2- performance (i.e. useless instantiation of overridden container entries)

Regarding performance:

Solution 3 solves the performance issue pretty well. Actually, we do not have to put in the standard that the containers should look at the arity of the method. It's just that if the arity of the method is one, then the container might be clever enough to realize it does not need to instantiate the previous entry (because it will not be used). I see this more as a clever implementation detail than a part of the requirements.

Regarding autowiring containers, I have come to realize something else:

We are dealing with entries that are provided by packages. These entries should be possible to override by the container entries. So technically, the packages entries are put into the container before the entries of the application.

Regarding autowired entries, we can consider them as being part of the "application". So I see no valid use case for autowiring an entry while we are in a service provider (otherwise, containers not supporting autowiring would be unable to resolve the service).

So my point is: couldn't we simply completely disable auto-wiring while service provider instances are being resolved? In that case, there is absolutely no problem and we can simply close this issue altogether.

What do you think?

mnapoli commented 8 years ago

Especially, I'm having a hard time realizing what this means for compiled container (like the Symfony bridge).

Yes good point.

I see this more as a clever implementation detail than a part of the requirements.

Agreed, in the context of performances.

If we take autowiring into account, we may have cases where modules are encouraged to remove the $previous parameter (when useless) so that it works with specific containers that understand that. But that's an autowiring problem, which is separate, and brings us to the second point:

So technically, the packages entries are put into the container before the entries of the application. […] I see no valid use case for autowiring an entry while we are in a service provider (otherwise, containers not supporting autowiring would be unable to resolve the service).

I have a use case in mind: an event dispatcher. In a PHP-DI application, the event dispatcher can be autowired. Then modules could "extend" this entry to register listeners.

The same modules could work with e.g. Pimple/Simplex if the event dispatcher is configured. So there's no impact on modules, both can work with autowiring/without autowiring.

couldn't we simply completely disable auto-wiring while service provider instances are being resolved?

So that wouldn't work here. We could do that, but that means that autowiring container users must declare entries that will be extended in modules, which is very weird/surprising. E.g. I create an app and use my event dispatcher (autowired), then I add a module that extends the event dispatcher and suddenly the app crashes and tell me that I'm extending a service that doesn't exist, whereas it used to work fine.

And for the record (a bit out of topic) I disagree that the order of definitions is simple (packages, then the application). To me it makes sense to leave complete liberty to users on the order. And the order I usually follow (which works well) is usually like this:

framework definitions (router, event dispatcher, ...)
module 1
module ...
app definitions (written by the end user)

But the user may also want to register a definition before a specific module. Example: if the framework doesn't define an entry that a package extends (event dispatcher missing from the framework?). Or e.g. if I don't want to use the official module for Twig but still want the "Twig-Debug" extension that decorates the previous twig instance to add logging and profiling).

moufmouf commented 8 years ago

Mmmm, ok. I get your point.

In my case, I'm not making any distinction between framework definitions and module definitions (the framework, for me, should be a set of modules). But I guess there are projects/frameworks where we could have default services provided by the framework that are used by the modules. In this case, solution 3 does not fit.

So to sum this up: solution 3 and 4 our out and we are back at solution 1 or 2.

Any preference? I feel you like better solution 2, am I right? Shall we give it a try?

mnapoli commented 8 years ago

Just thought of something: with solution 2 we can make the parameter $getPrevious optional! That allows us to differentiate between "set to null" and "does not exist":

public static function getSomeParameter($container, callable $getPrevious = null) {
    if ($getPrevious == null) {
        // there was no previous definition
        // in some cases we may want to exploit that to throw an exception to force the user to set something?
    }
    $previous = $getPrevious();
    if ($previous == null) {
        // previous definition was null
    } else {
        // ...
    }
}

That's one advantage over solution 1. That's a definitive reason to go with version 2 then :)

mnapoli commented 8 years ago

FYI I've implemented that in PHP-DI very easily :+1:

That avoids most of the problems with autowiring, but there's still one detail: a module could have a behavior where it optionally extends the previous entry, or return a new one. If the entry is a class name but that class is not configured in the autowiring container, then the container will still inject $getPrevious (it won't be null). This is exactly the same behavior as with Container::has(), i.e. it returns true but get() fails. So $getPrevious() might fail if there are parameters that can't be resolved with autowiring.

Considering I don't have a use case for this situation and this is the same behavior as has() I don't consider it a problem.

mnapoli commented 8 years ago

I've opened #10

skalpa commented 8 years ago

My first contribution here, so before everything, hello and thanks for your work (and please accept my apologies for the length of this post).

I think the solution 2 that you adopted is too limited and doesn't allow some simple use cases like conditional extensions

To give you an example:

You have a Twig provider that registers a twig service

class TwigProvider implements ServiceProvider
{
    public function getServices()
    {
        return [
            'twig' => function () {
                return new \Twig_Environment();
            }
        ];
    }
}

You have a Quote provider that registers its own services and wants to automatically add an extension to Twig. Now this provider should be usable by itself and does not require the Twig provider/service. With the actual solution I would have to do this:

class QuoteProvider implements ServiceProvider
{
    public function getServices()
    {
        return [
            'my_quote_service' => function () {
                return new Something();
            },
            'twig' => function (ContainerInterface $container, callable $getPrevious = null) {
                $twig = $getPrevious();
                $twig->addExtension(new QuoteTwigExtension($container->get('my_quote_service')));
                return $twig;
            }
        ];
    }
}

Here are the problems I see with this:

As a user of those two providers, I will have to know that for this to work, TwigProvider needs to be registered before QuoteProvider. This can quickly become a nightmare. I believe it shouldn't be the users responsibility to understand the dependencies between all the providers they use.
We can't determine if the twig service exists anymore :-( If I use the QuoteProvider but not the TwigProvider any attempt to call $container->has('twig') will return true, which is wrong.

The only way to implement this correctly with the current solution would be to split the QuoteProvider into two separate providers:

One QuoteProvider that providers the quote service
One QuoteTwigExtensionProvider that extends Twig

Again, this far from optimal. That will basically mean splitting code into dozens of optional extensions, and asking the users to manage them manually.

On the other side, a slightly modified version of the solution 1 would solve all this:

interface ServiceProvider
{
    public function getServices();
    public function getServicesExtensions(ContainerInterface $container);
}

Which in return will allow this kind of provider:

class QuoteProvider implements ServiceProvider
{
    public function getServices()
    {
        return [
            'my_quote_service' => function () {
                return new Something();
            }
        ];
    }
    public function getServicesExtensions(ContainerInterface $container)
    {
        $extensions = [];
        if ($container->has('twig')) {
            $extensions['twig'] = [$this, 'addTwigExtension'];
        }
        return $extensions;
    }
    public function addTwigExtension(ContainerInterface $container, callable $getPrevious = null) {
            $twig = $getPrevious();
            $twig->addExtension(new QuoteTwigExtension($container->get('my_quote_service')));
            return $twig;
    }
}

If the standard specifies that all the services MUST be registered by the containers before the services extensions, i.e:

// In the container
$this->register($provider1->getServices());
$this->register($provider2->getServices());
$this->register($provider3->getServices());

$this->register($provider1->getServicesExtensions());
$this->register($provider2->getServicesExtensions());
$this->register($provider3->getServicesExtensions());

Then users don't even have to worry about the providers registration order.

This is basically the approach adopted by Silex/Pimple, where you have two available methods: register() and boot().

moufmouf commented 8 years ago

Hey @skalpa,

First of all, welcome. Thanks a lot for this great contribution.

You posted this message on a closed issue, but we have an open issue (#21) that is referring to your message directly. I took the liberty of reposting your message in #21. hope you don't mind, we can keep discussing your post there!