[RFC] Using a pool of connection

joelwurtz commented 6 years ago

Feature Request

Q	A
New Feature	yes
RFC	yes
BC Break	certainly (but not 100% sure)

Summary

Cf : https://github.com/doctrine/doctrine2/issues/7275 Hey,

I know there is actually some heavy work on the v3 of Doctrine, and it would be nice if the next version would support having a connection pool. I have myself played a little with an async driver https://github.com/joelwurtz/doctrine-async (only a poc atm but can certainly be a reality), but this use case need to have multiple connection open (as you can have queries in parallel).

Actually this is done by considering a Connection like a pool of connection and it works greats for a lot of use case.

However there is some use case where it does not, like getting the last insert id (as we cannot know from which query we want to do that) and I think there must be other ones but i don't know the internals well enough to make a list (so help needed here).

I'm wondering if this is something that can be taken into account for the next version ?

Pool interface

A simple proposal would be to have this interface:

<?php

interface Pool {
    /**
     * @throws NoConnectionAvailableException Throws this exception when the pool is full and no connection can be created
     */
    public function getConnection(): Connection;
}

Driver update

Driver interface would have a new method (not sure about the name) that will return a pool given the params:

interface Driver {
     /**
     * Attempts to create a pool of connection for a database.
     *
     * @param array       $params        All connection parameters passed by the user.
     * @param string|null $username      The username to use when connecting.
     * @param string|null $password      The password to use when connecting.
     * @param array       $driverOptions The driver options to use when connecting.
     *
     * @return \Doctrine\DBAL\Driver\Pool The database pool of connection.
     */
    public function createPool(array $params, $username = null, $password = null, array $driverOptions = []);
}

And maybe the connect could be deprecated over time, as a wrapper can be created to keep current behavior of only one connection (and which will be the default):

<?php

class SinglePool implements Pool {
    private $connection;

    public function __construct(Connection $connection)
    {
        $this->connection = $connection;
    }

    public function getConnection()
    {
        return $this->connection;
    }
}

DriverManager would also have a new method getPool and the getConnection method could be deprecated overtime

Statement

Statement / ResultStatement interfaces would have a new method getConnection which returns the connection associated to this statement, this would be necessary when some works is needed on the same connection that was used for a specific statement (like getting the last insert id)

Events

Event would not need to be updated as there are still associated to a Connection and not a Pool

Other changes

Other classes actually using a connection would need to use a Pool instead and retain the connection when it make sense, like getting the last insert id after an insert query, or when doing a transaction.

List of classes that would need to change (don't know about every use case here and this will certainly extend to every librairies using the dbal, like the ORM):

QueryBuilder
MasterSlaveConnection
ConsoleRunner
AbstractSchemaManager
AbstractSchemaSynchronizer
PoolingShardConnection
...

Specificity for a Connection coming from the Pool

Connection coming from the pool would be connected or lazy connected, user should not need to call the connect method.

Connection coming from the pool would have the responsability to lock / release themself (user should not need to be aware of this), it could easily be done by acquiring a lock when creating a connection and releasing it in the destructor of this connection.

Connection should be close / clear after release, but not necessarily shutdown - to have better performance (not sure if possible, would mainly depend on the driver), as some implementations may retain the tcp connection alive in order to avoid the tcp connect round trip

PS : This is really a preliminary work, and this issue will certainly be updated (or closed if don't wanted) over the next comments.

morozov commented 6 years ago

I don't think this feature belongs to the DBAL, especially in the way that existing APIs are modified in order to support this new feature (open closed principle violation). Instead, an attempt could be made to implement it as a separate package which depends on DBAL. It can reuse the existing interfaces as is or define their own.

joelwurtz commented 6 years ago

End goal of this, is to be able to support a Connection Pool into the ORM, but i don't know the internal well enough to know if it would be possible to do that with a separate package.

fesor commented 6 years ago

(as we cannot know from which query we want to do that)

It is binded to connection, not query. And pooling in ORM should be done in ORM, not on DBAL level. You need pool of entity mangers instead.

In case of id generator, it must consume specific entity manager which belongs to current transaction.

It could be managed like:

// $em here is instance of EntityManagerInterface which hides selection of specific
// entity manager from pool
$em->transactional(function (EntityManager $em) {
    // here $em is specific em binded to concrete transaction 
    // and it should be used for all queries within this transaction
});

The problem is that doctrine doesn't limits you from which em will be used. For example, you could still refer to pool of em instead of given em, which will break your code.

If we are talking about entity selection, you must remember that entity manager will be used in proxy classes for lazy load, and since it may be used inside transaction we always need to use correct entity manager (so whole specific entity manager should be binded to specific request, this could be done even on dependency injection level without any changes in ORM, but this needs research).

I really doubt that you will be able to make such changes into doctrine, this could be possible if you will add some limitations (to make it safer) from outside, some facades. But anyway, this can't be done on DBAL level.

Correct me if I wrong.

joelwurtz commented 6 years ago

It is binded to connection, not query. And pooling in ORM should be done in ORM, not on DBAL level. You need pool of entity mangers instead.

Why a pool of entity manager ? I would imagine that a single entity manager can use multiple connections (in case of async by example), i don't think there is any blocker for this ?

fesor commented 6 years ago

Maybe I wrong, but from what I know, it's just too risky to rely only on pool of connections.

Identity generators receives entity managers, not connections, so you need to provide correct manager wich uses correct connection
Unit of work, which can't be shared between different business transactions, which would require pool of entity managers which will be assigned per each request.
Check how transactions are handled inside ORM. [1], [2], [3].
there are life cycle events, which also have access to entity manager, and can be called inside of transaction, so this entity manager should also operate with correct connection.

So again, if I have to use doctrine inside of some async php application (reactphp, amphp) then I would just provide some kind of scoping for dependency injection, so processing of single request would receive dedicated entity manager. This will not allow any async db operations, but will make code relatively safe from side effects of Doctrine.

p.s. In reality I would just not use doctrine orm in async applications. It not worth it.

apapsch commented 5 years ago

I see the primary value for a connection pool in the DBAL, which is a fine adapter of PDO and the latter lacks a connection pool. Take for example long running processes, which employ a connection to a MySQL server. It terminates the connection after n minutes (wait_timeout may not be configurable for you). The process comes out of its idle time after n+1 minutes and PDO explodes.

Therefore connection pool is primarly useful on the DB connection level (DBAL), to completely hide connection (re)establishment. ORM could use the connection pool, however to cater to the issues outlined by @fesor the pool size would be 1. ORM would then benefit from not being affected by dead connections as well as the code that uses DBAL without ORM.

morozov commented 3 years ago

Closing as per https://github.com/doctrine/dbal/issues/3198#issuecomment-400549461.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

doctrine / dbal