Add a pool for TwistedConnection so that we re-use background threads

iffy commented 9 years ago

I keep getting this error when I make many simultaneous queries (klein -> alchimia -> PostgreSQL):

QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30

I'm working around it right now by setting max_overflow=-1 (http://docs.sqlalchemy.org/en/rel_0_8/core/pooling.html#sqlalchemy.pool.QueuePool.params.max_overflow) but I think that will cause problems if I actually open more connections than the Postgres database allows. So I think I (or someone) could implement a Deferred-returning pool that returns Connections asynchronously and has requests for connections wait in a queue.

I'm filing this here, because I think alchimia should come with such a pool built in. I'll take a look at what an implementation might look like on Monday.

alex commented 9 years ago

I think this is related to "connection" pinning, which we need on database adapters that aren't thread safe.

What we want, I think, is that there's N+1 queues, where N is the number of thread. There's one queue for obtaining a handle to open of the connections, and then every action on a given connection stays with the same threads.

cc: @glyph

iffy commented 9 years ago

I've thought about this over the weekend. I'm not sure my use case calls for pinning connections to threads (though I think I see why that would be a nice option in other circumstances).

I'm using engine.execute directly, rather than engine.connect. This creates and closes connections on demand (if I'm reading the SQLAlchemy Engine docs correctly). And so it doesn't really matter to my application which thread (or connection) the query executes in. That said, I imagine I'll eventually need multi-statement transactions (and would then use engine.connect and benefit from a same-connection-per-thread model).

So, my duct tape solution right now is this:

        sem = defer.DeferredSemaphore(self.max_db_connections)
        real_execute = engine.execute
        engine.execute = lambda *a, **kw: sem.run(real_execute, *a, **kw)
        log.msg('added duct tape conn pool %s conns' % (
            self.max_db_connections,), system='database')

It looks like a Deferred-returning pool passed in to create_engine will not work because AFAICT it's accessed by synchronous SQLAlchemy code. So would an asynchronous connection pool just wrap TwistedEngine (e.g. AsyncPool(create_engine(...), max_connections=10))?

iffy commented 9 years ago

Okay, I already need to use engine.connect so I'm willing to do the work to get pooling. I have three questions:

After some code acquires a connection, what's the expected way of indicating that the code is "done" with the connection and that the connection should be returned to the pool? I could
- make Connection.close() return the connection to the pool (this seems wrong)
- add a method to Engine such as returnToPool(connection)
- add a method to Engine that runs some code with a connection and returns the connection to the pool when done (like runInteraction in adbapi)
Should pooling be built in to TwistedEngine or compose with TwistedEngine (i.e. Pooler(engine, pool_size=20))?
Am I incorrectly conflating connections and transactions? I think I probably am :)

iffy commented 9 years ago

Here's my best attempt so far for something that wraps TwistedEngine. I feel like there's probably already code somewhere that does the bulk of this. Anyway, is this the right direction?

https://gist.github.com/iffy/c2d9965d7429f2e54760

glyph commented 8 years ago

@iffy It's been a while! I'm not sure if you're still interested in this. But there's a new (private, for now) API in Twisted which may be promising for this functionality; twisted._threads: https://twistedmatrix.com/documents/16.4.1/api/twisted._threads.html

iffy commented 8 years ago

@glyph wow, it has been a while :) I'm currently not interested (things are working as they are) but if I circle around to needing this again, I'll come back here.

glyph commented 8 years ago

@iffy - I actually have an implementation on https://github.com/glyph/alchimia right now, which I hope to integrate and release in the coming week :). It's untested right now, or rather, it's un-unit-tested, but stress testing with sqlite I can provoke tracebacks right now and with my fork the tracebacks go away.

hawkowl commented 8 years ago

@glyph we can close this, right? :)

glyph commented 8 years ago

@hawkowl No, because we still haven't dealt with the issue of a Deferred-aware pool. The current strategy, while more correct, is less efficient. It opens a new thread for every ~transaction~ connection. Instead, we should recycle idle threads and connections; that's when this should be closed.

glyph commented 8 years ago

Yeah, so, it looks like with sqlite it selects the "null pool" http://docs.sqlalchemy.org/en/rel_1_1/dialects/sqlite.html?highlight=sqlite#threading-pooling-behavior - perhaps we should somehow force this to happen with any other database adapters as well, until this can be more thoroughly addressed?

alex / alchimia

Add a pool for TwistedConnection so that we re-use background threads #18