LMAX-Exchange / disruptor

High Performance Inter-Thread Messaging Library
http://lmax-exchange.github.io/disruptor/
Apache License 2.0
17.36k stars 3.92k forks source link

Problem Using LMAX Disruptor 4.0.0 with JDBC-postgres 42.6.0 #464

Closed rhodey closed 8 months ago

rhodey commented 10 months ago

Hello, first off thank you for your work!

To demonstrate this problem I have put together a POC here that can be used with or without Docker

https://github.com/rhodey/jasyncdebug

The POC targets Java 11, lmax 4.0.0 (latest), and jasync-postgresql 2.1.16 (latest)

What I am experiencing is that call to jasync.sql.db.Connection method sendPreparedStatement() is blocking only when utilized from a lmax disruptor thread.

log.info("before psql.sendPreparedStatement()");
psql.sendPreparedStatement("select 456").whenComplete((ok, err) -> {
    if (err != null) {
        log.error("error - psql query error", err);
    } else {
        log.info("psql success - returned: " + ok.getRows().get(0).getInt(0));
    }
});
log.info("after psql.sendPreparedStatement()");

"before psql.sendPreparedStatement()" is written to the console and never "after psql.sendPreparedStatement()" and the callback is never invoked either.

I thought this could be a general issue of jasync not being thread safe and to test this I included the following code

https://github.com/rhodey/jasyncdebug/blob/main/src/main/java/org/rhodey/poc/psql/PsqlService.java#L53 https://github.com/rhodey/jasyncdebug/blob/main/src/main/java/org/rhodey/poc/psql/PsqlRunnable.java#L21

In this case the message "psql test thread returned: 123"is written to the console non-blocking.

It is of importance to note that this POC utilizes lettus.io for redis connectivity, and like jasync lettus is based on netty too however none of the same problems with blocking calls are being experiences here with using lettus.io + lmax

I have already opened an issue with jasync-sql and really I am not trying to annoy anyone the case is I have put as much work into making the POC easy to navigate as possible and it appears this could be an issue due to either code base.

This issue was also present with lmax 3.3.4 and java 1.8

https://github.com/jasync-sql/jasync-sql/issues/417

Please let me know if I can provide any additional information or assistance, thanks!

rhodey commented 10 months ago

Hello I have an update I have reproduced the same problem with R2DBC postgresql driver latest

You can find my branch here: https://github.com/rhodey/jasyncdebug/tree/r2dbc

R2DBC here: https://r2dbc.io/

And R2DBC postgresql here: https://github.com/pgjdbc/r2dbc-postgresql

My research has shown that jasync-postgresql and r2dbc-postgresql are the only async / non-blocking database drivers available for postgres with Java.

I will sincerely appreciate any help with this, Thanks!

rhodey commented 10 months ago

Hello I have another update, I iterated on my POC again and this time removed jasync-sql and was able to reproduce the same behavior using the standard jdbc-postgres driver.

You can find my new jdbc branch here:

https://github.com/rhodey/jasyncdebug/tree/jdbc

Please let me know if anything more can be done to help investigate, Thanks!

Palmr commented 9 months ago

Hey, just going through this, do you have a thread dump from after you think the callback and logging should have come out? It might tell us which, if any, threads are stuck and where.

grumpyjames commented 8 months ago

Took a quick look at this.

I'm not confident the reproducer does what you think it does.

For me, the error producing one immediately null pointers like this (I could see the disruptor thread was shutting down looking at a 'stuck' java via jstack):

java.lang.NullPointerException
    at org.rhodey.poc.disruptor.DisruptorHandler.mockWork(DisruptorHandler.java:35)
    at org.rhodey.poc.disruptor.DisruptorHandler.onEvent(DisruptorHandler.java:51)
    at org.rhodey.poc.disruptor.DisruptorHandler.onEvent(DisruptorHandler.java:9)
    at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
    at java.base/java.lang.Thread.run(Thread.java:829)

because calling getConnection on PsqlService before start means things have no chance of working.

If I switch to Supplier<Connection> and pass psql::getConnection instead (so I only pull out the connection when I need it, and there's a chance it might have been initialised), I get the following output

[main] INFO org.rhodey.poc.psql.PsqlService - psql connection is valid
[Thread-2] INFO org.rhodey.poc.psql.PsqlRunnable - psql test thread returned: 123
[disruptor-0] INFO org.rhodey.poc.disruptor.DisruptorHandler - CHAN test_read
[disruptor-0] INFO org.rhodey.poc.disruptor.DisruptorHandler - COUNT 1
[disruptor-0] INFO org.rhodey.poc.disruptor.DisruptorHandler - before redis.publish()
[disruptor-0] INFO org.rhodey.poc.disruptor.DisruptorHandler - after redis.publish()
[disruptor-0] INFO org.rhodey.poc.disruptor.DisruptorHandler - before psql.sendPreparedStatement()
[disruptor-0] INFO org.rhodey.poc.disruptor.DisruptorHandler - after psql.sendPreparedStatement()
[redis-write-1] INFO org.rhodey.poc.disruptor.DisruptorHandler - redis write success
[Thread-3] INFO org.rhodey.poc.disruptor.DisruptorHandler - psql success - returned: 456
[db-sql-netty-thread-2] INFO com.github.jasync.sql.db.postgresql.codec.PostgreSQLConnectionHandler - Connection disconnected - null

As such, I don't think this is a problem with disruptor. Please reopen if that's not the case.