digital-asset / daml

The Daml smart contract language
https://www.digitalasset.com/developers
Other
802 stars 204 forks source link

Stale reads with H2? #8706

Open matthiasS-da opened 3 years ago

matthiasS-da commented 3 years ago

We have recently observed that H2 may deliver stale data, if the db is accessed through more than one connection:

Scenario:

  1. Write a new row to a table.
  2. Wait until the DB has reported a successful commit. (Through slick.)
  3. Query the row. Sometimes, the query yields an outdated/stale result: the row appears to be missing despite the previous commit.

We managed to work around this problem by accessing the H2 database through a single thread/connection. However, our workaround cannot be applied to the ledger api server, as the indexer and the index use separate connection pools.

Apologies, if this is already known. In that case I would be curious to learn how you work around this.

Please ping me, if you need access to the low-level data.

edmondop commented 3 years ago

My first try to contribute to DAML, so apologies if I say something stupid. I am a little confused in what you are saying: you say that a single thread connection gives you the right result, but using the API sequentially will not. The two are semantically identically, so my 2 cents is that things aren't really going sequentially (you are querying the row before the transaction has completed).

How are you ensuring that you are querying the row only after the transaction has completed? I have some experiences with Slick so I might help

matthiasS-da commented 3 years ago

Thanks for your input! Let me try to explain...

How are you ensuring that you are querying the row only after the transaction has completed?

I run queries through BasicBackend.run(...), which returns a future. Essentially, I do the following:

for {
  _ <- myBackend.run(queryInsertingRowIntoTable)
  _ <- myBackend.run(queryReadingSameRowFromTable)
} yield ()

With Postgres, the second query always sees the row inserted by the first query. In H2, this is not always the case.

edmondop commented 3 years ago

Are you using H2 in memory or file based?

matthiasS-da commented 3 years ago

in memory (for testing)

edmondop commented 3 years ago

Can you point me to the code where the H2 data source is initialized?

matthiasS-da commented 3 years ago

Unfortunately, I can't point you to the source, because it is a closed source repository.

But let me try to extract the relevant pieces for you:

We access the DB through slick.jdbc.JdbcBackend.DatabaseFactoryDef#forConfig(config). My assumption is that this will also create the DB (please let me know, if that is not the case).

In the log file, I see that the following config is used: Config(SimpleConfigObject({"driver":"org.h2.Driver","password":"???","url":"jdbc:h2:mem:dekvtvza_participant4;MODE=PostgreSQL;LOCK_TIMEOUT=10000;DB_CLOSE_DELAY=-1","user":"participant4"}))

edmondop commented 3 years ago

Ok cool, I was going to open a PR on the configuration. From H2 documentation of the WRITE_DELAY setting

Set the maximum delay between a commit and flushing the log, in milliseconds. This setting is persistent. The default is 500 ms. Admin rights are required to execute this command, as it affects all connections. This command commits an open transaction in this connection. This setting can be appended to the database URL: jdbc:h2:./test;WRITE_DELAY=0

matthiasS-da commented 3 years ago

Thanks for this hint. I will apply the change and retest.

matthiasS-da commented 3 years ago

I've retested it with ;WRITE_DELAY=0 appended to the url, but I still see the stale reads. :( Let me know if you have further suggestions how to avoid this.

edmondop commented 3 years ago

@matthiasS-da the other path we should explore is which transactional context is used for the second query. If the transaction is opened before the commit, then this would explain the behaviour

matthiasS-da commented 3 years ago
for {
  _ <- myBackend.run(queryInsertingRowIntoTable)
  _ <- myBackend.run(queryReadingSameRowFromTable)
} yield ()

Any ideas how this snipped could open the second transaction before the first one has completed?

edmondop commented 3 years ago

Are you using a connection pool?

matthiasS-da commented 3 years ago

Yes, Hikari, the default one.

edmondop commented 3 years ago

if the two queries are using two different connection in the pool, did you try to restrict the pool to one single connection?

matthiasS-da commented 3 years ago

Yes, I tried. With a single connection, the problem disappears. This is my current workaround.

matthiasS-da commented 3 years ago

I have found some time to do more testing with/without a connection pool (Hikari). When I ran my tests with H2 and without a connection pool, stale reads still occurred from time to time, although far less frequently than with a conneciton pool. When I ran my tests with PostgreSQL and with a connection pool, I did not observe a single stale read.

So overall, it seems that the stale reads are caused by H2 and not by the connection pool.

edmondop commented 3 years ago

I imagine PostgresSQL is much more stringent in terms of transactional consistency, H2 is not planned to be a production database. Want to try to do some screensharing?

matthiasS-da commented 3 years ago

Thanks for the offer. I think I am good for the moment.