deephacks / lmdbjni

LMDB for Java
Apache License 2.0

Consider reusable read transactions using ThreadLocal #12

Closed krisskross closed 8 years ago

krisskross commented 9 years ago

mdb_txn_begin has a notable effect on performance, which can be avoided by reusing transactions per thread. This only applies to read transactions.

An example of how this might be implemented, by Viktor Szathmáry:

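// Holds at most one reset read transaction per thread.
private final ThreadLocal<Transaction> cachedTx = new ThreadLocal<>();

private Transaction acquireReadTransaction() {
  Transaction tx = cachedTx.get();
  if (tx == null) {
    // Nothing cached for this thread: begin a new read-only transaction.
    tx = env.createTransaction(true);
  } else {
    // Take the cached handle out of the slot and renew it.
    cachedTx.set(null);
    tx.renew();
  }
  return tx;
}

private void releaseReadTransaction(Transaction tx) {
  if (cachedTx.get() == null) {
    // Slot is free: reset the transaction and keep the handle for reuse.
    tx.reset();
    cachedTx.set(tx);
  } else {
    // Slot already holds a handle: discard this one.
    tx.abort();
  }
}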

It is not clear whether this code belongs in lmdbjni, since the library's feature set is already quite sparse, which is a good thing.

krisskross commented 9 years ago

Ok to close this until we figure out a good way to host this kind of (higher level) functionality?

kk00ss commented 8 years ago

Reusing transactions means not freeing deleted pages, which is a bad thing if your db is almost full. Perhaps some sort of pool of relatively short-lived transactions with a set max lifetime could be used?
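
A minimal sketch of the max-lifetime idea, for illustration only. `ReadTxn` is a hypothetical stand-in for the `renew`/`reset`/`abort` calls used in the snippet above (it is not the lmdbjni `Transaction` class), and `ExpiringReadTxnCache`, the injected clock and the `maxAgeNanos` parameter are assumptions of this sketch. A handle older than the limit is aborted and replaced on the next acquire instead of being renewed.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical stand-in for the read-transaction calls used earlier in this thread. */
interface ReadTxn {
  void renew(); // reuse the handle for a fresh snapshot
  void reset(); // give up the snapshot, keep the handle
  void abort(); // discard the handle
}

/** Caches one read transaction per thread and replaces it once it exceeds maxAgeNanos. */
final class ExpiringReadTxnCache {
  private static final class Slot {
    ReadTxn tx;      // cached handle, or null
    long createdAt;  // clock value when the handle was created
    boolean inUse;   // true between acquire() and release()
  }

  private final ThreadLocal<Slot> slot = ThreadLocal.withInitial(Slot::new);
  private final Supplier<ReadTxn> factory; // assumed to return an already started read transaction
  private final LongSupplier clock;
  private final long maxAgeNanos;

  ExpiringReadTxnCache(Supplier<ReadTxn> factory, LongSupplier clock, long maxAgeNanos) {
    this.factory = factory;
    this.clock = clock;
    this.maxAgeNanos = maxAgeNanos;
  }

  ReadTxn acquire() {
    Slot s = slot.get();
    if (s.inUse) {
      // Nested use on the same thread: hand out an uncached transaction.
      return factory.get();
    }
    long now = clock.getAsLong();
    if (s.tx != null && now - s.createdAt >= maxAgeNanos) {
      // Too old: discard the handle rather than renewing it.
      s.tx.abort();
      s.tx = null;
    }
    if (s.tx == null) {
      s.tx = factory.get();
      s.createdAt = now;
    } else {
      s.tx.renew();
    }
    s.inUse = true;
    return s.tx;
  }

  /** Must be called on the same thread that called acquire(). */
  void release(ReadTxn tx) {
    Slot s = slot.get();
    if (tx == s.tx) {
      tx.reset();      // keep the handle for the next acquire()
      s.inUse = false;
    } else {
      tx.abort();      // uncached transaction from a nested acquire()
    }
  }
}
```

In real code the factory would presumably wrap something like `env.createTransaction(true)` from the snippet above, and the clock would be `System::nanoTime`. The age is only checked on acquire, so this bounds how long a handle is reused, not how long a single read may run.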

krisskross commented 8 years ago

Ah yes, read transactions are an actual snapshot of the database. Makes sense, good call.

Yes, some kind of timeout would be appropriate. Still not sure if this feature belongs here. Maybe this could be built into LMDB?

Let's see if we can summon Howard :-)

kk00ss commented 8 years ago

He would say this makes no sense, and reusing old transaction objects means not getting recent changes. It's a hack, and it's not appropriate in most cases. I think LMDB is already very fast, and the overhead of those transactions is very small since there are no locks in their implementation (copy-on-write), so they are much cheaper than SQL Server transactions, for example.

krisskross commented 8 years ago

Ok, let's not wake the bear ;-) We'll figure something out. Any PRs would be appreciated in the meantime.

kk00ss commented 8 years ago

PR stands for what exactly? ))

krisskross commented 8 years ago

Pull Request. But maybe not in this project. Keeping the API in lmdbjni as close as possible to native LMDB is a good thing?

kk00ss commented 8 years ago

A low-level interface is better for performance and worse for adoption. Have you tried the LMDB fork from the ReOpenLDAP guys?

krisskross commented 8 years ago

Agreed. Do you have a link?

krisskross commented 8 years ago

https://github.com/ReOpen/ReOpenLDAP ?

kk00ss commented 8 years ago

Exactly. I was unable to build it locally, however, in contrast to the original one.

kk00ss commented 8 years ago

Caution: the manual and FAQ are in Russian. They claim to improve a few things as well as code quality. I haven't run a diff/merge on it yet. I don't think they will offer any support for it. Let's just hope Putin will not cut Internet access there ;-)

krisskross commented 8 years ago

Haha :-) I need to read the Google-translated version of the wiki first, but this is mostly OpenLDAP mods, right?

kk00ss commented 8 years ago

The wiki doesn't contain much technical information. It's more about who they are, why they forked, and which parts of the original code don't smell right to them. Useless stuff. But in the sources there is a libmdb folder.

krisskross commented 8 years ago

Found it: https://github.com/ReOpen/libmdbx

kk00ss commented 8 years ago

May I ask how you deal with the fixed environment size in production?

krisskross commented 8 years ago

Set an overdimensioned value. On Linux, the OS will handle it.

krisskross commented 8 years ago

Or do you mean big data? :-)

kk00ss commented 8 years ago

I meant that when you set the map size before initializing the environment, it creates a file of that size, and when the map is full you need to stop all operations, close it, resize it, then reopen.

krisskross commented 8 years ago

No, the memory/file will not be allocated upfront. Set a high value, close to the max size of the disk, and the OS will handle the rest. No problem.
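
As a rough sketch of "set a high value", something like the following could derive a map size from the free space on the target disk. The `MapSizeHint` class, the 0.9 fraction and the 4096-byte page size are assumptions made for illustration; the rounding is there because the LMDB docs for `mdb_env_set_mapsize` say the size should be a multiple of the OS page size.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that suggests a generous map size for a given directory. */
final class MapSizeHint {
  /** Takes the given fraction of usableBytes and rounds it down to a multiple of pageSize. */
  static long roundedMapSize(long usableBytes, double fraction, long pageSize) {
    long target = (long) (usableBytes * fraction);
    return (target / pageSize) * pageSize;
  }

  /** Looks up the usable space of the file store holding dir and applies roundedMapSize. */
  static long forDirectory(Path dir, double fraction, long pageSize) throws IOException {
    FileStore store = Files.getFileStore(dir);
    return roundedMapSize(store.getUsableSpace(), fraction, pageSize);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Paths.get(args.length > 0 ? args[0] : ".");
    // 0.9 and 4096 are assumed values; adjust them for the actual system.
    long mapSize = forDirectory(dir, 0.9, 4096);
    System.out.println("suggested map size in bytes: " + mapSize);
  }
}
```

The resulting value would be handed to the environment's map size setting before the environment is opened.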

krisskross commented 8 years ago

On Linux, that is. I think Windows eagerly allocates the unused space.

kk00ss commented 8 years ago

Thanks, that's awesome.

mauricioscastro commented 8 years ago

on this subject:

are you thinking about some sort of object pooling mechanism?

on another subject:

I am playing around, doing some stupid tests, with BookKeeper and lmdbjni here https://github.com/mauricioscastro/rlmdb (for replicating LMDB with Java on top of your lmdbjni) after having used it a lot in basex-lmdb https://github.com/mauricioscastro/basex-lmdb.

Now that I have started thinking seriously about it, and since I use lmdbjni all over the place in basex-lmdb, I started by extending the Env class for the initial replicated environment setup, etc. Now I need to extend other classes, but I face package protection issues for Database and Transaction. Thoughts? Create an lmdbjni branch for me?

on Mr. Chu's character:

Good to know others share this feeling of mine: I see him as a difficult person to deal with. I never needed to approach him, but from the replies I read and the presentations I saw I almost feel threatened, he he :)

Regards and again, thanks for lmdbjni!

Maurício Santiago de Castro

kk00ss commented 8 years ago

Why not use Apache Ignite or Hazelcast for replication?

mauricioscastro commented 8 years ago

Hmm... I initially thought Hazelcast was overkill. Dunno about Ignite, but I am looking into it. A write-through cache? Hmm... I remember considering Infinispan for this...

Too high-level, I thought. BookKeeper, being more raw, would allow me more control over what's going on and perform better?

I will look at how embeddable Ignite would be.

Thanks.

krisskross commented 8 years ago

Yes, some kind of pooling. But since it messes with deleted pages and causes inconsistency, I think the use case is quite limited. Maybe it's possible to do something smart with the transaction id in LMDB 0.9.17.

Regarding the package structure, you could use delegation instead of inheritance.
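
To illustrate what delegation could look like here, a minimal sketch with stand-in types. `LibraryDatabase` is a hypothetical placeholder for a library class that cannot be subclassed from outside its package (it is not the lmdbjni `Database` API), and `ReplicationLog` and `ReplicatedDatabase` are likewise made up for this example. The wrapper holds an instance and forwards calls, adding its own behaviour around them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a library class that cannot be extended from another package. */
final class LibraryDatabase {
  private final Map<String, String> data = new HashMap<>();

  String get(String key) {
    return data.get(key);
  }

  void put(String key, String value) {
    data.put(key, value);
  }
}

/** Hypothetical hook that receives every write, e.g. to ship it to replicas. */
interface ReplicationLog {
  void append(String key, String value);
}

/** Adds replication by wrapping a database instance instead of subclassing it. */
final class ReplicatedDatabase {
  private final LibraryDatabase delegate;
  private final ReplicationLog log;

  ReplicatedDatabase(LibraryDatabase delegate, ReplicationLog log) {
    this.delegate = delegate;
    this.log = log;
  }

  /** Reads go straight to the wrapped database. */
  String get(String key) {
    return delegate.get(key);
  }

  /** Writes are recorded in the log first, then applied to the wrapped database. */
  void put(String key, String value) {
    log.append(key, value);
    delegate.put(key, value);
  }
}
```

The trade-off is that every method to be exposed has to be forwarded by hand, but the wrapper only depends on whatever the wrapped class makes public.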

krisskross commented 8 years ago

There's also copycat, which is a Raft implementation. The author claims it passes the Jepsen tests.

krisskross commented 8 years ago

Won't fix.