codenotary / immudb

immudb - immutable database based on zero trust, SQL/Key-Value/Document model, tamperproof, data change history
https://immudb.io

Limits on number of databases #510

Closed: timexcession closed this issue 3 years ago

timexcession commented 3 years ago

I'm attempting to create a large number of databases. However, once I've created about 128, things just stop working.

I'm using 0.7.0, and going through the rest gateway.

Is there any known limit to the number of databases that can be supported? I'm hoping to create thousands for my use case.

timexcession commented 3 years ago

I also found an undocumented constraint where database names are limited to 32 characters. This is slightly annoying, but I can work around it simply by using an MD5 hash as the database name.
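
The workaround described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any immudb API: `db_name_for` and the sample notebook identifier are hypothetical, and the sketch assumes a 32-character hexadecimal string is otherwise acceptable as a database name.

```python
import hashlib

MAX_DB_NAME_LEN = 32  # limit reported in this thread for immudb 0.7.0


def db_name_for(notebook_id: str) -> str:
    """Map an arbitrary-length identifier to a fixed 32-character name.

    An MD5 hex digest is always 32 hexadecimal characters. MD5 is used here
    only to shorten names, not for security.
    """
    return hashlib.md5(notebook_id.encode("utf-8")).hexdigest()


# Hypothetical identifier, longer than the 32-character limit.
name = db_name_for("customer-42/notebooks/field-observations-2021")
print(name, len(name))
```

The mapping is one-way, so the application has to keep its own record of which identifier corresponds to which database name.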

jeroiraz commented 3 years ago

thanks @timexcession for reporting this issue. We are already working on overcoming these limitations.

SimoneLazzaris commented 3 years ago

Hi @timexcession; I've run some tests, and PR #511 does indeed address the problem. I've tested up to 4000 concurrent databases without issue using immudb-py. Also keep in mind that immudb uses 4 file descriptors per database (plus 16 that are always in use), so if you plan to use many databases you have to raise the maximum number of open files per process above the default of 1024.

If you are using systemd, just add LimitNOFILE=65536 (or some other big value) in the [Service] section.
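
As a rough sketch of the sizing rule above (4 descriptors per database plus 16 fixed), the following Python computes the descriptor count needed for a given number of databases. The function name and the printed `LimitNOFILE` line are illustrative; the figures come from this thread, not from immudb's documentation.

```python
FDS_PER_DATABASE = 4   # per the comment above
FDS_BASELINE = 16      # always in use, per the comment above
DEFAULT_NOFILE = 1024  # default per-process limit cited above


def required_fds(n_databases: int) -> int:
    """File descriptors needed to keep n_databases open at once."""
    return FDS_PER_DATABASE * n_databases + FDS_BASELINE


needed = required_fds(4000)
print(f"4000 databases need {needed} descriptors")
if needed > DEFAULT_NOFILE:
    # Minimum value to place in the [Service] section of the systemd unit.
    print(f"LimitNOFILE={needed}")
```

The computed value is a lower bound under that rule; a rounder, larger setting such as the 65536 suggested above leaves headroom.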

I've also checked that #511 allows database names of up to 128 characters.

timexcession commented 3 years ago

That's grand - will these land with 0.9.0?

timexcession commented 3 years ago

Most of the time my databases will be closed - while I intend to create thousands, the number in use concurrently will be in the low hundreds

jeroiraz commented 3 years ago

Fix is already merged, so it'll be included in next release.

Note it's not currently possible to close individual databases.

timexcession commented 3 years ago

Right.... then there's a limit of 4095 databases per deployment, given LimitNOFILE=65536? Be good to document this somewhere.

jeroiraz commented 3 years ago

yes, @SimoneLazzaris is already working on the documentation as well.

timexcession commented 3 years ago

Great! Thanks for jumping on this so quickly, I really appreciate it. I like what you're doing here a lot.

vchaindz commented 3 years ago

@timexcession thanks for checking immudb and opening the issue! May I ask why you need to use so many different databases? I would be interested to learn about your use case. Thanks!

timexcession commented 3 years ago

I can't give full details of the application we're developing - it's under NDA. But in essence we have a set of customers who take notes, and they want to be able to demonstrate that their notebooks are not tampered with. My original plan for this had been simply to use git as a database, with a repo per 'notebook', but I'm changing that to immudb, and using resilient storage, provided I can prove that approach is production ready enough. They will escrow the root hash of the database outside our system; so it's important that it doesn't change, and also important that the merkle tree in each notebook is not dependent on others - hence multiple databases, one (or possibly two, for another reason I can't say) per notebook.

Is it the case that immudb keeps auditing each database that is open for consistency? Is the number of databases likely to cause a performance drag? Should I consider a means to dehydrate/rehydrate databases? I guess I could read the code, but here you are!

It would also be very cool if I could mark a database as 'closed', and that puts it into read only mode, with consistency checks only on demand, if auditing is otherwise ongoing. That would be a nice feature.

Our application is in node.js. I've built my own client that goes through the rest gateway, but would prefer to use yours, and remove the gateway; I need the sets APIs added first.

SimoneLazzaris commented 3 years ago

> Right.... then there's a limit of 4095 databases per deployment, given LimitNOFILE=65536? Be good to document this somewhere.

Not exactly: immudb uses 4 files per database, plus 16 on top of that, so with 65536 file descriptors you should be able to have 16380 databases. We're adding a note to the FAQ about that.
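
The arithmetic behind that figure can be checked with a short Python sketch. `max_databases` is a hypothetical helper; it only restates the rule from this comment (4 descriptors per database, plus 16) and says nothing about other limits.

```python
def max_databases(nofile_limit: int, per_db: int = 4, baseline: int = 16) -> int:
    """Largest number of open databases that fits under nofile_limit."""
    return max(0, (nofile_limit - baseline) // per_db)


print(max_databases(65536))  # (65536 - 16) // 4 = 16380
print(max_databases(1024))   # (1024 - 16) // 4 = 252
```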

jeroiraz commented 3 years ago

@timexcession,

Each database is independent, so hashes won’t be affected by changes in other databases.

The current design assumes that all databases are open and that consistency checking runs for each of them (if immudb is configured to do so). The next release will introduce several changes to our storage layer, making it possible to have a huge number of open databases without any concerns. But we’re aiming to provide the best approach for dealing with a practically unlimited number of databases, so we’re keeping your use case and thoughts as a reference.

Thanks

PS: the node SDK already implements the sets methods; please report any issues in its repo so that anyone using or collaborating on it will be notified.