davidlatwe / montydb

Monty, Mongo tinified. MongoDB implemented in Python !
BSD 3-Clause "New" or "Revised" License

Multiple webserver workers have different db state for FlatFile #32

Open rewiaca opened 3 years ago

rewiaca commented 3 years ago

First of all, great lib for small prods and dev without installing and handling mongod or old-fashioned sqlite3!

I'm having a problem using supervisor with multiple workers, i.e. running several instances of the same Python script, each of which connects to the database and reads and writes it. The problem is that every worker sees a different version of the database. My config:

from montydb import MontyClient
client = MontyClient("data")
client.cache_modified = 1
db = client.db

The same problem occurs with cache_modified = 0. I thought that montydb stores the database in memory and treats the FlatFile as a cache, so setting cache_modified to 1 would help, but it doesn't. Maybe the problem has a different cause?

davidlatwe commented 3 years ago

Hey @rewiaca, thanks for trying!

I think the problem is that the FlatFile storage engine doesn't use any file lock, so it's not multi-process safe. Maybe you could try the SQLite engine or LMDB?

davidlatwe commented 3 years ago

Also, I don't think this line actually sets the config:

client.cache_modified = 1

rewiaca commented 3 years ago

> Hey @rewiaca, thanks for trying!
>
> I think the problem is that the FlatFile storage engine doesn't use any file lock, so it's not multi-process safe. Maybe you could try the SQLite engine or LMDB?

Nice, LMDB works great! Thanks, I just installed lmdb through pip and that was all. As I understand it, each worker will not hold the whole database in memory but will load it from the file on every request. What is the difference?

> Also, I don't think this line actually sets the config:
>
> client.cache_modified = 1

How do I set the config properly then? from montydb import cache_modified doesn't work.

davidlatwe commented 3 years ago

Glad that the LMDB storage engine helps!

> As I understand it, each worker will not hold the whole database in memory but will load it from the file on every request. What is the difference?

Well, the FlatFile storage engine is a dead-simple storage engine that rewrites the whole file once the number of changed documents reaches the cache_modified limit; it is not atomic at all. So if more than one worker can write its own results without any lock or sync, a race condition emerges.
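
To make that race concrete, here is a minimal sketch that uses only the standard library. It is not montydb's actual code: WholeFileStore is a hypothetical stand-in that I assume behaves roughly like a flat-file engine, loading everything at open time and overwriting the whole file on flush, with no locking. Two "workers" are simulated sequentially in one process so the outcome is deterministic.

```python
import json
import os
import tempfile


class WholeFileStore:
    """Hypothetical stand-in for a flat-file engine (NOT montydb's API).

    It reads the whole file into memory when opened and rewrites the
    whole file on flush, without any lock or merge step.
    """

    def __init__(self, path):
        self.path = path
        self.docs = {}
        if os.path.exists(path):
            with open(path) as f:
                self.docs = json.load(f)

    def insert(self, key, doc):
        # Only the worker's in-memory copy changes here.
        self.docs[key] = doc

    def flush(self):
        # The entire file is replaced by this worker's view of the data.
        with open(self.path, "w") as f:
            json.dump(self.docs, f)


def simulate_lost_update():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "collection.json")

        # Both workers open the store before either has written anything.
        worker_a = WholeFileStore(path)
        worker_b = WholeFileStore(path)

        worker_a.insert("a", {"from": "worker_a"})
        worker_a.flush()

        worker_b.insert("b", {"from": "worker_b"})
        worker_b.flush()  # overwrites the file that worker_a just wrote

        with open(path) as f:
            return json.load(f)


result = simulate_lost_update()
print(sorted(result))  # ['b']: worker_a's document was lost
```

Each worker keeps its own in-memory state and the last flush wins, which matches the symptom of every worker seeing a different version of the database.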

> How do I set the config properly then?

Ah, I thought the README provided that info, but it's not clear! (The README only lists which config entries exist but doesn't say how to set them.)

The config should be set by the set_storage method, as keyword arguments.

For FlatFile, it would look like this:

from montydb import set_storage, MontyClient
set_storage("/db/repo", storage="flatfile", cache_modified=1)
client = MontyClient("/db/repo")

For LMDB:

from montydb import set_storage, MontyClient
set_storage("/db/repo", storage="lightning", map_size=10485760)  # Maximum size database may grow to.
client = MontyClient("/db/repo")

You should then find a file named monty.storage.cfg saved in your db repository path, which is /db/repo in the examples above.