donjajo / php-jsondb

A PHP class that reads a JSON file as a database. Use for sample DBs
MIT License

Concurrency? #40

Open raygit83 opened 3 months ago

raygit83 commented 3 months ago

So far I don't see any measures for handling concurrent accesses from different clients to the same db file. Is this deliberate? Do you plan to add locked r/w access?

donjajo commented 3 months ago

Hello @raygit83

Yes, there should be support for concurrency, but right now I don't have a complete picture of the implementation. My first thought is that we can't block the entire file for an r/w operation. I am thinking of implementing locking at the row level, but I haven't fully thought this through yet. I will think about it further.

If you have an idea, please share.

raygit83 commented 3 months ago

Hi @donjajo,

Thanks for your response. And yes, locking the file during r/w accesses isn't enough to ensure integrity. Consider the following example:

  1. Two clients perform a read access to the db.
  2. Both clients perform an arbitrary edit on the same data.
  3. Both clients commit their changes at different times.

Even with locked file access, one of the edits made in step 2 will be lost: whichever client performs the write last never saw the other client's changes in its own copy of the data, so the later write simply overwrites them.
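For illustration, here is a hypothetical PHP sketch of the pattern (not php-jsondb's actual code). Each individual file access below is protected by flock(), yet the read-modify-write cycle as a whole is not atomic, so one of two concurrent increments is silently lost:

```php
<?php
// Hypothetical illustration of the lost-update problem described above.
// Assumes db.json already exists.

function readDb(string $path): array {
    $fp = fopen($path, 'r');
    flock($fp, LOCK_SH);               // shared lock for the read only
    $data = json_decode(stream_get_contents($fp), true) ?? [];
    flock($fp, LOCK_UN);
    fclose($fp);
    return $data;
}

function writeDb(string $path, array $data): void {
    $fp = fopen($path, 'c');
    flock($fp, LOCK_EX);               // exclusive lock for the write only
    ftruncate($fp, 0);
    fwrite($fp, json_encode($data));
    flock($fp, LOCK_UN);
    fclose($fp);
}

// Client A and client B both run this sequence concurrently:
$data = readDb('db.json');                       // 1. both read the same snapshot
$data['counter'] = ($data['counter'] ?? 0) + 1;  // 2. both edit their own copy
writeDb('db.json', $data);                       // 3. the later write wins; one
                                                 //    increment is silently lost
```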

Here are some thoughts:

The very popular SleekDB solves this by creating a file for every single entry in the table and then maintaining a cache/index to allow for somewhat performant queries. One issue with that approach is that you might hit the maximum number of files permitted on your host's filesystem rather quickly.

Another option would be to read back the db, parsing and merging the data in a locked scope before every write access, with read accesses locked against that as well. This would be safe, but it would quickly cause a performance hit depending on the size and complexity of your data.
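A minimal sketch of that idea, assuming a hypothetical updateDb() helper: one exclusive flock() is held across the entire read-merge-write cycle, so no other writer can slip in between the read-back and the write.

```php
<?php
// Minimal sketch of the locked read-merge-write idea. The function name
// is hypothetical, not from php-jsondb.

function updateDb(string $path, callable $edit): void {
    $fp = fopen($path, 'c+');
    flock($fp, LOCK_EX);                        // block until we own the file
    $data = json_decode(stream_get_contents($fp), true) ?? [];
    $data = $edit($data);                       // apply the caller's changes
    ftruncate($fp, 0);                          // rewrite the file in place
    rewind($fp);
    fwrite($fp, json_encode($data));
    fflush($fp);
    flock($fp, LOCK_UN);
    fclose($fp);
}

// Usage: the edit callback always sees the latest committed state.
updateDb('db.json', function (array $data): array {
    $data['counter'] = ($data['counter'] ?? 0) + 1;
    return $data;
});
```

Readers would take a shared lock (LOCK_SH) on the same file so they never observe a half-written state.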

Perhaps a mixture of both of these options, where the entries are stored in chunks of, say, up to 1024 entries per file, each locked in the manner discussed. A query would then have to iterate these files, or they could be organized in some structured manner to speed things up (?).
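As a sketch of that mixture (illustrative names, reusing the hypothetical updateDb() helper from above): the locked read-merge-write is applied per chunk file rather than to one big db file, so writers only contend for the chunk they actually touch.

```php
<?php
// Illustration only: with a fixed chunk size, the file holding a given
// auto-increment id can be computed directly, so a point access only
// has to lock one small file.

const ENTRIES_PER_CHUNK = 1024;

function chunkPath(string $table, int $id): string {
    // ids 0..1023 live in chunk 0, 1024..2047 in chunk 1, and so on
    return sprintf('%s.%d.json', $table, intdiv($id, ENTRIES_PER_CHUNK));
}

// Update a single entry: only its chunk file is locked exclusively.
function updateEntry(string $table, int $id, callable $edit): void {
    updateDb(chunkPath($table, $id), function (array $chunk) use ($id, $edit): array {
        $chunk[$id] = $edit($chunk[$id] ?? []);
        return $chunk;
    });
}
```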

Best, Ray

donjajo commented 2 months ago

Thank you for your insights!

What about creating something like a mutex, blocking each session until another has finished its r/w operation? Do you think this would cause performance issues? Theoretically, I could use a named pipe for this. My only concern is that under heavy traffic, each session would end up waiting for another. Then again, each individual operation should be very fast.

raygit83 commented 2 months ago

Hi James,

Yes, a mutex will be required. It seems PHP has a SyncMutex class these days (from the sync extension), see here: https://www.php.net/manual/en/class.syncmutex.php , but I don't know how reliable it is. In the past I have used mkdir() to implement a mutex in PHP, because it's atomic.
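For reference, a sketch of the mkdir()-based approach (names and timeout are illustrative): mkdir() fails atomically if the directory already exists, so whichever process creates it first owns the lock.

```php
<?php
// Sketch of an mkdir()-based mutex. mkdir() is atomic, so only one
// process can succeed in creating the lock directory.

function acquireMutex(string $lockDir, float $timeout = 5.0): bool {
    $deadline = microtime(true) + $timeout;
    while (!@mkdir($lockDir)) {      // fails while another process holds it
        if (microtime(true) > $deadline) {
            return false;            // give up instead of waiting forever
        }
        usleep(10000);               // back off 10 ms, then retry
    }
    return true;
}

function releaseMutex(string $lockDir): void {
    @rmdir($lockDir);                // removing the directory frees the lock
}
```

One caveat: if a process dies while holding the lock, the directory stays behind, so in practice a stale-lock check (e.g. by directory age) would also be needed.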

Anyway, locking the entire session would be safe but rather brutal imo, so I'd go for something more fine-grained, as discussed above. I'll try to sketch something next week.

Best, Ray

raygit83 commented 2 months ago

Hi James,

Here's an example implementation outlining my idea: https://github.com/raygit83/json_db.class/tree/main

The database is split into "pages" of at most e.g. 2048 entries. There's an auto-incremented id by which items can be accessed and which determines the page an entry is stored in. There's also a find() function, which has to iterate the pages and use PHP's array_filter() function to query specific items. One could use an index/cache mechanism to speed this up, but it should be possible to strike a good performance balance by setting an appropriate value for ITEMSPERPAGE.

Locking is performed at the page level, which is enough to keep concurrent accesses to the same data consistent; a rough sketch of the scheme follows below.
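Roughly, the layout behaves like this hypothetical sketch (the real code is in the repo linked above; the names and page size here are assumptions for illustration):

```php
<?php
// Hypothetical sketch of the paged layout described above.

const ITEMSPERPAGE = 2048;

// The page file for an id follows directly from the page size.
function pageFile(string $db, int $id): string {
    return sprintf('%s.page%d.json', $db, intdiv($id, ITEMSPERPAGE));
}

// find() iterates every page, taking a shared page-level lock while
// reading, and filters each page in memory with array_filter().
function find(string $db, callable $predicate): array {
    $hits = [];
    foreach (glob($db . '.page*.json') as $file) {
        $fp = fopen($file, 'r');
        flock($fp, LOCK_SH);                 // page-level read lock
        $page = json_decode(stream_get_contents($fp), true) ?? [];
        flock($fp, LOCK_UN);
        fclose($fp);
        $hits = array_merge($hits, array_filter($page, $predicate));
    }
    return $hits;
}
```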

Perhaps a similar backend can be worked into your class, which has a more sophisticated interface for accessing and querying records.

Best, Ray