Thomas--F / BotTracker

BotTracker-Plugin for Piwik
GNU General Public License v3.0
32 stars 14 forks

Deadlocks on updating bot_db.botCount #54

Open rob006 opened 7 years ago

rob006 commented 7 years ago

On every log processing run I get multiple errors like:

[11-Feb-2017 18:11:16 UTC] Error in Piwik (tracker): Error query: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
In query: UPDATE `piwik_bot_db`
          SET botCount = botCount + 1,
              botLastVisit = ?
          WHERE botId = ?
Parameters: array (
  0 => '2017-02-11 18:11:16',
  1 => '252',
)

Any chance of some optimization here? Right now, when one bot visits my site 2k times, I get 2k update queries, but it could be a single query that increases botCount by the sum of visits and sets botLastVisit to the date of the last visit.
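The aggregation rob006 describes can be sketched roughly like this (a Python sketch rather than the plugin's actual PHP; the final per-bot UPDATE shown in the comment is the same statement the plugin already uses, just with an aggregated increment):

```python
from collections import defaultdict

def aggregate_bot_hits(log_entries):
    """Collapse per-hit updates into one (count, last_visit) pair per bot.

    log_entries: iterable of (bot_id, visit_time) tuples, e.g. parsed
    from the server log. Returns {bot_id: (hit_count, last_visit)}.
    """
    totals = defaultdict(lambda: [0, None])
    for bot_id, visit_time in log_entries:
        entry = totals[bot_id]
        entry[0] += 1                        # accumulate the hit count
        if entry[1] is None or visit_time > entry[1]:
            entry[1] = visit_time            # remember the latest visit
    return {bot_id: tuple(v) for bot_id, v in totals.items()}

# Then one UPDATE per bot instead of one per hit:
#   UPDATE piwik_bot_db
#   SET botCount = botCount + %s, botLastVisit = %s
#   WHERE botId = %s
```

With this, a bot that hits the site 2,000 times produces one UPDATE instead of 2,000.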

Thomas--F commented 7 years ago

Do you use realtime-visits or do you import the server-log? During what time period did you get the 2k hits? Was that really a bot or a DDOS-attack? ;-)

rob006 commented 7 years ago

I process server logs from the last 24 hours. I have about 38k entries in the log file, and most of them are generated by bots (it is a dead site with lots of content). 2k is only an example - the point is that you are spamming the DB with lots of update queries instead of counting this at the PHP level and using one query to update the value in the database.

Thomas--F commented 7 years ago

Sorry, but that's the way Piwik works. I don't have a variable where I can remember the last visit, and I don't know if the current hit is the last one, so I must do the update then. Piwik is designed as a real-time application. Every hit is treated separately.

The update should be no problem. The search before it is more of a threat, because the correct bot is searched for with a LIKE. The performance is strongly affected by the number of bot entries in your table.

rob006 commented 7 years ago

> Sorry, but that's the way Piwik works. I don't have a variable where I can remember the last visit, and I don't know if the current hit is the last one, so I must do the update then. Piwik is designed as a real-time application. Every hit is treated separately.

You could use an object property as a counter and issue the query in __destruct(). There could also be a hook in Piwik that allows triggering actions at the end of processing a batch of requests.
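The buffered-counter idea can be sketched as follows (a Python analog, not the plugin's code; PHP's __destruct() roughly corresponds to the explicit flush on exiting the context manager here, and `flush_fn` is a hypothetical stand-in for the real DB call):

```python
class BotCounter:
    """Buffers botCount increments in memory and writes them out once.

    flush_fn stands in for the database layer; in the plugin it would
    issue one UPDATE per bot at the end of the batch (e.g. from
    __destruct() in PHP, or from a Piwik end-of-batch hook).
    """
    def __init__(self, flush_fn):
        self.counts = {}
        self.last_visit = {}
        self.flush_fn = flush_fn

    def record(self, bot_id, visit_time):
        # Count the hit and keep only the most recent visit time.
        self.counts[bot_id] = self.counts.get(bot_id, 0) + 1
        prev = self.last_visit.get(bot_id)
        if prev is None or visit_time > prev:
            self.last_visit[bot_id] = visit_time

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Flush once at the end of the batch instead of once per hit.
        for bot_id, count in self.counts.items():
            self.flush_fn(bot_id, count, self.last_visit[bot_id])
        self.counts.clear()
        self.last_visit.clear()
```

The trade-off Thomas--F raises below is real: this only works if the code can tell when a batch ends, which is trivial for log import but not for live tracking.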

> The update should be no problem. The search before it is more of a threat, because the correct bot is searched for with a LIKE. The performance is strongly affected by the number of bot entries in your table.

I strongly disagree. I have about 200 bots in my table - that is practically nothing; MySQL can load the whole table into memory and search it pretty quickly without involving the hard drive (IO operations are the bottleneck in most cases). When you make an update/insert, MySQL has much more work to do (checking constraints, recalculating indexes, cache invalidation, etc.), and in the end the results must be written to the hard drive (which is slow). Inserting/updating data is almost always slower than simple selects.

MySQL also has a query cache, so even if you have 1k identical select queries, MySQL calculates the result only once and uses cached data the other 999 times, which increases performance a lot. But MySQL invalidates the cache every time you update data in one of the tables used by a cached query. So you are practically killing this feature, because after each select (which populates the cache) you make an update (which prunes that cache). So slow update queries are not the only problem here - you make the select queries slower too.

Thomas--F commented 7 years ago

You are right, 200 rows is nothing for a modern database engine. But 38k updates should be no problem either, even in a distributed environment. The only case that can cause the deadlock is the use of (massive) parallel processing. If the log is processed sequentially, there can't be a deadlock, because a deadlock requires two concurrent update processes.

If you can find a way to improve performance - feel free to add the code. Help is always welcome.

rob006 commented 7 years ago

> The only case that can cause the deadlock is the use of (massive) parallel processing.

I use 2 parallel processes. That gives me about 140 errors when processing a log file that is only 6 MB. I can't imagine how I could import logs from bigger sites, which have over 1 GB of logs per day. You may say that 38k update queries should be no problem for a DBMS, but for me these 38k queries are unnecessary and they create a real problem - this plugin makes log processing slow and unstable.

Unfortunately, I don't have enough time to test the solution I proposed. For now I'm disabling parallel processing on one Piwik instance and disabling the plugin completely on another installation (I need parallel processes on that one).

Thomas--F commented 7 years ago

I normally work with a much bigger DBMS (DB2 on z/OS), and I would expect that when 2 threads want to update the same row, the first one gets the lock, makes the update, releases the lock, and then the second thread gets its turn. It seems MySQL isn't that smart. I'm not using the log import and I don't have the time to find a smart solution. I will not try to use global count variables, because I don't know of any hooks or information that would tell me whether the plugin is being called during live tracking or during log import. Using count variables could screw up the whole live tracking.

And it seems not to be a problem for many users. If others encounter deadlocks or want to help find a solution, please leave a comment. I will leave this issue open for discussion.

rob006 commented 7 years ago

Piwik uses transactions for bulk processing, so a lock is released only after the whole transaction is committed. When one process locks one row in the bot_db table and a second process locks another row, and each then tries to update the row the other holds, we get a deadlock: each process holds one lock and waits for the other to finish its transaction and release its lock.
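The interleaving described above needs each worker to grab its locks in a different order. A standard mitigation (not something the plugin does, just the classic fix sketched with Python thread locks standing in for InnoDB row locks) is to acquire locks in one consistent global order:

```python
import threading

lock_a = threading.Lock()   # stands in for the row lock on bot A
lock_b = threading.Lock()   # stands in for the row lock on bot B
completed = []

def worker(name, first, second):
    # A deadlock requires each thread to hold one lock while waiting for
    # the other. Acquiring locks in a single global order (here: always
    # lock_a before lock_b) makes that interleaving impossible.
    with first:
        with second:
            completed.append(name)

t1 = threading.Thread(target=worker, args=("p1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("p2", lock_a, lock_b))
t1.start(); t2.start()
t1.join(); t2.join()
```

If the second worker instead acquired lock_b before lock_a, the two could block each other forever, which is the row-lock situation MySQL detects and resolves with error 1213.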

ccandreva commented 4 years ago

FWIW, I am running into this issue doing log file processing now also. However, it does confirm that the BotTracker plugin works for importing logs!