Closed OllieJones closed 1 year ago
The outliers are extreme. Some measurements show only a tiny number of outlier timings among thousands, and some "max" times are much bigger than the "p99" (99th-percentile) times.
I don't understand why. I've confirmed all the file systems involved are on locally attached disks, so we don't have some sort of NFS / SMB / CIFS access delay.
Without knowing why these delays happen, and without some way to predict them, there's not much to do except increase the timeout. I set it to 5 seconds.
It's possible some of the timeouts and slowness are NFS / CIFS related. https://wordpress.org/support/topic/uncaught-exception-unable-to-execute-statement-database-is-locked/#post-16434734
I wonder if there's a way to detect a network-attached file system via stat? If so, we could warn the user.
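A plain stat() / statvfs() call doesn't expose the filesystem *type*, so a detection check has to read the mount table instead. Here's a hedged sketch (my own helper names, not anything in the plugin) that scans /proc/self/mounts on Linux and flags common network filesystem types; on non-Linux systems it just returns None rather than guessing:

```python
import os

# Filesystem types we'd treat as network-attached (illustrative, not exhaustive).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"}

def filesystem_type(path):
    """Return the fs type of the mount containing path, or None if unknown."""
    try:
        with open("/proc/self/mounts") as f:
            mounts = [line.split() for line in f]
    except OSError:
        return None  # not Linux, or /proc unavailable
    real = os.path.realpath(path)
    best = None
    for fields in mounts:
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        if real == mount_point or real.startswith(mount_point.rstrip("/") + "/"):
            # Keep the longest (most specific) matching mount point.
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best[1] if best else None

def looks_network_attached(path):
    """True if path appears to live on a network filesystem."""
    return filesystem_type(path) in NETWORK_FS_TYPES
```

This is Linux-specific; a portable version for the plugin would presumably need per-OS branches, which is part of why a simple stat-based check is appealing if one exists.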
I think some of this may have been due to the use of VACUUM blocking access. That's gone from v1.3.2.
Haven't seen this recur since realizing that VACUUM sucks and fixing the code.
Statistics are showing, on GreenGeeks, some extreme outliers in the times to save and load cache objects (many hundreds of milliseconds, compared to much smaller median values).
So, increase the timeout from 500 ms to 5 s.
Switch the journaling mode to WAL (from MEMORY).
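The two changes above boil down to two pragmas. A minimal sketch using Python's sqlite3 module (the plugin itself is PHP, but the pragma names are identical everywhere; the cache file name here is hypothetical):

```python
import sqlite3

conn = sqlite3.connect("cache.sqlite")  # hypothetical cache file name

# Wait up to 5000 ms for a competing writer to finish before failing
# with "database is locked", instead of the old 500 ms.
conn.execute("PRAGMA busy_timeout = 5000")

# Write-ahead logging lets readers proceed while a writer is active,
# unlike MEMORY journaling or the default rollback journal.
mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
print(mode)  # "wal" on filesystems that support it
```

Note that journal_mode = WAL is persistent (stored in the database file), while busy_timeout must be set on every connection.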
Report statistics on p1 and p99 (the first and 99th percentiles) along with the median, p5, and p95, to try to get a handle on whether these large times are basically one-off problems or recurrent.
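As a sketch of what that report computes (the function name and shape are my own, not the plugin's): given a list of save/load timings, pull the cut points out of statistics.quantiles with n=100:

```python
import statistics

def timing_report(samples_ms):
    """Summarize timing samples (ms) at p1, p5, median, p95, p99."""
    # quantiles(n=100) returns the 99 cut points p1..p99.
    q = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p1": q[0],
        "p5": q[4],
        "median": statistics.median(samples_ms),
        "p95": q[94],
        # If "max" sits far above p99, the slow saves are rare one-offs
        # rather than a recurring plateau.
        "p99": q[98],
    }

report = timing_report(list(range(1, 101)))  # 1..100 ms, uniform spread
```

Comparing p99 against the observed max is the point: a handful of extreme outliers barely moves p99, which is exactly the pattern described at the top of this issue.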