internal clock drift from system time

GoogleCodeExporter commented 9 years ago

Memcached's internal monotonic clock keeps drifting away from system time.

Worse, different memcached machines in the same cluster will also drift away 
from each other. This causes variable key validity across the cluster depending 
on the machine the key will be stored on.

What steps will reproduce the problem?
Run memcached until its internal clock has significantly drifted into the 
future (say +60s). Store a key with 40s validity (using system time as base of 
course). 

What is the expected output? What do you see instead?
Expected: Key is stored and will be valid for 40s
Instead: Key is not stored and silently discarded. Caching fails.

My suggestion: implement
A: smooth drift correction (like the one ntp uses) to avoid time jumps
B: a memcached parameter that forces memcached to use system time, with a 
warning to users about the consequences of switching it on

I could live with B, as we use ntp with smooth correction.

What version of the product are you using? On what operating system?
SLES11 + memcached 1.4.15

One of our memcached machines in the cluster has a rather big time drift 
compared to all others (which is corrected by ntp): +5s into the future per 
24h. This leads to a steady decrease in the hit rate on that memcached 
instance. After just one week, caching fails completely for a certain 
key-value pair that needs 30s validity and is stored from other hosts (which 
generate expiration dates using their ntp-synchronized system time).
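
The arithmetic behind that failure can be sketched as follows. This is a minimal illustration, assuming the clients send absolute expiration timestamps built from their own system time and that the server clock only runs fast; `days_until_rejected` is a hypothetical helper name, not part of memcached.

```python
def days_until_rejected(ttl_seconds: float, drift_seconds_per_day: float) -> float:
    """Days of uptime after which the server clock is ahead by the whole TTL.

    Assumption: the expiration is an absolute timestamp computed on a client
    whose clock is correct, so the key arrives already expired once the
    server's lead reaches ttl_seconds.
    """
    if drift_seconds_per_day <= 0:
        raise ValueError("drift must be positive (server clock running fast)")
    return ttl_seconds / drift_seconds_per_day


# Figures from this report: 30 s validity, +5 s of drift per 24 h.
print(days_until_rejected(30, 5))  # 6.0
```

With these figures the lead reaches 30s after six days, which is in line with the complete failure observed after about a week.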

As a result we currently could
A. restart all memcached instances across the cluster every 4 days (not 
feasible)
B. increase the validity period for that particular key (not possible)
C. set up a single alternate memcached just for that key type that's safe to 
restart (ugly and not failsafe)
D. not use that server

We'd probably need to implement C, because A will insta-kill our app at 
particular times (cold cache = bad hit-rate). However, none of these solutions 
fixes the problem at its core.

# memcached-tool localhost:11211 stats | grep \ time | awk '{print $2}'; date +%s
1418899825
1418899819
# memcached-tool localhost:11211 stats | grep uptime
    uptime      106134
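
The three numbers above are enough to estimate a drift rate. A minimal sketch, assuming memcached's clock and the system clock agreed when the process started and ignoring the small delay between the two commands; `drift_per_day` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def drift_per_day(memcached_time: int, system_time: int, uptime: int) -> float:
    """Average drift in seconds per day over the process lifetime.

    Assumption: both clocks showed the same time at process start, so the
    current skew accumulated over exactly `uptime` seconds.
    """
    skew = memcached_time - system_time
    return skew / uptime * SECONDS_PER_DAY


# Values copied from the stats output and `date +%s` above.
rate = drift_per_day(1418899825, 1418899819, 106134)
print(f"{rate:.2f} s/day")  # 4.88 s/day
```

A 6s lead after roughly 1.2 days of uptime is consistent with the +5s per 24h quoted earlier.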

Original issue reported on code.google.com by chr.eg...@gmail.com on 18 Dec 2014 at 12:01

GoogleCodeExporter commented 9 years ago

Yeah that does happen.

The primary motivator for doing that was for people *with* ntpd though. On boot 
their clock would be weird, and memcached and ntpd would both start at similar 
times. Then ntpd would do a large initial correction and the daemon would be 
goofed until restart.

This happened pretty often, and switching it to use ntpd also requires that you 
make it start after the clock is corrected.

Going to leave this open and see if we can do the time slew thing.

Original comment by dorma...@rydia.net on 21 Dec 2014 at 1:46

GoogleCodeExporter commented 9 years ago

Hmm, doesn't quite sound like a fix will make it into one of the next releases 
:(

Also I'm not sure if smooth correction can even be sanely implemented at the 
application level without hackery (duplicating much of ntpd or reading ntpd's 
drift-file to get the current frequency offset).

From ntp's documentation (cf. http://doc.ntp.org/4.1.1/miscopt.htm ):

> driftfile driftfile
>    This command specifies the name of the file used to record the frequency
>    offset of the local clock oscillator. If the file exists, it is read at
>    startup in order to set the initial frequency offset and then updated once
>    per hour with the current frequency offset computed by the daemon. If the
>    file does not exist or this command is not given, the initial frequency
>    offset is assumed zero. In this case, it may take some hours for the
>    frequency to stabilize and the residual timing errors to subside.
>
>    The file format consists of a single line containing a single floating
>    point number, which records the frequency offset measured in
>    parts-per-million (PPM). [...]
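
To relate a PPM figure like the one stored in the drift file to the per-day numbers used elsewhere in this thread, a unit conversion is enough. A minimal sketch; `ppm_to_seconds_per_day` is a hypothetical helper name, not part of ntp or memcached, and the drift file describes the system oscillator rather than memcached's internal counter.

```python
SECONDS_PER_DAY = 86_400


def ppm_to_seconds_per_day(ppm: float) -> float:
    """Convert a frequency offset in parts-per-million to seconds gained per day."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def seconds_per_day_to_ppm(seconds: float) -> float:
    """Inverse conversion: seconds gained per day to parts-per-million."""
    return seconds / SECONDS_PER_DAY * 1e6


# A drift of 5 s per 24 h corresponds to a bit under 58 PPM.
print(f"{seconds_per_day_to_ppm(5):.2f} PPM")  # 57.87 PPM
```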

What about this idea:
- keep the internal counter (to avoid syscall overhead)
- add a new memcached parameter to enable the following behavior (default off)
- => sync the internal counter every minute (or so) to the system time
- spew out an error message if there's a big difference (say 30s?)
- document the parameter (list preconditions and possible consequences if not 
met)
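
The idea above could look roughly like this. It is a sketch of the proposal only, not memcached's actual code: `TickClock`, `sync_every` and `warn_threshold` are hypothetical names, and the system clock is injected as a callable so the behavior can be shown with a fake time source.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("clock_sketch")


class TickClock:
    """Hypothetical model: a counter advanced by a 1 Hz timer, resynced periodically."""

    def __init__(self, system_time: Callable[[], float] = time.time,
                 sync_every: int = 60, warn_threshold: float = 30.0) -> None:
        self._system_time = system_time
        self.sync_every = sync_every          # ticks between resyncs
        self.warn_threshold = warn_threshold  # seconds of skew that trigger a warning
        self.now = system_time()              # internal counter; readers only read this
        self._ticks_since_sync = 0

    def tick(self) -> None:
        """Called once per second by a timer; queries the system clock only on resync."""
        self.now += 1
        self._ticks_since_sync += 1
        if self._ticks_since_sync >= self.sync_every:
            self._resync()

    def _resync(self) -> None:
        system_now = self._system_time()
        diff = system_now - self.now
        if abs(diff) >= self.warn_threshold:
            log.warning("internal clock off by %.1fs, jumping to system time", diff)
        self.now = system_now  # let the counter jump, keep syncing
        self._ticks_since_sync = 0


# Demonstration with a fake system clock that ends up 5 s ahead of the counter.
fake_time = [1000.0]
clock = TickClock(system_time=lambda: fake_time[0], sync_every=60)
for _ in range(59):
    clock.tick()
print(clock.now)       # 1059.0
fake_time[0] = 1065.0
clock.tick()           # 60th tick triggers the resync
print(clock.now)       # 1065.0
```

Reads stay as cheap as reading a variable, and the system clock is consulted once per `sync_every` ticks.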

Some thoughts and conclusions:
- many (especially distributed) systems depend on smooth time correction anyway
- memcached is about putting unneeded RAM on remote machines to good use
  A distributed application that stores keys with validity >30s may assume all of the
  following at once:
  -  local system time ==  local memcached time
  -  local system time == remote memcached time
  - remote system time ==  local memcached time
  - remote system time == remote memcached time
- jumping system time is not the application's problem. It's an ops problem
- fix at the system level instead of the application level (don't fix what 
isn't broken)
- probably less code to touch

Maybe it does make sense to introduce internal smoothing for non-Linux users 
or existing users with broken time synchronisation who would be caught by 
surprise by an altered default behavior of memcached.

Original comment by chr.eg...@gmail.com on 7 Jan 2015 at 4:09

GoogleCodeExporter commented 9 years ago

> - spew out an error message if there's a big difference (say 30s?)

EDIT: A warning message would be better, i.e. let the internal counter jump but 
keep sync running.

Original comment by chr.eg...@gmail.com on 7 Jan 2015 at 4:12

GoogleCodeExporter commented 9 years ago

It could take a little while to get to it... you can see there're a lot of open 
bugs and pull requests :/ Sorry.

Original comment by dorma...@rydia.net on 7 Jan 2015 at 9:05

GoogleCodeExporter commented 9 years ago

D*mn, I wrote this bug report under wrong assumptions. I've now realized what 
the real bug is and of course it's in the proprietary software that we're using.

It's wrongly sending absolute time stamps in one specific case where only a 
short relative value of 30s must be used (i.e. well below the 2592000s or 30 
days limit). Initially spotting that memcached and the system time were skewed 
between 5 and 10 minutes on all of our machines set me on a wrong track from 
the beginning and I somehow missed the importance of relative vs. absolute.

It really only affects the few users that need time spans >30d, which we do not 
at this time, and the impact is a lot less then. Sorry for the inconvenience.
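
The relative vs. absolute distinction can be illustrated with a simplified model of how an expiration value is interpreted around the 30-day limit. This is a sketch for non-negative values only, not memcached's implementation; `seconds_remaining` is a hypothetical helper name and the 60s clock lead is an illustrative figure.

```python
from typing import Optional

REALTIME_MAXDELTA = 60 * 60 * 24 * 30  # 2592000 s, the 30-day limit


def seconds_remaining(exptime: int, server_now: int) -> Optional[int]:
    """Validity left as seen by the server's clock; None means "never expires".

    Simplified model: values up to the 30-day limit are offsets from the
    server's own clock, larger values are absolute Unix timestamps.
    """
    if exptime == 0:
        return None
    if exptime > REALTIME_MAXDELTA:
        # Absolute timestamp: compared against the server's idea of "now".
        return max(exptime - server_now, 0)
    # Relative offset: unaffected by skew between client and server clocks.
    return exptime


client_now = 1418899819
server_now = client_now + 60  # server clock 60 s ahead of the client

print(seconds_remaining(30, server_now))               # 30
print(seconds_remaining(client_now + 30, server_now))  # 0
```

In this model a relative value of 30 keeps its full validity regardless of skew, while an absolute timestamp 30s in the client's future is already in the past for a server that is 60s ahead.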

Original comment by chr.eg...@gmail.com on 9 Jan 2015 at 10:38

espandy / memcached

internal clock drift from system time #389