Open GoogleCodeExporter opened 9 years ago
Yeah that does happen.
The primary motivator for doing that was for people *with* ntpd though. On boot
their clock would be weird, and memcached and ntpd would both start at similar
times. Then ntpd would do a large initial correction and the daemon would be
goofed until restart.
This happened pretty often, and switching it to use ntpd also requires that you
make it start after the clock is corrected.
Going to leave this open and see if we can do the time slew thing.
Original comment by dorma...@rydia.net
on 21 Dec 2014 at 1:46
Hmm, doesn't quite sound like a fix will make it into one of the next releases :(
Also I'm not sure if smooth correction can even be sanely implemented at an
application level without hackery (duplicating much of ntpd or reading ntpd's
drift-file to get the current frequency offset).
From ntp's documentation (cf. http://doc.ntp.org/4.1.1/miscopt.htm ):
> driftfile driftfile
> This command specifies the name of the file used to record the frequency offset of
> the local clock oscillator. If the file exists, it is read at startup in order to
> set the initial frequency offset and then updated once per hour with the current
> frequency offset computed by the daemon. If the file does not exist or this command
> is not given, the initial frequency offset is assumed zero. In this case, it may
> take some hours for the frequency to stabilize and the residual timing errors to
> subside.
>
> The file format consists of a single line containing a single floating point number,
> which records the frequency offset measured in parts-per-million (PPM).
[...]
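To give a feel for what reading the drift file would involve, here is a minimal Python sketch. It assumes only the format quoted above (one line, one floating point number in PPM). The function names `parse_driftfile` and `drift_seconds` are made up for illustration and are not part of ntpd or memcached.

```python
def parse_driftfile(text: str) -> float:
    """Return the frequency offset in PPM from the contents of a drift file.

    Assumes the format from the quoted documentation: a single line whose
    first token is a floating point number.
    """
    return float(text.strip().split()[0])


def drift_seconds(ppm: float, interval_seconds: float) -> float:
    """Clock error (in seconds) accumulated over an interval at a constant offset."""
    # 1 PPM means one microsecond of error per second of elapsed time.
    return ppm * 1e-6 * interval_seconds
```

For example, `drift_seconds(parse_driftfile("12.5\n"), 86400)` works out to roughly one second of drift per day, which is tiny next to the step corrections discussed in this thread.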
What about this idea:
- keep the internal counter (to avoid syscall overhead)
- add a new memcached parameter to enable the following behavior (default off)
- => sync the internal counter every minute (or so) to the system time
- spew out an error message if there's a big difference (say 30s?)
- document the parameter (list preconditions and possible consequences if not met)
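A rough sketch of the idea above, in Python rather than memcached's C, just to make the proposal concrete. Everything here is hypothetical: the class `InternalClock`, its parameters `sync_interval`, `warn_threshold` and `sync_enabled`, and the one-tick-per-second timer are assumptions for illustration, not memcached's actual implementation.

```python
import logging
import time

log = logging.getLogger("clock_sync_sketch")


class InternalClock:
    """Tick-driven counter that can optionally be resynced to the system time."""

    def __init__(self, now=time.time, sync_interval=60, warn_threshold=30,
                 sync_enabled=False):
        self._now = now                       # source of system time (injectable)
        self.sync_interval = sync_interval    # ticks between resyncs
        self.warn_threshold = warn_threshold  # seconds of difference worth a warning
        self.sync_enabled = sync_enabled      # the proposed parameter, default off
        self.current = int(now())             # internal counter, read without a syscall
        self._ticks_since_sync = 0

    def tick(self):
        """Advance by one second; return the correction applied (0 if none)."""
        self.current += 1
        self._ticks_since_sync += 1
        if not self.sync_enabled or self._ticks_since_sync < self.sync_interval:
            return 0
        self._ticks_since_sync = 0
        system = int(self._now())
        diff = system - self.current
        if abs(diff) >= self.warn_threshold:
            # Warn, but still follow the system clock and keep syncing.
            log.warning("internal clock off by %d s from system time", diff)
        self.current = system
        return diff
```

With `sync_enabled=False` the counter behaves like a free-running clock and ignores later steps of the system time, which is the behavior the boot-time ntpd case relies on.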
Some thoughts and conclusions:
- many (especially distributed) systems depend on smooth time correction anyway
- memcached is about putting unneeded RAM on remote machines to good use
A distributed application that stores keys with validity >30s may assume all of the
following at once:
- local system time == local memcached time
- local system time == remote memcached time
- remote system time == local memcached time
- remote system time == remote memcached time
- jumping system time is not the application's problem. It's an ops problem
- fix at the system level instead of the application level (don't fix what isn't broken)
- probably less code to touch
Maybe it does make sense to introduce internal smoothing for non-Linux or
existing users with broken time synchronisation who would be caught by surprise
by an altered default behavior of memcached.
Original comment by chr.eg...@gmail.com
on 7 Jan 2015 at 4:09
> - spew out an error message if there's a big difference (say 30s?)
EDIT: A warning message would be better, i.e. let the internal counter jump but
keep sync running.
Original comment by chr.eg...@gmail.com
on 7 Jan 2015 at 4:12
It could take a little while to get to it... you can see there're a lot of open
bugs and pull requests :/ Sorry.
Original comment by dorma...@rydia.net
on 7 Jan 2015 at 9:05
D*mn, I wrote this bug report under wrong assumptions. I've now realized what
the real bug is and of course it's in the proprietary software that we're using.
It's wrongly sending absolute time stamps in one specific case where only a
short relative value of 30s must be used (i.e. well below the 2592000s or 30
days limit). Initially spotting that memcached and the system time were skewed
between 5 and 10 minutes on all of our machines set me on a wrong track from
the beginning and I somehow missed the importance of relative vs. absolute.
It really only affects the few users that need time spans >30d, which we do not
at this time, and the impact is a lot less then. Sorry for the inconvenience.
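For anyone landing here with the same confusion, a small Python sketch of the relative vs. absolute distinction. It is a simplified model, not memcached's code: the function names are made up, and the only facts taken as given are that values up to 2592000s are relative to the server's clock, larger values are absolute Unix timestamps, and 0 means no expiry.

```python
THIRTY_DAYS = 60 * 60 * 24 * 30  # 2592000 seconds, the limit mentioned above


def expiry_timestamp(exptime: int, server_now: int) -> int:
    """Absolute Unix time at which an item expires (0 means it never expires)."""
    if exptime == 0:
        return 0
    if exptime > THIRTY_DAYS:
        # Too large to be a relative span: taken as an absolute timestamp.
        return exptime
    # Relative span: counted from the server's own idea of "now".
    return server_now + exptime


def remaining_lifetime(exptime: int, server_now: int) -> int:
    """Seconds the item stays valid as seen by the server (hypothetical helper)."""
    return expiry_timestamp(exptime, server_now) - server_now
```

If a client sends `client_now + 30` as an absolute value while the server's clock is 600s behind, `remaining_lifetime` gives 630s instead of 30s. Sending the relative value `30` gives 30s no matter how far the two clocks are apart.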
Original comment by chr.eg...@gmail.com
on 9 Jan 2015 at 10:38
Original issue reported on code.google.com by
chr.eg...@gmail.com
on 18 Dec 2014 at 12:01