cloudflare / roughtime

A secure clock-synchronization protocol for when rough is enough.
https://developers.cloudflare.com/time-services/roughtime/
Apache License 2.0
135 stars 29 forks source link

large difference in deltas from 1st and 2nd runs with alerter #1

Closed brettowe closed 6 years ago

brettowe commented 6 years ago

Looking into this code and idea I was trying out the alerter recipe and I'm seeing around a 250ms difference in 1st and 2nd runs of the program run in rapid succession below is output from 1 such run, not sure how much time is needed to wait for it to reset and give similar results, 10ish or less minutes I'm still not understanding code fully but I did notice in client.go line 323 says to "Reset the delay accumulator." but code on following line doesn't feel like a reset is happening, perhaps that's the issue?

chronus-pi ~ # ./alerter -rt go/src/github.com/cloudflare/roughtime/ecosystem.config 2018/09/21 client.go:272: Cloudflare-Roughtime: 2018-09-21 21:10:52.191 -0700 PDT ±1s (in 74ms) 2018/09/21 client.go:272: Google-Sandbox-Roughtime: 2018-09-21 21:10:52.304921 -0700 PDT ±1s (in 100ms) 2018/09/21 client.go:272: int08h-Roughtime: 2018-09-21 21:10:52.401288 -0700 PDT ±1s (in 111ms) 2018/09/21 alerter.go:60: delta: 148ms chronus-pi ~ # ./alerter -rt go/src/github.com/cloudflare/roughtime/ecosystem.config 2018/09/21 client.go:272: Cloudflare-Roughtime: 2018-09-21 21:10:53.711 -0700 PDT ±1s (in 73ms) 2018/09/21 client.go:272: Google-Sandbox-Roughtime: 2018-09-21 21:10:53.821151 -0700 PDT ±1s (in 92ms) 2018/09/21 client.go:272: int08h-Roughtime: 2018-09-21 21:10:53.907014 -0700 PDT ±1s (in 103ms) 2018/09/21 alerter.go:60: delta: -17ms

cjpatton commented 6 years ago

Hi @brettowe, thanks for the feedback! The comment you refer to is outdated. (I'm removing it now.)

What's happening here is that in order to accurately measure the difference between our clock's time (t0) and each server's reported time, we need to account for the time delay accumulated so far. To do so, we record the client's observed network delay for each Roughtime query. (See this line.)

I've also observed this issue. What I notice is that every time you run roughtime.DoFromFile() after not running it for a while (maybe a minute or so?), the first run is off by a bit, but every run thereafter is nicely synced. It occurs to me that this is the result of latency associated with loading the configuration file that isn't currently measured. Loading the file from disk costs a few milliseconds, but subsequent runs load the file from in-memory cache.

This is (hopefully) resolved by https://github.com/cloudflare/roughtime/commit/ae6c5ae113d8bef72aaaad007eb75032bca03553. Please let me know if the problem persists and I'll re-open the issue. Thanks again!

brettowe commented 6 years ago

confirmed works now without massive variance between runs