Open jeking3 opened 5 years ago
Please, could you try to see if this is xtime specific?
Good news - I completed the test program to isolate the issue and I can prove it is a problem in cygwin:
Jim@pulsar /cygdrive/c/users/jim/desktop
$ while "./timed_wait_short.exe"; do :; done
begin: 520424s 785831800n
target: 520424s 835831800n
end: 520424s 836902100n
ok: true
begin: 520424s 860788400n
target: 520424s 910788400n
end: 520424s 911174300n
ok: true
begin: 520424s 937973900n
target: 520424s 987973900n
end: 520424s 987900300n
ok: false
Jim@pulsar /cygdrive/c/users/jim/desktop
$ while "./timed_wait_short.exe"; do :; done
begin: 520427s 138223600n
target: 520427s 188223600n
end: 520427s 188227700n
ok: true
begin: 520427s 213095200n
target: 520427s 263095200n
end: 520427s 263004000n
ok: false
Here's the source code:
$ cat timed_wait_short.c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
int main(void)
{
pthread_cond_t cv;
pthread_condattr_t attr;
pthread_mutex_t m;
struct timespec ts_begin;
struct timespec ts_target;
struct timespec ts_end;
assert(!pthread_condattr_init(&attr));
pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
assert(!pthread_mutex_init(&m, NULL));
assert(!pthread_cond_init(&cv, &attr));
assert(!pthread_mutex_lock(&m));
assert(!clock_gettime(CLOCK_MONOTONIC, &ts_begin));
ts_target = ts_begin;
ts_target.tv_nsec += 50000000; // add 50ms
if (ts_target.tv_nsec >= 1000000000)
{
++ts_target.tv_sec;
ts_target.tv_nsec -= 1000000000;
}
assert(ETIMEDOUT == pthread_cond_timedwait(&cv, &m, &ts_target));
assert(!clock_gettime(CLOCK_MONOTONIC, &ts_end));
int ok = ts_end.tv_sec > ts_target.tv_sec ||
(ts_end.tv_sec == ts_target.tv_sec && ts_end.tv_nsec >= ts_target.tv_nsec);
printf(" begin: %10us %10un\n", ts_begin.tv_sec, ts_begin.tv_nsec);
printf("target: %10us %10un\n", ts_target.tv_sec, ts_target.tv_nsec);
printf(" end: %10us %10un\n", ts_end.tv_sec, ts_end.tv_nsec);
printf(" ok: %s\n", ok ? "true" : "false");
return !ok; // unix error code 0 means ok
}
This is only an issue when you set the condition variable's clock to monotonic. It does not happen with the realtime clock. I'll submit another issue to the cygwin mailing list (is there really no bug database, and just a mailing list?). As for the tests, we can disable these two tests that are failing under cygwin until the underlying platform is fixed.
What about disabling the monotonic clock in cygwin?
Looking through the sources for cygwin, it looks like all the other timed calls use the real clock, and there was a commit in Feb 2018 that changed the code:
On line 2564 a calculation is made, but then later on at line 2574 (the case for the monotonic clock) it looks like the subtraction there is subject to rounding errors. That's likely the root cause.
We have a couple choices we could use to handle this:
On cygwin only, for condition_variable::timed_wait, always return the result of do_timed_wait (which calls pthread_cond_timedwait, which is returning ETIMEDOUT). This is a workaround for the underlying issue until they fix the issue.
On cygwin only, disable these specific tests until they fix the issue.
Checking the clock after the call to pthread_cond_timedwait still seems strange to me. It would change the result from false to true if the clock was adjusted backwards, but it wouldn't change the result if the clock was adjusted forwards. I still think it would be better to just return the result of pthread_cond_timedwait so the behavior of clock adjustment is handled by the platform. That said, let me know what you are comfortable with.
I'll not change the meaning of wait_until :( Again, what about disabling the monotonic clock in cygwin?
Just for the condition variable handling, or across the board? This case seems limited to just the condition variable with monotonic timer. For now I'm modifying detail/config.hpp to not use the monotonic clock on cygwin.
I don't catch the question?
Go ahead for the change in the config file.
Setting as enhancement it is a cygwin.
This issue was confirmed upstream, see:
Upstream changes and snapshot available for testing:
https://cygwin.com/snapshots/x86/cygwin-diffs-20181126-20181129
In test_mutex there are some simple tests that lock a mutex, then call timed_wait on a condition variable and the mutex. Given no other code will notify the condition variable, the tests (test_mutex.cpp:52 and test_mutex.cpp:91) expect to result to be false. On cygwin the result is sometimes true:
https://ci.appveyor.com/project/jeking3/thread/builds/20524620/job/axu9rgvpm2h1djgm?fullLog=true#L1626
I was able to reproduce this locally in the debugger by modifying timed_wait such that I could place a breakpoint on the failure:
The
after
time is 4300us before the end ofts
which is the time point that should have passed. This likely means there is a rounding error somewhere, either in the time handling of boost.thread or down in cygwin.