Open jehangiriqbal opened 1 year ago
Hi, I want to contribute to this issue but could not reproduce the failures.
I followed the steps provided, except that the kernel version of my EC2 instance is "5.10.179-171.711.amzn2.x86_64".
Can someone else please confirm the failures? Thanks.
Hi, thanks for looking into this issue
I suspect that on your machine consecutive calls to std::chrono::steady_clock::now() return distinct time points, so the problem does not show up. Please confirm this with the following code:
repro.cpp
#include <chrono>
#include <cstddef>
#include <iomanip>
#include <iostream>

int main()
{
    const size_t REPEAT_COUNT = 10'000'000;
    size_t count = 0;        // number of measured intervals
    size_t count_zeros = 0;  // intervals that measured exactly 0 ns
    size_t total = 0;        // sum of all non-zero intervals, in ns
    std::cout << std::fixed << std::setprecision(9) << std::left;
    // size_t index avoids a signed/unsigned comparison with REPEAT_COUNT
    for (size_t i = 0; i < REPEAT_COUNT; ++i) {
        // record start time
        auto start = std::chrono::steady_clock::now();
        // do no work
        ++count;
        // record end time
        auto end = std::chrono::steady_clock::now();
        std::chrono::nanoseconds diff = end - start;
        if (diff.count() == 0) {
            ++count_zeros;
        } else {
            total += diff.count();
        }
    }
    std::cout << "steady_clock Total " << total << " ns - Average: "
              << (static_cast<double>(total) / count) << " ns\n";
    std::cout << "Got 0ns between steady_clock::now() calls "
              << count_zeros << " out of " << count << "\n";
}
Command to execute
g++ -std=c++17 repro.cpp -o repro
./repro
Expected output
steady_clock Total 545533735 ns - Average: 54.553373500 ns
Got 0ns between steady_clock::now() calls 0 out of 10000000
Incorrect output
steady_clock Total 545533735 ns - Average: 54.553373500 ns
Got 0ns between steady_clock::now() calls <any number other than 0> out of 10000000
Also, can you please check the clock source on the machine using this command:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
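If it is more convenient to report the clock source from the same binary as the repro, the file can also be read from C++. This is only a minimal sketch: the function names ReadFirstLine and CurrentClocksource are hypothetical, and it assumes a Linux system that exposes the sysfs path from the command above.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the first line of `path`,
// or an empty string if the file cannot be opened or read.
std::string ReadFirstLine(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    if (in) {
        std::getline(in, line);
    }
    return line;
}

// Hypothetical helper: reads the active clock source from the same
// sysfs file as the `cat` command above (Linux only).
std::string CurrentClocksource() {
    return ReadFirstLine(
        "/sys/devices/system/clocksource/clocksource0/current_clocksource");
}
```

Printing CurrentClocksource() next to the zero count from repro.cpp would make it easier to compare results between machines; an empty string means the file was not readable.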
Hi, I am now able to reproduce this issue on an EC2 i3.8xlarge instance. Can someone assign this issue to me?
I also encountered this failure:
[ RUN ] PerfContextTest.DBMutexLockCounter
db/perf_context_test.cc:615: Failure
Expected: (get_perf_context()->db_mutex_lock_nanos) > (0), actual: 0 vs 0
terminate called after throwing an instance of 'testing::internal::GoogleTestFailureException'
what(): db/perf_context_test.cc:615: Failure
Expected: (get_perf_context()->db_mutex_lock_nanos) > (0), actual: 0 vs 0
Same setup as above.
Summary
I am creating this issue because I ran into test failures in a RocksDB release.
Some unit tests fail (complete list below) on an Amazon Linux EC2 instance of type i3.8xlarge (the steps to reproduce the issue and a sample Dockerfile are attached). The tests fail because they use std::chrono::steady_clock to measure durations in nanoseconds. Consecutive calls to steady_clock::now() frequently return the same time point, so the measured duration is zero and an assertion that expects a value greater than zero fails. I suggest either removing those tests or modifying them so that the measured work takes a non-zero amount of time even on the fastest machines and the reported time is never zero.
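One way the second option could look is sketched below. This is not RocksDB code: SpinUntilClockAdvances is a hypothetical helper, and the sketch assumes it is acceptable for a test to busy-wait briefly inside the timed region until steady_clock reports a new time point.

```cpp
#include <chrono>

// Hypothetical helper, not part of RocksDB: busy-waits until
// steady_clock::now() returns a time point later than the one observed
// on entry, and returns how far the clock advanced.
inline std::chrono::nanoseconds SpinUntilClockAdvances() {
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    Clock::time_point now = start;
    while (now == start) {
        now = Clock::now();  // steady_clock never goes backwards
    }
    return std::chrono::duration_cast<std::chrono::nanoseconds>(now - start);
}
```

Calling such a helper inside the section a test is timing would guarantee that the start and end time points differ, regardless of how fast the machine is or how coarse the clock source is. Whether that fits each affected test would need to be checked case by case.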
I am currently running RocksDB make check on an AmazonEC2 instance with the following configuration
Expected behavior
All tests should pass
Actual behavior
I run into failures in the following test cases; the complete buildlog.txt is here
Steps to reproduce the behavior
The test failures can be reproduced using the following Dockerfile on an Amazon Linux EC2 instance of type i3.8xlarge
Dockerfile
Investigation
On investigation, we found a single cause for all of these failures. The tests use std::chrono::steady_clock to measure durations in nanoseconds. Consecutive calls to steady_clock::now() frequently return the same time point, so the measured duration is zero and the assertion fails.
System configuration
Platform: Amazon Linux
Instance type: i3.8xlarge
Kernel: 5.10.176-157.645.amzn2.x86_64
g++ -v output: