freebsd / kyua

Testing framework for infrastructure software
BSD 3-Clause "New" or "Revised" License
147 stars 42 forks source link

negative datetime::delta exception #155

Open brooksdavis opened 8 years ago

brooksdavis commented 8 years ago

I've started trying to run the FreeBSD test suite on mips64 under qemu. The run inevitably fails (after an indeterminate number of tests) with:

kyua: E: Cannot subtract 1454883115560690us from 1454883116245587us as it would result in a negative datetime::delta, which are not supported.

The check in datetime::timestamp::operator- appears to be correct, but the exception string has the variables swapped.

I'm using the kyua-0.12,2 package. The version of qemu is the amd64 version of emulators/qemu-cheri. I can provide images if that would be useful. I'll try to get this tested on real hardware as well to try and rule out a qemu bug.

staceyson commented 8 years ago

On an edgerouter lite (running FreeBSD 11.0-CURRENT (ERL) #0 4cf4c54(master)) I am seeing a lot of the tests fail but nothing related to having a negative time delta (suggesting time travel). Some of the test that fail include:

bin/pkill/pgrep-j_test:main  ->  failed: 3 of 3 tests failed  [2.513s]

bin/pkill/pkill-j_test:main  ->  failed: 1 of 3 tests failed  [4.145s]

lib/atf/libatf-c++/atf_c++_test:include  ->  failed: Header check failed; atf-c++.hpp is not self-contained  [1.011s]

lib/atf/libatf-c++/build_test:include  ->  failed: Header check failed; atf-c++/build.hpp is not self-contained  [0.622s]

sbin/growfs/legacy_test:main  ->  broken: TAP test program yielded invalid data: Load of '/tmp/kyua.lDzHmM/510/stdout.txt' failed: Reported plan differs from actual executed tests  [4.141s]

sbin/mdconfig/legacy_test:main  ->  failed: 1 of 1 tests failed  [4.961s]

sys/kern/unix_seqpacket_test:pipe_128k_8k  ->  failed: /usr/src/tests/sys/kern/unix_seqpacket_test.c:347: 0 != setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &sndbufsize, sizeof(sndbufsize))  [0.072s]
sys/kern/unix_seqpacket_test:pipe_8k_128k  ->  failed: /usr/src/tests/sys/kern/unix_seqpacket_test.c:347: 0 != setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &sndbufsize, sizeof(sndbufsize))  [0.078s]
sys/kern/unix_seqpacket_test:pipe_8k_8k  ->  failed: /usr/src/tests/sys/kern/unix_seqpacket_test.c:347: 0 != setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &sndbufsize, sizeof(sndbufsize))  [0.075s]
sys/kern/unix_seqpacket_test:pipe_simulator_128k_128k  ->  failed: /usr/src/tests/sys/kern/unix_seqpacket_test.c:221: 0 != setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &sndbufsize, sizeof(sndbufsize))  [0.077s]
[...]

I may be running a rather old test suite, however. I haven't updated this edgerouter with a newer image in some time. Attached are the complete test suite output and results db file.
results.usr_tests.20160208-051013-160329.db.zip testsuite_output.txt

jmmv commented 8 years ago

Good catch on the error message. I'm fixing it but I first need to get the Travis setup back in working order so I can properly validate the fix... (Something has changed in the environment there that's causing builds to fail.)

Could you rerun with --loglevel=debug and then attach the log file from ~/.kyua/logs/ for the corresponding run? (Or use --logfile=/foo to specify where to save it.)

Thanks

brooksdavis commented 8 years ago

https://gist.github.com/brooksdavis/bf0977c69ebbac587e6e

jmmv commented 8 years ago

Bleh. The log file does not provide much insight on the problem. (I wonder why I didn't make that error just an assertion to trigger a core dump, which could be more useful here...)

Is the failure deterministic?

Have you been able to run the Kyua self-tests alone (those in /usr/local/tests/kyua/), without the rest of FreeBSD's test suite? Any failures reported by it would hopefully point quite well to the problem.

brooksdavis commented 8 years ago

It's not deterministic. Some number of tests run, but I've had failures occur between 30min and 3 hours into the test run. To be honest, I suspect a qemu bug since we can't trigger this anywhere else.

I've spotted a couple seemingly unrelated test failures in the kyua engine tests:

atf_test:test__body_only__crashes  ->  failed: Line 154: exp_result != test_result_handle->test_result() (model::test_result{type='broken', reason='Premature exit; test case received signal 6 (core dumped)'} != model::test_result{type='broken', reason='Premature exit; test case received signal 6'})  [2.989s]
scheduler_test:integration__stacktrace  ->  failed: Line 986: atf::utils::grep_file("attempting to gather stack trace", result_handle->stderr_file().str()) not met  [1.411s]

Some failures in utils, also probably unrelated (the core dump ones are probably test bugs due to us sysctl kern.coredump=0 in our default images due to very slow SDCards):

stacktrace_test:unlimit_core_size  ->  failed: Core not dumped as expected  [1.161s]
process/executor_test:integration__output_files_always_exist  ->  broken: Premature exit; test case received signal 14  [7.331s]
process/isolation_test:isolate_child__enable_core_dumps  ->  failed: Line 421: status.coredump() not met  [1.108s]
process/operations_test:terminate_self_with__termsig_and_core  ->  failed: Line 375: WCOREDUMP(status) not met  [0.875s]
process/status_test:integration__coredump  ->  failed: Line 206: coredump.coredump() not met  [0.810s]
signals/timer_test:multiprogram_reorder_next_activations  ->  failed: Line 215: exp_items != items ([2, 1, 4, 3] != [1, 2, 4, 3])  [1.158s]
jmmv commented 8 years ago

Well, the failure of signals/timer_test:multiprogram_reorder_next_activations is suspicious in this context, but the test itself could be flaky considering that it plays with time.

If I provide you with a patch plumbing additional logging for the failure you are experiencing, could you apply it and retest?

brooksdavis commented 8 years ago

I think either I or @staceyson could get it built for mips. I don't currently have a build environment, but do have a machine that I could set one up on.

staceyson commented 8 years ago

Point me to the patch for this and I can build a mips64 package.

brooksdavis commented 8 years ago

@staceyson: as a hack to see if there are any other issues, I might try replacing the exception in datetime::timestamp::operator- with returning 0 since I don't care all that much how long the test takes.

brooksdavis commented 8 years ago

On another note, I've done some testing on our FPGA platform and thus far haven't hit this issue, but have managed to wedge the hardware before completing any runs (I've run quite a lot of tests).

staceyson commented 8 years ago

@brooksdavis I created a version that gives abs() of the difference of the two timestamps. It is available at: https://people.freebsd.org/~sson/mips/packages/ You can install the new package or just kyua into /usr/local/bin.

jmmv commented 8 years ago

Disabling the check seems fair, though I can't remember operator- is only used to compute the test's length. It could also be being used to compute test deadlines.

I also find the failure in signals/timer_test:multiprogram_reorder_next_activations to be suspicious, but that test could certainly be flaky.

Would be good to have the list of failures with the hacked binary.

brooksdavis commented 8 years ago

With @staceyson's package I'm still seeing seeing these exceptions. I've made very sure that I've installed that one and not another one, but something isn't working. The SHA256 of the package is 633f49f98fac6f4c74f4d48b18afcb0d644df0d31709b864306773073b0dff87 and the kyua executable is be91bd0ab394a4a4485c5c19fc396a4f7729a3fabf92cc9c25ecd0ed5dc578e8

brooksdavis commented 8 years ago

I'm no longer able to trigger this so hopefully it's all fixed where ever the bug was.

brooksdavis commented 7 years ago

I'm seeing this again with the 0.13,3 package on FreeBSD.

date/format_string_test:C_test  ->  kyua: E: Cannot subtract 1476919548082338us from 1476919547556756us as it would result in a negative datetime::delta, which are not supported.
brooksdavis commented 7 years ago

After further reflection, the core assumption this exception is asserting is false. gettimeofday() is not guaranteed to be monotonic. If you want something that is, clock_gettime(CLOCK_MONOTONIC, &ts); would be more appropriate. Otherwise, this code should handle time travel in some sensible manner.

brooksdavis commented 7 years ago

With this patch applied to the port I get a usable kyua for qemu. It's worth noting that I need the patch to use the results database as well since it contains the backwards timestamps. I chose to make "negative" deltas 1us rather than 0us in case there was code that assumed positive deltas.

+--- utils/datetime.cpp.orig
++++ utils/datetime.cpp
+@@ -590,11 +590,12 @@
+ datetime::delta
+ datetime::timestamp::operator-(const datetime::timestamp& other) const
+ {
+-    if ((*this) < other) {
+-        throw std::runtime_error(
+-            F("Cannot subtract %s from %s as it would result in a negative "
+-              "datetime::delta, which are not supported") % other % (*this));
+-    }
++    /*
++     * XXX-BD: gettimeofday isn't necessicarily monotonic so return the
++     * smallest non-zero delta if time went backwards.
++     */
++    if ((*this) < other)
++        return datetime::delta::from_microseconds(1);
+     return datetime::delta::from_microseconds(to_microseconds() -
+                                               other.to_microseconds());
+ }

I'd like to apply this change to the port.

jmmv commented 7 years ago

Thanks for tracking this down!

This change is insufficient, if only because it will break tests, and smells like a hack after all. Let me dig and see why "negative deltas" are claimed to be unsupported and how to make use of the monotonic clock, which would be much better to compute deltas.

brd commented 7 years ago

Hey @jmmv, any progress on this yet?

jmmv commented 7 years ago

So... I think the way we are tracking time in Kyua is broken. Time deltas are not going to work with the current timing mechanisms if the underlying time can change, and it can as you discovered.

The basic question is: what should happen if the current time changes while Kyua is running? Time deltas should work "as you expect", but what about timestamps? Should the "end time" of the run be allowed to be before the "start time", for example?

I had two different thoughts:

There of course is the possibility of ignoring this altogether and truncating the timestamps, as your change proposes, but I fear this will not resolve other problems that may exist.

Thoughts?

ngie-eign commented 7 years ago

The timer creation (setitimer...) should be replaced with timer_create (see #164 for more details).

jmmv commented 7 years ago

Unfortunately, macOS does not have timer_create et. al.

asomers commented 7 years ago

@jmmv any of your suggestions would work, and any of them would be much better than crashing, which is what Kyua does now. Personally, I don't think it's a big problem if Kyua thinks a test had a negative duration. It shouldn't happen very often, and the cause will be fairly obvious.

The best solution is obviously to store both RTC and monotonic times in each timestamp, though it's the most work. I wouldn't worry about database compatibility issues too much. Most people only care about the most recent run, and I would think that most people who do keep historical data keep it in the form of the junit.xml files.

brooksdavis commented 7 years ago

We're well past the first anniversary of this bug without progress so I plan to commit my hack which makes negative deltas positive 1ms as a patch to the port to allow us to start testing MIPS in ci.freebsd.org and to let me stop rolling my own package in the CheriBSD project.

jmmv commented 7 years ago

Please go ahead with that plan. I've only recently retaken Kyua-related work after finding a way to fit it into my schedule, and there are many things to go through... :-/

brooksdavis commented 7 years ago

Thanks!