Open edsantiago opened 4 months ago
Hi @edsantiago,
I think increasing the expected time in [035] podman logs - --until --follow journald is not a good idea. The elapsed time can vary with the actual load on the machine, or with the scheduler when many tests run in parallel, so the test should check that the command gets 3s of logs, not how long it takes.
For the [035] podman logs - multi k8s-file test, I would say that the first container had not finished its job and was put to sleep because of CI machine load. The test should probably wait for both containers to finish their work before reading the logs.
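A minimal sketch of the "wait for both containers first" idea, not the actual test code. The helper name wait_then_logs and the container names c1 and c2 are hypothetical; the only podman subcommands used are podman wait (which blocks until the named containers exit) and podman logs.

```bash
#!/usr/bin/env bash
# Sketch only. PODMAN may be overridden to point at a different binary.
PODMAN=${PODMAN:-podman}

# wait_then_logs CTR...
# Block until every named container has exited, then print each one's logs.
wait_then_logs() {
    # "podman wait" prints the containers' exit codes; they are not needed here.
    "$PODMAN" wait "$@" >/dev/null || return 1
    local ctr
    for ctr in "$@"; do
        "$PODMAN" logs "$ctr" || return 1
    done
}

# Hypothetical usage inside a test:
#   wait_then_logs c1 c2
```

Because the logs are read only after both writers have exited, a container that was descheduled under CI load can no longer cause a short read.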
In the test [035] podman logs - --since --follow journald, I would say that when running in parallel, journald is used by multiple containers, so it will be necessary to increase the timeout to give the container more time to write to journald, and also to check for the end of the journald content.
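One way to combine a longer timeout with an end-of-content check is to poll until a sentinel line shows up. The following is only a sketch: wait_for_output is a hypothetical helper, and it assumes the container prints a known marker (for example END) as its last line.

```bash
#!/usr/bin/env bash
# wait_for_output TIMEOUT_SECONDS PATTERN COMMAND [ARGS...]
# Re-run COMMAND until its output contains PATTERN or the timeout expires.
# On success, print the matching output and return 0; on timeout, return 1.
wait_for_output() {
    local timeout=$1 pattern=$2
    shift 2
    local deadline=$(( SECONDS + timeout ))
    local out
    while :; do
        out=$("$@" 2>&1) || true
        if [[ $out == *"$pattern"* ]]; then
            printf '%s\n' "$out"
            return 0
        fi
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep 0.2
    done
}

# Hypothetical usage, assuming container "c1" ends its output with "END":
#   wait_for_output 10 END podman logs c1
```

The timeout here is only an upper bound; on an idle machine the loop returns as soon as the sentinel appears, so raising it does not slow down the normal case.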
> I think increasing the expected time in [035] podman logs - --until --follow journald is not a good idea. Since the time can vary with the actual load on the machine or vary due to the scheduler when running lots of parallel runs, the test should check if the command gets 3s of logs, not how long it takes.
Keep in mind that the same race exists for the ctr process, so there is no way of knowing what 3s of logs are: depending on scheduling, the ctr process might have written only a few lines, not 30, with the sleep 0.1 interval. It is therefore impossible to know whether the writer side didn't write fast enough or the reader loses messages. As such, the "process should exit after 3s" match is simple and easy in theory, but of course it also has the timing problem. And we also want to check that the logs process actually exits in time.
I am, however, not sure how the rounding works with the built-in $SECONDS in bash; maybe it would be safer to take the time before and after in ms and compare that?
@Luap99 I tested $SECONDS and time in ms. I found that $SECONDS is not accurate because the time is reported only in whole seconds. So if t0 is 1856ms but $SECONDS is still 1, this inaccuracy means the command can be at most only about 150ms late, which is less than the variation I observed between test runs (the time was around 3150-3650ms). At higher workloads, this delay can be larger.
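For reference, a rough sketch of taking the time before and after in ms instead of relying on $SECONDS. The helper names now_ms and elapsed_ms are hypothetical; the sketch assumes either bash 5.0 or newer (for EPOCHREALTIME) or GNU date (for the %N format).

```bash
#!/usr/bin/env bash
# now_ms: print the current wall-clock time in milliseconds.
now_ms() {
    if [[ -n ${EPOCHREALTIME-} ]]; then
        # bash >= 5.0: "seconds.microseconds" (separator depends on locale).
        local t=${EPOCHREALTIME/[.,]/}
        echo $(( t / 1000 ))
    else
        # Fallback: GNU date prints nanoseconds for %N.
        echo $(( $(date +%s%N) / 1000000 ))
    fi
}

# elapsed_ms COMMAND [ARGS...]
# Run COMMAND, store its duration in ELAPSED_MS, and return its exit status.
elapsed_ms() {
    local t0 t1 rc=0
    t0=$(now_ms)
    "$@" || rc=$?
    t1=$(now_ms)
    ELAPSED_MS=$(( t1 - t0 ))
    return $rc
}

# Hypothetical usage with an explicit tolerance window:
#   elapsed_ms podman logs --follow --until 3s c1
#   (( ELAPSED_MS >= 3000 && ELAPSED_MS < 5000 ))
```

With millisecond resolution the tolerance becomes an explicit number in the assertion rather than a side effect of where t0 happens to fall within a second.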
A friendly reminder that this issue had no activity for 30 days.
Hodgepodge of parallel-system-test flakes that don't seem to fit anywhere else. I think most of these just need something like