Closed pmisik closed 1 month ago
I just checked, and it seems that ntpd was started incorrectly on service3, but I don't see anything in the logs about any time issues. When I restarted it, the adjustment was only microseconds.
I’m not sure whether you use VMs to run the worker machines. Since the time offset was significant (16497 seconds = 04:34:57), I wonder if this is one of the time synchronization issues you can hit on VM infrastructure (at least I've encountered them):
- buildbot-worker starts and begins executing commands with the wrong time from the image/snapshot.
- While buildbot-worker is running, the ntpd daemon/service runs, invokes time synchronization and shifts the time to match the NTP server.

For example, https://buildbot.buildbot.net/#/builders/108/builds/2120 is an interesting situation where the time was probably shifted twice.
Step 6 /tmp/bbvenv/bin/pip install -e master -e worker
according to the webUI of master took 7 seconds
but according to the log from the worker it took elapsedTime=-16486.171892
(negative duration)
Step 7 set -e
according to the webUI of master took 5 seconds
but according to the log from the worker it took elapsedTime=16497.727639
(positive duration)
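The two steps above look like two clock steps of nearly equal size and opposite sign. Here is a minimal sketch of that arithmetic, assuming the durations shown in the master's webUI (rounded to whole seconds) are close to the real step durations. The helper name `estimate_clock_jump` is made up for illustration and is not part of Buildbot.

```python
from datetime import timedelta


def estimate_clock_jump(worker_elapsed: float, master_duration: float) -> float:
    """Estimate how far the worker's wall clock moved during a step.

    Assumption: the master-side duration approximates the real duration,
    so the remainder of the worker-reported elapsedTime is attributed to
    a clock step.
    """
    return worker_elapsed - master_duration


# (worker elapsedTime, master webUI duration in seconds) for steps 6 and 7
step6_jump = estimate_clock_jump(-16486.171892, 7)  # clock moved backwards
step7_jump = estimate_clock_jump(16497.727639, 5)   # clock moved forwards

print(f"step 6 clock jump: {step6_jump:+.1f} s")
print(f"step 7 clock jump: {step7_jump:+.1f} s")
print(f"net change after both steps: {step6_jump + step7_jump:+.2f} s")
print("16497 s as h:mm:ss:", timedelta(seconds=16497))
```

Under that assumption the two jumps are roughly -16493 s and +16493 s and cancel to within a second, which fits a clock that was moved back and then forward again.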
Interesting. These workers are on a machine I boot up when I want faster test execution. Recently I migrated them to podman containers using the gVisor container runtime. Probably gVisor doesn't fake syscalls well enough.
Just checking in: this isn't an issue with time on the master? It sounds like it's not, but I want to make sure. If there's anything I need to do, let me know.
@verm There are no issues with time on the master. For any issues with the p12-* workers, the worker setup is the first suspect.
@p12tic okay great!
Now it looks like the p12-pd-? workers have run out of disk space on /home, judging by errors like:
error Error: ENOSPC: no space left on device, mkdir '/home/buildbot/...
https://buildbot.buildbot.net/#/builders/126/builds/128
https://buildbot.buildbot.net/#/builders/122/builds/1120
WARNING: Building wheel for buildbot failed: [Errno 28] No space left on device: '/home/buildbot/.cache/pip/wheels/62'
https://buildbot.buildbot.net/#/builders/127/builds/132
This applies at least to
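One way to catch ENOSPC failures like these early is to check free space before a build starts. This is a minimal sketch using Python's standard `shutil.disk_usage`. The function name `has_free_space` and the 1 GiB threshold are illustrative assumptions, not part of Buildbot or the worker setup.

```python
import shutil


def has_free_space(path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has at least
    `min_free_bytes` bytes free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free >= min_free_bytes


# Hypothetical threshold; on the workers above the path would be /home.
ONE_GIB = 1024 ** 3
print("enough space:", has_free_space(".", ONE_GIB))
```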
This is no longer a problem, closing.
Hi
I suspect there is instability in the Buildbot infrastructure caused by time synchronization on latent workers. On the latent workers p12-pd-?? I'm seeing bizarre errors that seem to be related to time sync. It looks as if time synchronization occurred while steps were executing. I suspect a time sync issue because I'm randomly seeing these problems:
- Negative elapsed time:
  elapsedTime=-16489.823580
  see line 38 in https://buildbot.buildbot.net/#/builders/100/builds/3298/steps/1/logs/stdio
- Elapsed time much bigger than timeout 1200 seconds
  elapsedTime=16497.727639
  see https://buildbot.buildbot.net/#/builders/108/builds/2120/steps/7/logs/stdio
- time related node assert
  node[2513]: ../src/env.cc:1288:v8::Local<v8::Value> node::Environment::GetNow(): Assertion `(now) >= (timer_base())' failed.
  https://buildbot.buildbot.net/#/builders/126/builds/92
- inconsistencies in the duration of the step reported in the master webUI and timeout in the workers
  - master webUI reports coverage tests run for 8:28 (508 seconds)
    command timed out: 1200 seconds without output running b'/tmp/bbvenv/bin/coverage run
    https://buildbot.buildbot.net/#/builders/83/builds/4711
  - set -e run for 5 seconds
    command timed out: 1200 seconds without output running
    https://buildbot.buildbot.net/#/builders/108/builds/2120
...

@p12tic what do you think?
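Symptoms like these are what you would expect when a duration is computed from wall-clock timestamps and the clock is stepped in between. The sketch below is not taken from buildbot-worker's code. It simulates a clock step with a fake clock, where `FakeClock` and `measure_with_wall_clock` are hypothetical names and the 16490-second offset is just an illustrative value close to the ones in the logs.

```python
import time


class FakeClock:
    """A steppable wall clock standing in for time.time()."""

    def __init__(self, start: float) -> None:
        self.now = start

    def time(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        """Let real time pass."""
        self.now += seconds

    def step(self, offset: float) -> None:
        """Simulate a time sync that steps the clock by `offset` seconds."""
        self.now += offset


def measure_with_wall_clock(clock: FakeClock, real_duration: float,
                            clock_step: float) -> float:
    """Elapsed time as seen by code that subtracts wall-clock timestamps
    while the clock is stepped in the middle of the run."""
    start = clock.time()
    clock.advance(real_duration)
    clock.step(clock_step)
    return clock.time() - start


clock = FakeClock(start=1_700_000_000.0)
backwards = measure_with_wall_clock(clock, real_duration=7.0, clock_step=-16490.0)
forwards = measure_with_wall_clock(clock, real_duration=5.0, clock_step=16490.0)
print("clock stepped back:   ", backwards)   # negative "duration"
print("clock stepped forward:", forwards)    # far beyond a 1200 s timeout

# time.monotonic() is documented as a clock that cannot go backwards and
# is not affected by system clock updates, so differences stay non-negative.
t0 = time.monotonic()
t1 = time.monotonic()
print("monotonic delta is non-negative:", t1 - t0 >= 0)
```

A backward step gives a negative elapsed time, and a forward step gives one well past the 1200-second timeout, matching both kinds of log entry listed above.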