Open abderrahim opened 2 years ago
This must have changed with buildbox-run and buildbox-casd.
It's worth investigating whether suspending jobs actually suspends building, or the uploading / downloading of an artifact (i.e. whether we properly send SIGTSTP to the buildbox-casd / buildbox-run processes or not), and considering whether we should be doing so and whether that is well supported.
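To make the question concrete, here is a minimal POSIX-only sketch of suspending and resuming a child's process group. It is not BuildStream's code: the sleeping child merely stands in for a buildbox-run or buildbox-casd process, and it sends SIGSTOP rather than SIGTSTP, because SIGSTOP cannot be caught or ignored by the receiver. Whether the real buildbox processes cope with being stopped is exactly what would need investigating.

```python
import os
import signal
import subprocess
import sys

# Hypothetical long-running child, standing in for a buildbox-run / buildbox-casd process.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    start_new_session=True,  # give the child its own process group
)
pgid = os.getpgid(child.pid)

# Suspend the whole group. SIGSTOP is an assumption of this sketch;
# SIGTSTP (named above) can be handled or ignored by the receiver.
os.killpg(pgid, signal.SIGSTOP)

# WUNTRACED makes waitpid() report stopped children as well as exited ones,
# so we can confirm the suspend actually took effect.
_, status = os.waitpid(child.pid, os.WUNTRACED)
was_stopped = os.WIFSTOPPED(status)

# Resume the group, then clean up the child.
os.killpg(pgid, signal.SIGCONT)
os.killpg(pgid, signal.SIGTERM)
child.wait()
```

Signalling the process group rather than the single PID matters if the child spawns helpers of its own, since those would otherwise keep running while the parent is stopped.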
I noticed something else today: terminating jobs works. I think this only happens in specific cases.
Maybe it's some race condition which results in the signal being lost?
It definitely hangs horribly when trying to terminate ostree sources.
User interrupted with ^C
Choose one of the following options:
(c)ontinue - Continue queueing jobs as much as possible
(q)uit - Exit after all ongoing jobs complete
(t)erminate - Terminate any ongoing jobs and exit
Pressing ^C again will terminate jobs and exit
Choice: [continue]: t
Terminating all jobs at user request
[--:--:--][20887c2e][ main:bootstrap/build/base-sdk/image-x86_64.bst] STATUS Fetch terminating
^C[00:01:24][20887c2e][ fetch:bootstrap/build/base-sdk/image-x86_64.bst] BUG Fetch
An unhandled exception occured:
Traceback (most recent call last):
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/_signals.py", line 113, in terminator
yield
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/utils.py", line 1398, in _call
output, _ = process.communicate(timeout=1)
File "/usr/lib64/python3.10/subprocess.py", line 1149, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/usr/lib64/python3.10/subprocess.py", line 2026, in _communicate
self.wait(timeout=self._remaining_time(endtime))
File "/usr/lib64/python3.10/subprocess.py", line 1204, in wait
return self._wait(timeout=timeout)
File "/usr/lib64/python3.10/subprocess.py", line 1932, in _wait
time.sleep(delay)
buildstream._signals.TerminateException
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/_common.py", line 441, in wrapper
ret = self._cache[fun]
AttributeError: 'Process' object has no attribute '_cache'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/_pslinux.py", line 1661, in wrapper
return fun(self, *args, **kwargs)
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/_common.py", line 444, in wrapper
return fun(self)
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/_pslinux.py", line 1703, in _parse_stat_file
with open_binary("%s/%s/stat" % (self._procfs_path, self.pid)) as f:
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/_common.py", line 711, in open_binary
return open(fname, "rb", **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/11107/stat'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/__init__.py", line 361, in _init
self.create_time()
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/__init__.py", line 717, in create_time
self._create_time = self._proc.create_time()
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/_pslinux.py", line 1661, in wrapper
return fun(self, *args, **kwargs)
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/_pslinux.py", line 1873, in create_time
ctime = float(self._parse_stat_file()['create_time'])
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/_pslinux.py", line 1668, in wrapper
raise NoSuchProcess(self.pid, self._name)
psutil.NoSuchProcess: process no longer exists (pid=11107)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/_scheduler/jobs/job.py", line 441, in child_action
result = self.child_process() # pylint: disable=assignment-from-no-return
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/_scheduler/jobs/elementjob.py", line 92, in child_process
return self._action_cb(self._element)
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/_scheduler/queues/fetchqueue.py", line 77, in _fetch_not_original
element._fetch(fetch_original=False)
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/element.py", line 2185, in _fetch
self.__sources.fetch()
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/_elementsources.py", line 225, in fetch
self.fetch_sources()
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/_elementsources.py", line 254, in fetch_sources
self._fetch_source(source)
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/_elementsources.py", line 435, in _fetch_source
source._fetch()
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/source.py", line 802, in _fetch
self.__do_fetch()
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/source.py", line 1289, in __do_fetch
new_source.fetch(**kwargs)
File "/home/will/projects/buildsystems/venvbuild/lib/python3.10/site-packages/bst_plugins_experimental/sources/ostree.py", line 159, in fetch
self.call(
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/plugin.py", line 732, in call
exit_code, _ = self.__call(
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/plugin.py", line 954, in __call
exit_code, output = utils._call(args, cwd=cwd, env=env, stdin=stdin, stdout=stdout, stderr=stderr)
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/utils.py", line 1386, in _call
with _signals.suspendable(suspend_proc, resume_proc), _signals.terminator(kill_proc), subprocess.Popen(
File "/usr/lib64/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/_signals.py", line 115, in terminator
terminate_func()
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/utils.py", line 1374, in kill_proc
_kill_process_tree(process.pid)
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/buildstream/utils.py", line 1293, in _kill_process_tree
proc = psutil.Process(pid)
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/__init__.py", line 332, in __init__
self._init(pid)
File "/home/will/projects/buildsystems/venvbuild/lib64/python3.10/site-packages/psutil/__init__.py", line 373, in _init
raise NoSuchProcess(pid, msg='process PID not found')
psutil.NoSuchProcess: process PID not found (pid=11107)
[00:01:24][ ][ main:core activity ] WARNING Build Terminated
Pipeline Summary
Total: 1
Session: 1
Fetch Queue: processed 0, skipped 1, failed 0
Build Queue: processed 0, skipped 0, failed 0
This was just trying to build the bottom of the bootstrap in FD-SDK, at the fetch stage, which IIUC is not using buildbox.
As the bootstrap is quite big, this would have hung for a long time, so I had to go and kill ostree with htop.
I noticed something about git too (mentioned in the issue description), so it's definitely not buildbox related. This is most likely something to do with the threaded scheduler.
Looks like the process had already exited by the time termination was attempted? Oh yeah, right, the trace appears because it was killed externally.
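For illustration, the "process already gone" case can be tolerated rather than surfacing as a BUG. The sketch below is not the BuildStream implementation: it uses the standard library's os.kill instead of psutil, and terminate_if_alive is a hypothetical helper name. With psutil, the analogous guard would be catching psutil.NoSuchProcess, the exception shown in the trace.

```python
import os
import signal
import subprocess
import sys


def terminate_if_alive(pid: int, sig: int = signal.SIGTERM) -> bool:
    """Send `sig` to `pid`; return False if the process no longer exists."""
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # The process exited (or was killed externally) before we got here.
        return False
    return True


# A child that exits immediately, standing in for a fetch killed from outside.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()  # reap it so the PID no longer refers to a live process

# No exception is raised; the helper just reports that nothing was signalled.
already_gone = not terminate_if_alive(child.pid)
```

This only addresses the noisy traceback; it would not explain why the terminate request hung in the first place.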
I am still having issues with terminating; this bug still looks open.
Interrupting jobs doesn't work. I noticed this with fetch jobs before (where it seems that bst hung but git was still downloading in the background) and today I noticed it with build jobs too. Here are the logs: notice that it says STATUS Build terminating for all four build jobs, yet they continue until they end with SUCCESS Caching artifact.