borgbase / vorta

Desktop Backup Client for Borg Backup
https://vorta.borgbase.com
GNU General Public License v3.0

Borg subprocess is not started #187

Closed · JanSchnacki closed this issue 5 years ago

JanSchnacki commented 5 years ago

Describe the bug
Vorta hangs after some time. Or rather: the GUI still does "something" (it responds to clicks, up to a point), but backups won't run anymore.

To Reproduce
I have no idea how to reproduce this; it happens randomly. The log does not show anything that would point to a possible cause.

Desktop (please complete the following information):
Kubuntu 18.10

Additional context
Yesterday evening Vorta started a backup at 23:00, which finished about a minute later. The subsequent "prune" run then never terminated, but I could not find a running process either. This was the log when I came back to my computer tonight:

2019-02-08 23:00:00,002 - apscheduler.executors.default - INFO - Running job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-08 23:00:00 CET)" (scheduled at 2019-02-08 23:00:00+01:00)
2019-02-08 23:00:00,010 - vorta.scheduler - INFO - Starting background backup for Default
2019-02-08 23:00:00,014 - vorta.notifications - DEBUG - success notifications suppressed
2019-02-08 23:00:00,027 - vorta.scheduler - INFO - Preparation for backup successful.
2019-02-08 23:00:00,057 - vorta.borg.borg_thread - INFO - Running command borg create --list --info --log-json --json --filter=AM -C lzma,6 --exclude-from /tmp/tmpnivdgac4 /NasSnake2/jan/Backups/borgbackup/snake2016::Snake2016-default-2019-02-08T23:00:00 /home/jan
2019-02-08 23:00:00,916 - vorta.borg.borg_thread - INFO - Creating archive at "/NasSnake2/jan/Backups/borgbackup/snake2016::Snake2016-default-2019-02-08T23:00:00"
2019-02-08 23:00:55,373 - vorta.notifications - DEBUG - success notifications suppressed
2019-02-08 23:00:55,374 - vorta.scheduler - INFO - Backup creation successful.
2019-02-08 23:00:55,380 - vorta.scheduler - INFO - Doing post-backup jobs for Default
2019-02-08 23:00:55,403 - vorta.borg.borg_thread - INFO - Running command borg prune --list --info --log-json --keep-hourly 0 --keep-daily 56 --keep-weekly 12 --keep-monthly -1 --keep-yearly 0 --prefix Snake2016-default- --keep-within 28d /NasSnake2/jan/Backups/borgbackup/snake2016
2019-02-09 00:00:00,001 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 00:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 01:00:00,426 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 01:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 02:00:00,425 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 02:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 03:00:00,425 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 03:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 04:00:00,426 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 04:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 05:00:00,425 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 05:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 06:00:00,426 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 06:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 07:00:00,425 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 07:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 08:00:00,426 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 08:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 09:00:00,425 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 09:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 10:00:00,425 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 10:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 11:00:00,425 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 11:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 12:00:00,425 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 12:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 13:00:00,426 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 13:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 14:00:00,425 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 14:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 15:00:00,426 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 15:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 16:00:00,425 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 16:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 17:00:00,426 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 17:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 18:00:00,426 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 18:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 19:00:00,426 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 19:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 20:00:00,425 - apscheduler.scheduler - WARNING - Execution of job "VortaScheduler.create_backup (trigger: cron[hour='*/1', minute='0'], next run at: 2019-02-09 20:00:00 CET)" skipped: maximum number of running instances reached (1)
2019-02-09 20:20:52,505 - vorta.i18n - DEBUG - Loading translation succeeded for ['de', 'de-DE', 'de-Latn-DE'].
2019-02-09 20:20:52,516 - apscheduler.scheduler - INFO - Scheduler started
2019-02-09 20:20:52,518 - apscheduler.scheduler - INFO - Added job "VortaScheduler.create_backup" to job store "default"
2019-02-09 20:20:52,518 - vorta.scheduler - DEBUG - New job for profile Default was added.
ThomasWaldmann commented 5 years ago

Do you use an ssh: repo?

JanSchnacki commented 5 years ago

No, the repo is on a mounted CIFS share.

The repo has been working fine for several months now, driven by a bash script run from a cron job.

ThomasWaldmann commented 5 years ago

hmm, the log looks like vorta is trying to create new instances (of the scheduler?) and can't.

JanSchnacki commented 5 years ago

I think that's because it thinks the previous run is still active. The progress bar was still running when I came back to my computer.
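For illustration: the "skipped: maximum number of running instances reached (1)" warnings in the log above are standard APScheduler behaviour. While an earlier run of the job has not returned, every later cron trigger for it is skipped. A minimal, self-contained sketch of that behaviour (not Vorta's actual scheduler setup; the sleep just simulates a backup run that never returns):

import logging
import time
from apscheduler.schedulers.background import BackgroundScheduler

logging.basicConfig(level=logging.INFO)  # make the scheduler warnings visible

def create_backup():
    time.sleep(24 * 3600)  # simulate a run that never returns, like the stuck prune above

scheduler = BackgroundScheduler()
# max_instances=1 is also the default: while the first run is still active,
# every later trigger is skipped with exactly the warning seen in the log.
scheduler.add_job(create_backup, 'cron', hour='*/1', minute='0', max_instances=1)
scheduler.start()
time.sleep(2 * 3600)  # keep the main thread alive long enough to see the skips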

JanSchnacki commented 5 years ago

Ok, today it happened again.

I found that the main Vorta process had a subprocess running. After ending it with SIGQUIT, the "list" process was shown with return code -3 in the log, and subsequent backups could run again.

I'm not really fluent enough in Python to know where to start looking for something like this. The last time I asked, there was no way to enable something like a debug log, right?

m3nu commented 5 years ago

I see. Will do some testing on this too. Maybe adding some timeout could fix it. I've never seen a Borg process get stuck, except for an input prompt.
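
A hedged sketch of the timeout idea using only the standard subprocess API (illustrative, not Vorta's code; the command below is a placeholder):

import subprocess

cmd = ['borg', 'list', '--json', '/path/to/repo']  # placeholder command
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
try:
    # Give borg a generous upper bound instead of waiting forever.
    out, err = proc.communicate(timeout=3600)
except subprocess.TimeoutExpired:
    proc.kill()
    out, err = proc.communicate()  # reap the child and collect whatever it wrote

As the gdb trace further down shows, though, the hang in this issue happens inside the Popen() call itself, so a read timeout alone would not have covered this particular case.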

JanSchnacki commented 5 years ago

Well, there was no Borg process, only Vorta. So Borg itself must have terminated at some point, but Vorta didn't finish whatever it needed to do after that.

m3nu commented 5 years ago

OK. Then the termination isn't seen by subprocess, which is strange.

JanSchnacki commented 5 years ago

It just happened again last night. In the screenshot you can see that there is no borg process left.

This update is just for "documentation". I'll try to figure out how to debug Python programs. Maybe I can find out what's happening...

[screenshot: process list showing no remaining borg process]

m3nu commented 5 years ago

Must happen in the BorgThread class. Any particular command where it happens?

JanSchnacki commented 5 years ago

No, right now it's the prune command (yes, it's happening again). Mostly it's the list command, but it has also happened during "check". As far as I can see in the log, it has never happened during "create".

JanSchnacki commented 5 years ago

OK, not sure this will help you in any way, but according to gdb it's hanging somewhere inside /usr/lib/python3.6/subprocess.py.

I've attached to the running Vorta subprocess and this is what I get. As I said, I currently have only very basic knowledge of Python, so I'll refrain from trying to interpret what might be happening here.

(gdb) py-bt
Traceback (most recent call first):
  <built-in method _localdummy_destroyed of weakref object at remote 0x7f15ffe41278>
  <built-in method fork_exec of module object at remote 0x7f160868f548>
  File "/usr/lib/python3.6/subprocess.py", line 1275, in _execute_child
    restore_signals, start_new_session, preexec_fn)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.6/dist-packages/vorta/borg/borg_thread.py", line 146, in run
    env=self.env, cwd=self.cwd, preexec_fn=os.setsid)
(gdb) info threads
  Id   Target Id                                           Frame 
* 1    Thread 0x7f15c27fc700 (LWP 22513) "BorgPruneThread" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
(gdb) py-list
1270                                close_fds, tuple(sorted(map(int, fds_to_keep))),
1271                                cwd, env_list,
1272                                p2cread, p2cwrite, c2pread, c2pwrite,
1273                                errread, errwrite,
1274                                errpipe_read, errpipe_write,
>1275                                restore_signals, start_new_session, preexec_fn)
1276                        self._child_created = True
1277                    finally:
1278                        # be sure the FD is closed no matter what
1279                        os.close(errpipe_write)
1280    
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
103     in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) py-list
1270                                close_fds, tuple(sorted(map(int, fds_to_keep))),
1271                                cwd, env_list,
1272                                p2cread, p2cwrite, c2pread, c2pwrite,
1273                                errread, errwrite,
1274                                errpipe_read, errpipe_write,
>1275                                restore_signals, start_new_session, preexec_fn)
1276                        self._child_created = True
1277                    finally:
1278                        # be sure the FD is closed no matter what
1279                        os.close(errpipe_write)
1280    
(gdb) py-bt
Traceback (most recent call first):
  <built-in method _localdummy_destroyed of weakref object at remote 0x7f15ffe41278>
  <built-in method fork_exec of module object at remote 0x7f160868f548>
  File "/usr/lib/python3.6/subprocess.py", line 1275, in _execute_child
    restore_signals, start_new_session, preexec_fn)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.6/dist-packages/vorta/borg/borg_thread.py", line 146, in run
    env=self.env, cwd=self.cwd, preexec_fn=os.setsid)
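
Reading the frames: the traceback ends inside subprocess's fork_exec, blocked in __lll_lock_wait, i.e. waiting on a low-level lock during the fork/exec step, so the Popen() call never completes. Based only on the arguments visible in the frames (env=self.env, cwd=self.cwd, preexec_fn=os.setsid), the launch in BorgThread.run() at borg_thread.py line 146 is roughly of this shape (a reconstruction for orientation, not the actual Vorta source; the command is a placeholder):

import os
import subprocess

cmd = ['borg', 'prune', '--list', '--info', '/path/to/repo']  # placeholder command
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     preexec_fn=os.setsid)  # runs os.setsid() in the forked child, before exec()
out, err = p.communicate()

preexec_fn runs Python code in the child between fork() and exec(), which is exactly the situation the subprocess documentation warns about in a threaded program (see the next comment).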
m3nu commented 5 years ago

I couldn't reproduce this on macOS or Ubuntu (VM), but your stack trace was very useful nonetheless.

The culprit may be the preexec_fn:

"The preexec_fn parameter is not safe to use in the presence of threads in your application. The child process could deadlock before exec is called. If you must use it, keep it trivial! Minimize the number of libraries you call into." (from the subprocess docs)

There seems to be a newer technique to group processes and terminate them at once (like we need when cancelling a running Borg process). I changed it in this PR: https://github.com/borgbase/vorta/pull/194
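
The general pattern behind that kind of change (a hedged sketch of the technique, not the exact diff from the linked PR): since Python 3.2, Popen can put the child into its own session with start_new_session=True, so no Python code has to run in the child between fork() and exec(), and a running Borg command can still be cancelled by signalling the whole process group. The command below is a placeholder:

import os
import signal
import subprocess

cmd = ['borg', 'list', '--json', '/path/to/repo']  # placeholder command
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     start_new_session=True)  # setsid() is done by Popen itself; no preexec_fn needed

# To cancel a running command, signal the whole process group:
os.killpg(os.getpgid(p.pid), signal.SIGTERM)
p.wait()  # reap the terminated child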

Can you check whether this fixes the issue for you? The commands to pull my PR would be:

$ git checkout -b issue/187/hanging-subprocess master
$ git pull https://github.com/m3nu/vorta.git issue/187/hanging-subprocess

Then (as you probably know already)

$ pip uninstall vorta
$ pip install -e .
$ vorta --foreground
JanSchnacki commented 5 years ago

I didn't know, so thanks for telling me (why did I even ask here? *scratches head*).

I'm running that now. Let's see what happens in the next hours/days...

JanSchnacki commented 5 years ago

OK, this modified version has now been running for 48 hours (hourly backups) without issue. Before, the error was intermittent but usually occurred at least once every 24 hours, so I can already say that this is the most stable Vorta version I've had running so far.

I'll wait a few more days before marking this as "fixed". Either way, your change didn't introduce any other issues here, so it's definitely an improvement.

m3nu commented 5 years ago

Glad to hear it. Let's wait a few more days and then merge it.

m3nu commented 5 years ago

If you still haven't encountered a "hang", @JanSchnacki, I'll go ahead and merge it together with two other hotfixes we had this week.

JanSchnacki commented 5 years ago

Yes, @m3nu, go ahead. At the moment everything runs as smoothly as I could wish for.