borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/
Other
11.22k stars 743 forks source link

Borg process never resumes after OS X sleep #4647

Open greenmoss opened 5 years ago

greenmoss commented 5 years ago

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Bug, or maybe documentation, or possibly just search fodder for other Mac users. I'm not sure which it is yet.

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.1.7

Operating system (distribution) and version.

OS X 10.13.6

Hardware / network configuration, and filesystems used.

Macbook Pro, backing up over wireless network. Backing up entire root partition, with significant excludes.

How much data is handled by borg?

Here's output from a successful borg create:

Time (start): Sun, 2019-06-23 13:23:22
Time (end):   Sun, 2019-06-23 13:33:35
Duration: 10 minutes 12.87 seconds
Number of files: 758372
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              119.70 GB            101.71 GB            210.13 MB
All archives:                2.04 TB              1.73 TB             85.48 GB

                       Unique chunks         Total chunks
Chunk index:                  615334             13695436
------------------------------------------------------------------------------

Full borg commandline that lead to the problem (leave away excludes and passwords)

borg create                         \
    --checkpoint-interval 300       \
    --verbose                       \
    --filter AME                    \
    --list                          \
    --stats                         \
    --show-rc                       \
    --compression lz4               \
    --exclude-caches                \ # excludes after this line, not listed here
    ${BORG_REPO}::'{now}'            \
    /Users                          \
    /opt                            \
    /usr/local                      \

Describe the problem you're observing.

The borg command usually launches and then completes without issue.

However sometimes I notice my backups on the backup host not refreshing even though my system should have backed up by then. In that case, I can do borg list on the backup host, and there is no active backup or lock. However on OS X I see borg create running.

Here's a transcript of me running commands, with some troubleshooting and my thoughts baked in as comments:

Headcrash:~ root# ps -ef | grep borg
    0 76428     1   0  7:25PM ??         0:00.03 /bin/sh /usr/local/bin/borg-backup.sh
    0 76764 76428   0  7:25PM ??         0:00.54 borg create --checkpoint-interval 300 (trimmed really long args list)
    0 76772 76764   0  7:25PM ??         1:00.61 borg create --checkpoint-interval 300 (trimmed really long args list)
    0 76873 76772   0  7:25PM ??         0:03.70 ssh borg-all@myborghost borg serve --umask=077 --info
    0  5757 76992   0 11:22PM ttys007    0:00.00 grep borg
Headcrash:~ root# borg-backup.sh # this is my wrapper script that runs the "borg create" above
Failed to create/acquire the lock /var/root/.cache/borg/1d94e028ce879299cf060a9e2778f2791d1422f6744e2f67f7ae0dd80dbae08c/lock.exclusive (timeout).
could not determine last increment name/date/time
Headcrash:~ root# cat /var/root/.cache/borg/1d94e028ce879299cf060a9e2778f2791d1422f6744e2f67f7ae0dd80dbae08c/lock.exclusive/headcrash.yoderhome.com\@154505282064271.76772-0
Headcrash:~ root# borg --version
borg 1.1.7
Headcrash:~ root# dtruss -p 76428
dtrace: system integrity protection is on, some features will not be available

SYSCALL(args)        = return
^C

Headcrash:~ root# dtruss -p 76428 -f
dtrace: system integrity protection is on, some features will not be available

    PID/THRD  SYSCALL(args)          = return
^C

Headcrash:~ root# dtruss -p 76764 -f
dtrace: system integrity protection is on, some features will not be available

    PID/THRD  SYSCALL(args)          = return
^C

Headcrash:~ root# dtruss -p 76428
dtrace: system integrity protection is on, some features will not be available

SYSCALL(args)        = return
^C

Headcrash:~ root# ps -ef | grep borg
  503 39397 39148   0 11:48PM ttys006    0:00.14 vim /usr/local/bin/borg-backup.sh
    0 50663 76992   0 11:56PM ttys007    0:00.00 grep borg

What the heck? After running dtruss, maybe it woke borg up again? At that point the SSH connection was dead, so maybe borg exited.

I am guessing my laptop went to sleep in the middle of a borg create, and this caused the problem.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

I can not reproduce it on demand, but it happens periodically.

I have since updated to 1.1.10. If it doesn't happen again for a while, I'll comment again. Maybe the update fixed this behaviour.

Include any warning/errors/backtraces from the system logs

See above

ThomasWaldmann commented 5 years ago
Failed to create/acquire the lock /var/root/.cache/borg/1d94...e08c/lock.exclusive (timeout).

This is likely because a 2nd borg invocation tried to get the lock while a 1st one was still "running" (or at least the process still there).

Maybe you could improve this by some ssh configuration that more quickly detects broken down connections and also should lead to borg process termination, check the FAQ:

https://borgbackup.readthedocs.io/en/stable/faq.html#why-am-i-seeing-idle-borg-serve-processes-on-the-repo-server

ThomasWaldmann commented 5 years ago

BTW, you do not need to give ${BORG_REPO}, borg will automatically read that value.

greenmoss commented 5 years ago

I think this is happening because OS X is trying to keep old/disconnected SSH sessions open. I am going to try forcing the SSH client to disconnect, instead of indefinitely retrying.

I'm setting the following in my root SSH config:

ServerAliveInterval 60
ServerAliveCountMax 10

If this works, I will report back here.

EDIT it's been a few months so I forgot you mentioned the SSH connection. The docs are good from the server side. If this works, it means I also need to configure SSH to time out from the client side.

ThomasWaldmann commented 1 month ago

Would be interesting to have an update here.

greenmoss commented 1 month ago

Hi thanks for checking on this old ticket I totally forgot about it!

This hasn't happened since I posted the last update. Current Mac OS version 14.6.1.

I still have the settings mentioned above:

# cat /var/root/.ssh/config
ServerAliveInterval 60
ServerAliveCountMax 10

Maybe this can be closed? I leave that up to you.

ThomasWaldmann commented 1 month ago

Thanks for the update, I guess it is basically the same problem as documented there:

So, guess we only could mention that it also might help for OS sleep issues.