boelle opened this issue 1 year ago
seems so far that it's either 1 job at a time... or the speed has something to do with it
That's not a borg mount problem, but your repo files are corrupted.
First check if hardware / OS works correctly, then you can try borg check --repair.
yeah.. i skipped a directory and the problem is gone
i will try and narrow it down some more
can i skip the --repair so it only checks and says what is wrong and where?
and btw.. how do i prevent files from becoming corrupted again???
Yes, running without --repair will mostly only show what it finds. But as I said, first fix hw/OS or it might get worse.
i'm a total noob, so can you tell me how i check that?
Maybe in this order (rough commands sketched below):
You can use memtest86+ on a USB stick to check your RAM. Run at least 1 full pass.
smartctl -a /dev/sdX and smartctl -t long /dev/sdX (and after that has finished, again with -a) could be used to check the HDD/SSD.
Linux fsck can be used to check the filesystem.
If all is fine, try borg check --repair.
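Not an authoritative recipe, just a rough sketch of the disk/filesystem part, assuming the repo disk is /dev/sda, its filesystem is /dev/sda1 (unmounted for the fsck step) and the repo lives at /path/to/repo (all of these are placeholders):

smartctl -t long /dev/sda             # start the long SMART self-test
smartctl -a /dev/sda                  # after the self-test has finished, read the results
fsck -f /dev/sda1                     # check the (unmounted!) filesystem holding the repo
borg check --repair /path/to/repo     # only once RAM / disk / filesystem look OK

memtest86+ itself is not run from Linux; it is booted directly from the USB stick.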
Oki. I run these tests automated once a month, except memtest and fsck...
RAM errors can have pretty bad consequences. OTOH, the N40L supports ECC memory (and by default also came with ECC memory), so the memory system should be of decent quality. Then again, the N40L is pretty old, so it could also be an age-related malfunction of some component.
is it possible to extract files from a mounted repo but skip corrupted ones?
now i'm looking for a simple way to extract as much as i can of what is known good and just skip the corrupt files
right now i'm doing a check --verify-data > check.txt to have a file with all the errors. my plan is to clean up that file so i just have a list of files that are not "clean"
but is there a faster and better way?
Guess borg mount (like most borg commands) expects a consistent/undamaged borg repo.
Hopefully your hardware works ok now, otherwise that borg check run might not work well (and --repair could cause more damage). In case of doubt, I guess I rather would have moved the repo to a known-good machine before working on it.
To redirect the log output, you need to redirect stderr.
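For the check --verify-data > check.txt run mentioned above, that means roughly this (repo path is a placeholder):

borg check --verify-data /path/to/repo 2> check.txt        # errors go to stderr
borg check --verify-data /path/to/repo > check.txt 2>&1    # or capture stdout and stderr together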
After borg check --repair fixed all issues, you should be able to use borg mount. IIRC, if a file is damaged (has replacement chunks) it does not let you read the file, except if you give a special mount option.
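If I remember correctly that option is allow_damaged_files (please double-check against the docs of your borg version); with placeholder paths it would look roughly like:

borg mount -o allow_damaged_files /path/to/repo /mnt/borg
# without that option, reading a file that got replacement chunks should just error out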
Could it be that you temporarily lose internet connectivity during your rsync operation?
borg mount uses FUSE afaik. FUSE throws "Transport endpoint is not connected" errors at me when I work with, for example, sshfs and lose my wifi connectivity, and after re-establishing it ssh fails to resume my sessions.
It's not a borgbackup issue for me.
And loss of connectivity would explain the random timing of the errors you are getting.
as in:
someone@localhost ~ % sshfs root@testvm:/tmp/ ~/.sshfs
someone@localhost ~ % cd ~/.sshfs
someone@localhost ~/.sshfs % ls -al
drwx------ 1 root root 60 Aug 25 01:36 <dir1>
drwx------ 1 root root 60 Aug 25 01:36 <dir2>
[...]
drwx------ 1 root root 60 Aug 25 01:36 tmux-0
root@testvm ~ # reboot
someone@localhost ~/.sshfs % ls -al
ls: cannot open directory '.': Transport endpoint is not connected
I have the same problem ("endpoint not connected") requiring me to remount the file system. I have not been able to generate debug output. The problem does not occur on a specific directory/file. It happens randomly. I do not know how to continue from here.
@noppel If you could make it a habit to always use borg mount -f ... in another terminal, you could maybe get a traceback some time and the issue could be found.
And maybe do a borg check to be sure the backup is not corrupted.
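Something like this in a spare terminal (repo URL and mountpoint are placeholders), so borg stays attached to the terminal and a traceback, if any, remains visible:

borg mount --debug --foreground ssh://user@host/path/to/repo /mnt/borg
# reproduce the problem (e.g. the rsync run) from another terminal,
# then unmount when done:
fusermount -u /mnt/borg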
@noppel If you could make it a habit to always use borg mount -f ... in another terminal, you could maybe get a traceback some time and the issue could be found.
I have added --foreground (-f was wrong it seemed) and --debug. This is the result I got before the borg command returned:
fuse: _process_archive completed in 114.9 s for archive 2023-05-10_21:00
Calling fuse_session_destroy
RepositoryCache: current items 36751, size 2.11 GB / 2.15 GB, 501699 hits, 53425 misses, 8838 slow misses (+211.5s), 25512 evictions, 0 ENOSPC hit
RemoteRepository: 3.58 MB bytes sent, 3.96 GB bytes received, 67554 messages sent
-f and --foreground should do the same thing.
The output you got seems normal, I don't see anything unexpected there.
The output you got seems normal, I don't see anything unexpected there.
The process is exiting, this should not happen. It should continue running.
Did you do a borg check to be sure the backup is not corrupted?
Have you checked borgbackup docs, FAQ, and open GitHub issues?
YES
Is this a BUG / ISSUE report or a QUESTION?
ISSUE, but i'm sure i goofed up somehow
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
borg 1.2.3
Operating system (distribution) and version.
Openmediavault 6.4.5-1 (Shaitan)
Hardware / network configuration, and filesystems used.
HP ProLiant N40L MicroServer, EXT4
How much data is handled by borg?
Not sure, but ~500 GB
Full borg commandline that led to the problem (leave out excludes and passwords)
borg mount -o versions ssh://root@100.121.165.103/srv/mergerfs/Data/homedir/Malcolm/Borgbackup/Backup-Bo-OMV_NOVA /srv/mergerfs/Data/Pool1-Backup
and
rsync -auvc --progress /srv/mergerfs/Data/Pool1-Backup/srv/mergerfs/Pool1/Public-Test/Hannah /srv/mergerfs/Data/Backup/srv/mergerfs/Pool1/Public-Test
Describe the problem you're observing.
sooner or later i get "Transport endpoint is not connected" when i use rsync to restore the backup
i know that there is borg extract, but there is no explanation of how to do a "merged" extract like i get when mounting with -o versions and then rsyncing to a temp folder
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
just wait a bit and it will happen... sometimes it happens within seconds and sometimes minutes, rarely an hour will pass
Include any warning/errors/backtraces from the system logs
Doing a mount with -f right now.... i will add it as soon as it fails again
rsync errors:
and
First fail
2nd fail
let me know if i missed anything and i will add it to the FIRST POST
i have tried with 2 rsync jobs running... and it fails fast. with 1 job running it seems to last longer.... but i will report back whether the currently running job completes or not