boelle opened this issue 1 year ago
seems so far that it's either 1 job at a time... or the speed has something to do with it
That's not a borg mount problem, but your repo files are corrupted.
First check if hardware / OS works correctly, then you can try borg check --repair.
yeah.. i skipped a directory and the problem is gone
i will try and narrow it down some more
can i skip the --repair so it only checks and says what is wrong and where?
and btw.. how do i prevent files from becoming corrupted again???
Yes, running without --repair will mostly only show what it finds. But as I said, first fix hw/OS or it might get worse.
i'm a total noob, so can you tell me how i check that?
Maybe in this order (rough commands sketched below):
You can use memtest86+ on a USB stick to check your RAM. Run at least 1 full pass.
smartctl -a /dev/sdX and smartctl -t long /dev/sdX (and after that has finished, again with -a) could be used to check the HDD/SSD.
Linux fsck can be used to check the filesystem.
If all is fine, try borg check --repair.
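Not an authoritative recipe, just a rough sketch of the disk/filesystem part, assuming the repo disk is /dev/sda, its filesystem is /dev/sda1 (unmounted for the fsck step) and the repo lives at /path/to/repo (all of these are placeholders):

smartctl -t long /dev/sda             # start the long SMART self-test
smartctl -a /dev/sda                  # after the self-test has finished, read the results
fsck -f /dev/sda1                     # check the (unmounted!) filesystem holding the repo
borg check --repair /path/to/repo     # only once RAM / disk / filesystem look OK

memtest86+ itself is not run from Linux; it is booted directly from the USB stick.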
Oki. I run these tests automated once a month, except memtest and fsck...
RAM errors can have pretty bad consequences. OTOH, the N40L supports ECC memory (and by default also came with ECC memory), so the memory system should be of decent quality. Then again, the N40L is pretty old, so it could also be an age-related malfunction of some component.
is it possible to extract files from a mounted repo but skip corrupted ones?
now i'm looking for a simple way to extract as much as i can of what is known good and just skip the corrupt files
right now i'm doing a check --verify-data > check.txt to have a file with all the errors. my plan is to clean up that file so i just have a list of files that are not "clean"
but is there a faster and better way?
Guess borg mount (like most borg commands) expects a consistent/undamaged borg repo.
Hopefully your hardware works ok now, otherwise that borg check run might not work well (and --repair could cause more damage). In case of doubt, I guess I rather would have moved the repo to a known-good machine before working on it.
To redirect the log output, you need to redirect stderr.
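For the check --verify-data > check.txt run mentioned above, that means roughly this (repo path is a placeholder):

borg check --verify-data /path/to/repo 2> check.txt        # errors go to stderr
borg check --verify-data /path/to/repo > check.txt 2>&1    # or capture stdout and stderr together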
After borg check --repair fixed all issues, you should be able to use borg mount. IIRC, if a file is damaged (has replacement chunks) it does not let you read the file, except if you give a special mount option.
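If I remember correctly that option is allow_damaged_files (please double-check against the docs of your borg version); with placeholder paths it would look roughly like:

borg mount -o allow_damaged_files /path/to/repo /mnt/borg
# without that option, reading a file that got replacement chunks should just error out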
Could it be that you temporarily lose internet connectivity during your rsync operation?
borg mount uses FUSE afaik. FUSE throws "Transport endpoint is not connected" errors at me when I work with, for example, sshfs and lose my wifi connectivity, and after re-establishing it ssh fails to resume my sessions.
It's not a borgbackup issue for me.
And loss of connectivity would explain the random timing of the errors you are getting.
as in:
someone@localhost ~ % sshfs root@testvm:/tmp/ ~/.sshfs
someone@localhost ~ % cd ~/.sshfs
someone@localhost ~/.sshfs % ls -al
drwx------ 1 root root 60 Aug 25 01:36 <dir1>
drwx------ 1 root root 60 Aug 25 01:36 <dir2>
[...]
drwx------ 1 root root 60 Aug 25 01:36 tmux-0
root@testvm ~ # reboot
someone@localhost ~/.sshfs % ls -al
ls: cannot open directory '.': Transport endpoint is not connected
I have the same problem ("endpoint not connected") requiring me to remount the file system. I have not been able to generate debug output. The problem does not occur on a specific directory/file. It happens randomly. I do not know how to continue from here.
@noppel If you could make it a habit to always use borg mount -f ... in another terminal, you could maybe get a traceback some time and the issue could be found.
And maybe do a borg check to be sure the backup is not corrupted.
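Something like this in a spare terminal (repo URL and mountpoint are placeholders), so borg stays attached to the terminal and a traceback, if any, remains visible:

borg mount --debug --foreground ssh://user@host/path/to/repo /mnt/borg
# reproduce the problem (e.g. the rsync run) from another terminal,
# then unmount when done:
fusermount -u /mnt/borg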
@noppel If you could make it a habit to always use borg mount -f ... in another terminal, you could maybe get a traceback some time and the issue could be found.
I have added --foreground (-f was wrong it seemed) and --debug. This is the result I got before the borg command returned:
fuse: _process_archive completed in 114.9 s for archive 2023-05-10_21:00
Calling fuse_session_destroy
RepositoryCache: current items 36751, size 2.11 GB / 2.15 GB, 501699 hits, 53425 misses, 8838 slow misses (+211.5s), 25512 evictions, 0 ENOSPC hit
RemoteRepository: 3.58 MB bytes sent, 3.96 GB bytes received, 67554 messages sent
-f and --foreground should do the same thing.
The output you got seems normal, I don't see anything unexpected there.
The output you got seems normal, I don't see anything unexpected there.
The process is exiting, this should not happen. It should continue running.
Did you do a borg check to be sure the backup is not corrupted?
Have you checked borgbackup docs, FAQ, and open GitHub issues?
YES
Is this a BUG / ISSUE report or a QUESTION?
ISSUE, but i'm sure i goofed up somehow
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
borg 1.2.3
Operating system (distribution) and version.
Openmediavault 6.4.5-1 (Shaitan)
Hardware / network configuration, and filesystems used.
HP ProLiant N40L MicroServer, EXT4
How much data is handled by borg?
Not sure, but ~500 GB
Full borg commandline that led to the problem (leave out excludes and passwords)
borg mount -o versions ssh://root@100.121.165.103/srv/mergerfs/Data/homedir/Malcolm/Borgbackup/Backup-Bo-OMV_NOVA /srv/mergerfs/Data/Pool1-Backup
and
rsync -auvc --progress /srv/mergerfs/Data/Pool1-Backup/srv/mergerfs/Pool1/Public-Test/Hannah /srv/mergerfs/Data/Backup/srv/mergerfs/Pool1/Public-Test
Describe the problem you're observing.
sooner or later i get "Transport endpoint is not connected" when i use rsync to restore the backup
i know that there is borg extract, but there is no explanation of how to do a "merged" extract like i get when mounting with -o versions and then rsyncing to a temp folder
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
just wait a bit and it will happen... sometimes it happens within seconds and sometimes minutes, rarely an hour will pass
Include any warning/errors/backtraces from the system logs
Doing a mount with -f right now.... i will add it as soon as it fails again
rsync errors:
and
First fail
2nd fail
let me know if i missed anything and i will add it to the FIRST POST
i have tried with 2 rsync jobs running... and it fails fast. with 1 job running it seems to last longer.... but i will report back whether the currently running job completes or not