borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

'borg create' stops when encountering invalid characters in filenames #8183

Closed jensb closed 5 months ago

jensb commented 5 months ago

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

BUG/ISSUE

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

1.2.7

Operating system (distribution) and version.

Ubuntu 22.04 LTS

Hardware / network configuration, and filesystems used.

ext4; borg is used over SSH, with another Linux-like system (OpenWRT) acting as the server and running borg 1.2.4

How much data is handled by borg?

about 500 GB

Full borg command line that led to the problem (leave out excludes and passwords)

borg create -v --filter ACE --list --show-rc --compression auto,lzma -c 300 --stats --exclude-from=$EXFILE $ROOTDIR

Describe the problem you're observing.

When borg encounters a filename that contains an invalid UTF-8 character, like this: ..../kokosnuÃ.txt, which is supposed to be ..../kokosnuß.txt but was incorrectly encoded, console output simply stops at this character. Not even the .txt is printed. In the background borg seems to continue uploading data, but the list of added files is no longer printed. Ctrl-C seems to work, but borg produces no further console output.

I would expect borg to do one of the following: back up the file as is, with the invalid character; skip the file, warn about it, and continue; or skip the file and bail out with an error message. It should not simply halt output and appear to freeze.
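For anyone who wants to locate such filenames before running a backup, here is a minimal Python sketch. It is not part of borg; the helper names `classify_name` and `find_suspicious_names` are hypothetical. It assumes a POSIX filesystem where filenames are arbitrary byte strings, and it flags names that either fail to decode as UTF-8 or decode to text containing control characters.

```python
import os
import unicodedata


def classify_name(raw: bytes) -> str:
    """Classify a raw filename as 'invalid-utf8', 'control-chars', or 'ok'."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid-utf8"
    # Unicode category "Cc" covers the C0 controls, DEL, and the
    # C1 controls (U+0080 to U+009F).
    if any(unicodedata.category(ch) == "Cc" for ch in text):
        return "control-chars"
    return "ok"


def find_suspicious_names(root: bytes):
    """Yield (path, verdict) for every entry under root that is not 'ok'.

    Passing a bytes root makes os.walk return bytes names, so undecodable
    filenames are seen exactly as they are stored on disk.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            verdict = classify_name(name)
            if verdict != "ok":
                yield os.path.join(dirpath, name), verdict
```

Calling `find_suspicious_names(b"/some/dir")` and printing each path with `ascii(path)` keeps the report itself free of raw control bytes.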

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

100% reproducible. Also with other (incorrectly encoded) filenames.

infectormp commented 5 months ago

Please post locale output here

jensb commented 5 months ago

$ locale
LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC=de_DE.UTF-8
LC_TIME=de_DE.UTF-8
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY=de_DE.UTF-8
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER=de_DE.UTF-8
LC_NAME=de_DE.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MEASUREMENT=de_DE.UTF-8
LC_IDENTIFICATION=de_DE.UTF-8
LC_ALL=

jensb commented 5 months ago

I think I need to close this, or rather, move it somewhere else.

The output stops in Konsole (the KDE terminal emulator), but it does not stop when using 'foot' (a Wayland terminal emulator) or a pure text terminal. The same happens with convmv: when I run it to fix the filename encoding, output stops in Konsole exactly when it encounters a badly encoded ß (as in "Geheime VerschluÃsache"), but not in foot or a plain text terminal. So this looks like a terminal emulator problem rather than a borg problem.
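The following Python sketch illustrates one way a filename like the one above can come about, under the assumption (not confirmed in this thread) that the name was double-encoded: the UTF-8 bytes of ß (0xC3 0x9F) were misread as Latin-1 and stored again as UTF-8. The result is "Ã" (U+00C3) followed by the invisible character U+009F, which is a C1 control character (APPLICATION PROGRAM COMMAND). A terminal that acts on C1 controls might treat the text after it as part of a control string; whether Konsole does so here is a guess. The helper names are hypothetical.

```python
import unicodedata


def double_encode(text: str) -> str:
    """Simulate the mistake: UTF-8 bytes misread as Latin-1 text."""
    return text.encode("utf-8").decode("latin-1")


def repair_double_encoding(text: str) -> str:
    """Undo the mistake; raises UnicodeError if text was not double-encoded."""
    return text.encode("latin-1").decode("utf-8")


def escape_controls(text: str) -> str:
    """Replace control characters with visible \\xNN escapes for safe printing."""
    return "".join(
        f"\\x{ord(ch):02x}" if unicodedata.category(ch) == "Cc" else ch
        for ch in text
    )
```

Here `double_encode("kokosnuß.txt")` yields a name containing U+00C3 and U+009F, `escape_controls` makes the hidden control character visible as `\x9f`, and `repair_double_encoding` restores the intended name.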

Sorry to have bothered you. :-|