cjnaz / rclonesync-V2

A Bidirectional Cloud Sync Utility using rclone
MIT License

Sync fails due to UTF-8 decode errors on Linux #51

Closed coletonodonnell closed 4 years ago

coletonodonnell commented 4 years ago

OS:

[root@pasta ~]# uname -r
5.6.15-arch1-1

Error:

2020-06-03 13:35:34,269:  Exception in load_list loading </root/.rclonesyncwd/LSL__mnt_hhd_Files_drive2_Computer__Path1>:  <'utf8' codec can't decode byte 0x99 in position 2421: invalid start byte>
2020-06-03 13:35:34,271:    ERROR    Failed loading Path1 list file </root/.rclonesyncwd/LSL__mnt_hhd_Files_drive2_Computer__Path1>

Full Output:

/home/admin/.rclonesyncwd/rclonesync.py /mnt/hhd/Files drive2:Computer --first-sync --verbose --rclone-args -L --drive-skip-gdocs
2020-06-03 13:34:59,296:  ***** BiDirectional Sync for Cloud Services using rclone (V2.10 200411) *****
2020-06-03 13:34:59,370:  Lock file created: </tmp/rclonesync_LOCK__mnt_hhd_Files_drive2_Computer_>
2020-06-03 13:34:59,370:  Synching Path1  </mnt/hhd/Files/>  with Path2  <drive2:Computer/>
2020-06-03 13:34:59,370:  Command args: <Path1=/mnt/hhd/Files, Path2=drive2:Computer, check_access=False, check_filename=RCLONE_TEST, config=None, dry_run=False, filters_file=None, first_sync=True, force=False, max_deletes=50, no_datetime_log=False, rc_verbose=None, rclone=rclone, rclone_args==[-L --drive-skip-gdocs], remove_empty_directories=False, verbose=1, workdir=/root/.rclonesyncwd>
2020-06-03 13:34:59,370:  >>>>> --first-sync copying any unique Path2 files to Path1
2020/06/03 13:34:59 NOTICE: Local file system at /mnt/hhd/Files/: Replacing invalid UTF-8 characters in "Documents/Programming/Learning/Python/Learn Python the Hard Way/Course/Book/\x99\x91\xdb.\x87␄␘=V~␕\xa2\x90H~\xc1␂^\xb6b\xe6C␙L␞Ƌ\xfc␔h␞␑x\x8c␟\xcd\xe6=\xe2\xc5=\xee\xe5\xd8N␆\x9d\x93"
2020/06/03 13:34:59 NOTICE: Local file system at /mnt/hhd/Files/: Replacing invalid UTF-8 characters in "Documents/Programming/Learning/Python/Learn Python the Hard Way/Course/Book/\xe6\xc8\xe8#\x8d␒\xbf\xdc\xed\xe3Ŗ␙\xb9%Ft\x84␔Sk\xd4V\xb8\xfc\xdfG䓑\xd6\xc0\xf4.Ƀ\x91\xe9\xf9x%\xf7\xd7r\x9977[\xf6D␡#\x9b"
2020/06/03 13:34:59 NOTICE: Local file system at /mnt/hhd/Files/: Replacing invalid UTF-8 characters in "Documents/Programming/Learning/Python/Learn Python the Hard Way/Course/Book/\xed\x87,\xb6\x98Yz\xba\x83\xc0L\xe6F\xe0\xd3"
2020-06-03 13:35:34,269:  Exception in load_list loading </root/.rclonesyncwd/LSL__mnt_hhd_Files_drive2_Computer__Path1>:  <'utf8' codec can't decode byte 0x99 in position 2421: invalid start byte>
2020-06-03 13:35:34,271:    ERROR    Failed loading Path1 list file </root/.rclonesyncwd/LSL__mnt_hhd_Files_drive2_Computer__Path1> - 
2020-06-03 13:35:34,272:  Lock file removed: </tmp/rclonesync_LOCK__mnt_hhd_Files_drive2_Computer_>
2020-06-03 13:35:34,272:  ***** Critical Error Abort - Must run --first-sync to recover.  See README.md *****

Locale Settings:

locale -a

[root@pasta ~]# locale -a
C
en_US
en_US.iso88591
en_US.utf8
POSIX

locale

[root@pasta ~]# locale
LANG=en_US.ISO-8859-1
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

localectl list-locales

[root@pasta ~]# localectl list-locales
en_US.UTF-8

Notes:

I have run chmod +x on rclonesync.py, and my rclone is set up properly.

cjnaz commented 4 years ago

Hi. Interesting.

This log line is from rclone, perhaps based on how rclonesync invokes rclone. It looks like the encoding is ISO-8859-1, aka Latin-1. rclonesync is currently hard-coded to expect utf-8 encoding, fyi.

2020/06/03 13:34:59 NOTICE: Local file system at /mnt/hhd/Files/: Replacing invalid UTF-8 characters in "Documents/Programming/Learning/Python/Learn Python the Hard Way/Course/Book/\x99\x91\xdb.\x87␄␘=V~␕\xa2\x90H~\xc1␂^\xb6b\xe6C␙L␞Ƌ\xfc␔h␞␑x\x8c␟\xcd\xe6=\xe2\xc5=\xee\xe5\xd8N␆\x9d\x93"

To see whether it's rclonesync-specific or related to your environment, please run rclone lsl /mnt/hhd/Files from the command line, as the same user (root?) and with the same LANG locale setting, and check the output for the above file. Also, please try setting your locale to LANG="en_US.UTF-8", if possible.
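For reference, the decode failure in the log can be reproduced in a few lines of Python. This is only a sketch using the first three bytes of one reported file name, copied from the rclone NOTICE line; it does not show how rclonesync itself reads its list files, and the exact wording of the exception message may differ between Python versions.

```python
# First three bytes of one reported file name, as escaped in the rclone NOTICE line.
raw_name = b"\x99\x91\xdb"

try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x99 is a UTF-8 continuation byte, so it can never start a character.
    utf8_error = str(exc)
    print(utf8_error)

# ISO-8859-1 assigns a code point to every byte value, so this decode cannot fail.
# A successful Latin-1 decode therefore says nothing about the real encoding.
latin1_name = raw_name.decode("latin-1")
print(repr(latin1_name))
```

Because Latin-1 accepts any byte sequence, the fact that these bytes are invalid UTF-8 is consistent with Latin-1 names but also with plain corrupted names.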

coletonodonnell commented 4 years ago

Sorry for taking some time to respond, got a little busy:

The output of sudo rclone lsl /mnt/hhd/Files was a long list of files, and the listing was accurate. As for LANG, according to the Arch wiki you shouldn't include quotes when setting the variable, but I decided to try it just in case. I got the same error.

cjnaz commented 4 years ago

In /root/.rclonesyncwd you should have an LSL__mnt_hhd... __Path1 file. Check with a hex dump viewer if the file name characters for the suspect files are utf8 or latin1 encoding. I suspect they are latin1 encoded, which is why load_list is choking.
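As an alternative to a hex dump viewer, a short Python sketch can flag which lines of the list file are not valid UTF-8. The function name find_non_utf8_lines is hypothetical (it is not part of rclonesync), and the sketch assumes one entry per line.

```python
from pathlib import Path


def find_non_utf8_lines(listing_path):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
    bad_lines = []
    # Read raw bytes so nothing is decoded before each line is tested.
    data = Path(listing_path).read_bytes()
    for line_number, raw_line in enumerate(data.split(b"\n"), start=1):
        try:
            raw_line.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append((line_number, raw_line))
    return bad_lines
```

Calling it on the Path1 list file and printing repr() of each returned line shows the offending bytes as \xNN escapes, much as a hex dump would.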

Also run the same with -v -v --rc-verbose --rc-verbose to get the debug level logging from both rclonesync and rclone, respectively, which might have some clues. I'm not sure where/when these error lines are coming up:

2020/06/03 13:34:59 NOTICE: Local file system at /mnt/hhd/Files/: Replacing invalid UTF-8 characters in "Documents/Programming/Learning/Python/Learn Python the Hard Way/Course/Book/\x99\x91\xdb.\x87␄␘=V~␕\xa2\x90H~\xc1␂^\xb6b\xe6C␙L␞Ƌ\xfc␔h␞␑x\x8c␟\xcd\xe6=\xe2\xc5=\xee\xe5\xd8N␆\x9d\x93"

Try dropping this file onto your local drive to see if it is represented correctly, and rclonesyncs correctly (create a small test tree for debugging)... Русский.txt

Please post a text file with the problematic file name. If we're lucky I can download it and try to find a solution locally.

If you move those 3 files out does rclonesync work correctly?

My reading on locale handling suggests this problem is messy.
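To locate the suspect files before moving them out, something like the following sketch could help. It walks the tree with bytes paths so that no name is decoded implicitly. The function name find_non_utf8_names is hypothetical, and the sketch assumes a filesystem that allows arbitrary bytes in file names, as Linux filesystems commonly do.

```python
import os


def find_non_utf8_names(root):
    """Return raw byte paths under root whose names are not valid UTF-8."""
    suspects = []
    # Passing bytes to os.walk makes it yield bytes names, so each name can be
    # tested here before anything tries to decode it.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                suspects.append(os.path.join(dirpath, name))
    return suspects
```

For the tree in this report it would be called as find_non_utf8_names("/mnt/hhd/Files"); the returned paths are bytes, so print them with repr() to see the raw values.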

coletonodonnell commented 4 years ago

On further inspection, a bunch of glitched files were there that shouldn't have been. I deleted them. It works now! I don't know what the real underlying issue was, but those files weren't supposed to be there in the first place.

cjnaz commented 4 years ago

Please try dropping the above .txt file onto your sync tree to check that it's properly handled, considering that your LANG isn't set to utf8. This will help confirm that the locale LANG setting is or isn't a factor for this tool.
Also, if you still have one of those problematic filenames it would be quite helpful if you would post a .txt file with just a few characters in it and named with the problem name. Thx.
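On whether LANG is a factor: in the locale output posted earlier, LC_ALL is set to en_US.UTF-8, and under POSIX rules a non-empty LC_ALL takes precedence over both LC_CTYPE and LANG. The sketch below only models that lookup order. The function name effective_ctype is hypothetical, and whether rclonesync or rclone consults the locale at all is not established in this thread.

```python
def effective_ctype(env):
    """Return (variable, value) for the setting that governs character encoding.

    POSIX precedence: a non-empty LC_ALL wins, then LC_CTYPE, then LANG.
    """
    for variable in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(variable)
        if value:
            return variable, value
    # With nothing set, the default "C" locale applies.
    return None, "C"


# Values copied from the locale output in this report.
reported_env = {
    "LANG": "en_US.ISO-8859-1",
    "LC_CTYPE": "en_US.UTF-8",
    "LC_ALL": "en_US.UTF-8",
}
winner, value = effective_ctype(reported_env)
print(winner, value)  # LC_ALL en_US.UTF-8
```

If this model holds for the reported environment, the character encoding in effect was already UTF-8 despite LANG naming ISO-8859-1.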

coletonodonnell commented 4 years ago

> Please try dropping the above .txt file onto your sync tree to check that it's properly handled, considering that your LANG isn't set to utf8. This will help confirm that the locale LANG setting is or isn't a factor for this tool.

Sure thing, I dropped that into my Documents directory, and it backed it up successfully! It was stored as "default.txt" and had "Russian" as its contents. I don't know if that was supposed to happen though.

> Also, if you still have one of those problematic filenames it would be quite helpful if you would post a .txt file with just a few characters in it and named with the problem name.

I deleted them earlier. I honestly think it was my own fault those files were like that, and I apologise. If I ever encounter the issue again, I will reopen this with the txt files.

cjnaz commented 4 years ago

It looks like we lost the Cyrillic script name for the file. Try to rename default.txt to "Русский.txt". Hopefully this name survives through github. For me the filename looks like:

[Image: screenshot of the file name as displayed for cjnaz]

coletonodonnell commented 4 years ago

> It looks like we lost the Cyrillic script name for the file. Try to rename default.txt to "Русский.txt". Hopefully this name survives through github. For me the filename looks like:

Alright, so I made a file named "Русский.txt" and inside of it I put "Русский.txt" as well, just so there was something in there. From here, I ran the script and it saved properly as "Русский.txt" in the drive.

cjnaz commented 4 years ago

For giggles, try some renames and deletes on each side to exercise more of the logic. For now we call this a sighting and leave it closed. Thanks for your debug assistance.

coletonodonnell commented 4 years ago

It seemed to work when I renamed it and when I deleted it. I did accidentally delete on my computer twice, though, and it didn't sync the second time; I don't really know why.