UnicodeDecodeError: 'utf-8' codec can't decode byte (Windows CHCP check)

InfoSCE commented 3 years ago

Hi,

I recently discovered your third-party tool for rclone and I really enjoy it. I just want to share my experience from the first time I tried to run rclonesync on my server. I read your documentation, configured my rclone remote properly (rclone version 1.55), and installed Python 3.9.6. I saw that on Windows I needed to run "chcp 65001" and "set PYTHONIOENCODING=UTF-8" before my first sync, and I did. But when I launched my command

rclonesync -1 C:\xxx\ remote:folder

I got this error:

File "E:\RClone\v1.55_x64\rclonesync.py", line 801, in <module>
chcp = subprocess.check_output(["chcp"], shell=True).decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 20: invalid start byte

I tried to understand why, so I checked my chcp and PYTHONIOENCODING values: all good. I opened the rclonesync file in Python IDLE and went to line 801:

chcp = subprocess.check_output(["chcp"], shell=True).decode("utf-8")

I tried running that line in the Python console and got the same error. So I ran only the part before .decode("utf-8") and got this:

b'Page de codes active\xff: 65001\r\n'

You can see the \xff byte, which seems to cause the error. To avoid this, I added an "ignore" argument to the decode call, and it works perfectly.

chcp = subprocess.check_output(["chcp"], shell=True).decode("utf-8","ignore")

But the error is still there, so if you have a solution for me, I'd gladly take it :)

Thank you... and sorry for my English ;)

Damien

cjnaz commented 3 years ago

Thanks for the bug report, @InfoSCE.

Since my system uses English, I seldom run into UTF-8 encoding issues, and it is difficult for me to debug and verify them. I don't understand how to control the Windows command response so that it returns only UTF-8-encoded text rather than bytes such as 0xff. Your solution looks good for this specific error.
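One possible explanation for the 0xff byte, offered as an assumption and not verified on the reporter's machine: in the legacy OEM code page 850, 0xff is a non-breaking space, which French typography places before a colon. The sketch below decodes the exact bytes quoted in the report both ways to show the difference.

```python
# Bytes quoted in the report above.
raw = b'Page de codes active\xff: 65001\r\n'

# Assumption: the chcp banner was emitted in OEM code page 850,
# where 0xff maps to U+00A0 (non-breaking space).
as_cp850 = raw.decode("cp850")
print(repr(as_cp850))

# 0xff can never appear in valid UTF-8, so a strict decode fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print(exc)
```

If that assumption holds, the banner text is simply not UTF-8 even though the active code page is 65001, which would explain why setting chcp and PYTHONIOENCODING does not help here.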

Note that there may be cases where you have file names that contain non-utf-8 characters and rclonesync may error out on these as well.

I will leave this issue open for others to find, if needed. Note that there is a beta version of rclone that incorporates rclonesync directly as bisync. This new rclone will replace rclonesync hopefully later this year. As such, I'm doing only critical bug fixes on rclonesync.

cjnaz commented 3 years ago

This error (fixed with .decode("utf-8","ignore")) would come up in all runs, not just the first sync. I've updated the issue title.
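For anyone applying the workaround, here is a minimal sketch of a locale-tolerant check. It is not the rclonesync code: the helper name parse_chcp_output is hypothetical, and it assumes the code page number is the last run of digits in the chcp output, which matches the French output quoted above and the English "Active code page: 65001" form.

```python
import re


def parse_chcp_output(raw: bytes) -> int:
    """Return the code page number from raw `chcp` output (hypothetical helper)."""
    # Undecodable bytes in the localized banner become U+FFFD instead of
    # raising; only the trailing ASCII digits matter for the check.
    text = raw.decode("utf-8", errors="replace")
    match = re.search(r"(\d+)\s*$", text)
    if match is None:
        raise ValueError(f"no code page number found in {text!r}")
    return int(match.group(1))


# Sample taken from the report; on Windows the bytes would come from
# subprocess.check_output(["chcp"], shell=True).
sample = b'Page de codes active\xff: 65001\r\n'
print(parse_chcp_output(sample))
```

Comparing the parsed integer against 65001 avoids depending on the localized wording of the banner at all.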

cjnaz / rclonesync-V2

UnicodeDecodeError: 'utf-8' codec can't decode byte (Windows CHCP check) #74