asweigart / pyperclip

Python module for cross-platform clipboard functions.
https://pypi.python.org/pypi/pyperclip
BSD 3-Clause "New" or "Revised" License

WSL2 broken encoding with pyperclip? #244

Closed timbr0wn closed 2 weeks ago

timbr0wn commented 1 year ago

Within the last couple weeks, pyperclip stopped working for me in WSL2 on the default "Ubuntu" distro (currently 22.04.2 LTS).

Pyperclip 1.8.2 was working and I didn't change anything as far as I know. Just now, I tried both 1.8.1 and 1.8.2 and got the broken results shown below, despite the fact that Get-Clipboard still doesn't seem to have encoding problems within Powershell itself.

I can no longer pyperclip.paste() if there are non-ASCII characters on the clipboard. For example, consider the string made £250

File "/home/tim/.local/lib/python3.10/site-packages/pyperclip/__init__.py", line 517, in paste_wsl
    return stdout[:-2].decode(ENCODING)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 5: invalid start byte

If I run the code from pyperclip without .decode() at the end, we can see that the raw bytes it's trying to decode don't seem correct:

print(subprocess.Popen(['powershell.exe', '-noprofile', '-command', 'Get-Clipboard'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True).communicate()[0][:-2])
b'made \x9c250'

Also, if I pyperclip.copy('made £250') then I get the text made ┬ú250 in my clipboard. Any ideas? Thanks!
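Both symptoms can be reproduced without the clipboard. The sketch below assumes that the paste side hands back CP437 bytes and that the copy side has its UTF-8 bytes reinterpreted as CP437; the variable names are illustrative only and are not part of pyperclip.

```python
# Raw bytes reported above for the clipboard text "made £250".
raw = b'made \x9c250'

# Paste direction: decoding those bytes as UTF-8 fails at byte 0x9c.
try:
    raw.decode('utf-8')
    paste_error = None
except UnicodeDecodeError as exc:
    paste_error = exc
print(paste_error)

# Copy direction (assumption): UTF-8 bytes are read back as CP437,
# so the two-byte sequence for the pound sign becomes two box/accent characters.
copied = 'made £250'.encode('utf-8').decode('cp437')
print(copied)
```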

timbr0wn commented 1 year ago

I'm not sure what's going on, but I guess clip.exe is using cp437, because pyperclip.ENCODING = 'cp-437' decodes properly.

However, the rest of my script assumes UTF-8, and my WSL2 terminal is using UTF-8 in general. I can't use cp-437 for everything anyway, since I need to handle a variety of Unicode characters and that would just give endless encoding headaches.

Something has definitely changed, because everything was working perfectly up until a few weeks ago. I will keep investigating...

timbr0wn commented 1 year ago

It seems like enabling "Beta: Use Unicode UTF-8 for worldwide language support" fixes my issue: https://superuser.com/a/1451686

Btw I'm on Windows 11 Pro. I'm really hoping that this option doesn't have any unintentional side effects 😬

My best guess for what I observed is that pyperclip was receiving text encoded with the OEM code page (as used in console applications; CP437 for en-US) instead of the ANSI code page (as used in GUI-subsystem applications; e.g., 1252, also known as Windows-1252).
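To make the code page difference concrete, here is a small sketch comparing how the pound sign is encoded under each candidate, and how the byte 0x9c reads under the OEM and ANSI code pages. It only uses Python's built-in codecs and makes no claim about what PowerShell does internally.

```python
POUND = '£'

# The same character has a different byte representation in each encoding.
encoded = {name: POUND.encode(name) for name in ('cp437', 'cp1252', 'utf-8')}
for name, data in encoded.items():
    print(name, data)

# The byte seen in the failing paste, interpreted under each code page.
oem_byte = b'\x9c'
as_oem = oem_byte.decode('cp437')    # OEM code page for en-US
as_ansi = oem_byte.decode('cp1252')  # ANSI code page (Windows-1252)
print(as_oem, as_ansi)
```

Only the CP437 reading gives the pound sign back, which is consistent with the guess that the bytes came from the OEM code page.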

intuited commented 1 month ago

Also had this error. Was able to work around it by catching the UnicodeDecodeError and setting pyperclip.ENCODING = 'cp437' (no hyphen) before calling pyperclip.paste() again in the handler block. Thanks @timbr0wn! Seems to be working well enough with that change made.
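The same fallback idea can be sketched on raw bytes, independent of pyperclip. The helper name `decode_clipboard_bytes` and its default encoding order are hypothetical, chosen for illustration; this is not how pyperclip or PR 257 implements it.

```python
def decode_clipboard_bytes(raw, encodings=('utf-8', 'cp437')):
    """Decode raw with the first encoding that accepts it (hypothetical helper)."""
    if not encodings:
        raise ValueError('at least one encoding is required')
    last_error = None
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError as exc:
            # Remember the failure and try the next candidate.
            last_error = exc
    raise last_error


print(decode_clipboard_bytes(b'made \x9c250'))
print(decode_clipboard_bytes('made £250'.encode('utf-8')))
```

One caveat with this approach: CP437 assigns a character to every byte value, so the fallback never fails, and it will silently produce wrong text if the bytes were actually in some other code page.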

asweigart commented 2 weeks ago

Fixed by PR 257