cenkalti / putio.py

A python wrapper for put.io APIv2
http://put.io
MIT License

Download encoding issue #45

Closed woutwoot closed 5 years ago

woutwoot commented 5 years ago

While downloading a directory, I ran into this encoding error. The cause seems to be this character in one of the file names: ’ Can this be worked around? It might need to be filtered out before the file can be saved to disk.

Traceback (most recent call last):
  ...
  File "/usr/local/lib/python3.6/dist-packages/putiopy.py", line 344, in download
    self._download_directory(dest, delete_after_download, chunk_size)
  File "/usr/local/lib/python3.6/dist-packages/putiopy.py", line 356, in _download_directory
    sub_file.download(dest, delete_after_download, chunk_size)
  File "/usr/local/lib/python3.6/dist-packages/putiopy.py", line 346, in download
    self._download_file(dest, delete_after_download, chunk_size)
  File "/usr/local/lib/python3.6/dist-packages/putiopy.py", line 391, in _download_file
    if os.path.exists(filepath):
  File "/usr/lib/python3.6/genericpath.py", line 19, in exists
    os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 83: ordinal not in range(128)
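
The last line of the traceback can be illustrated in isolation. The following is a minimal sketch, not code from putio.py: the helper names `encodable` and `ascii_safe` and the sample file name are hypothetical. It shows that U+2019 cannot be encoded with the ascii codec, and what "filtering out" the character could look like as a workaround.

```python
def encodable(name, encoding):
    """Return True if `name` can be encoded with `encoding` without errors."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def ascii_safe(name):
    """Hypothetical workaround: replace every non-ASCII character with '?'."""
    return name.encode("ascii", errors="replace").decode("ascii")


# Sample name containing U+2019 (right single quotation mark).
filename = "It\u2019s a file.txt"

print(encodable(filename, "ascii"))   # False
print(encodable(filename, "utf-8"))   # True
print(ascii_safe(filename))           # It?s a file.txt
```

Replacing characters changes the name on disk, so it is a last resort; it is only needed when the filesystem encoding really cannot represent the name.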

cenkalti commented 5 years ago

Hello @woutwoot,

I couldn't reproduce the issue. In my test, a file with a ` character in its name downloaded successfully and didn't raise that exception. I ran this test with Python 3.7. I see that you are running 3.6, but I don't think that is the source of the problem.

I need more information to debug the issue.

What exact Python version are you running?

What system are you running on (Linux, Windows, macOS)?

What parameters are you passing to the download method?

woutwoot commented 5 years ago

Hi @cenkalti,

I'm using Python 3.6.3 on Linux (an Ubuntu 17.10 container running on Proxmox). I'm sure you did, but be sure to test the exact character I posted, as the one you just mentioned is a different one 🙂

I'm using this simple code:

file = putio.File.get(file_id)
file.download(target_dir)

cenkalti commented 5 years ago

Is target_dir bytes or str?

woutwoot commented 5 years ago

Essentially, it is constructed this way:

target_dir = os.path.abspath("/path/to/dir")

I believe that means it should be a string? In any case, other downloads seem to be working fine.

EDIT: The path does come from a string in a JSON config file that is loaded this way:

config = json.load(open("config.json", mode='r', encoding='utf-8'))

cenkalti commented 5 years ago

Sorry, still can't reproduce.

The exact character shouldn't matter as long as it is non-ASCII (code point > 127). Maybe there is a problem with the file. Can you tell me the file_id, please?

woutwoot commented 5 years ago

The file ID is 597476284, but I'm not sure whether it is accessible from other accounts.

cenkalti commented 5 years ago

Hello @woutwoot

I guess your filesystem encoding is not UTF-8. What does the following return?

import sys; sys.getfilesystemencoding()

It must return utf-8. If it returns ascii, you can fix your filesystem encoding by setting the correct locale before starting your Python script:

ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
ENV LC_CTYPE=en_US.UTF-8
ENV LC_MESSAGES=en_US.UTF-8
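
The check above can be extended into a small diagnostic. This is a sketch under stated assumptions: `effective_ctype_locale` is a hypothetical helper, not part of Python or putio.py. It applies the POSIX precedence for the character-encoding category (LC_ALL first, then LC_CTYPE, then LANG) to show which variable decides the locale. It reads only the variables, so it cannot tell whether that locale has actually been generated on the system.

```python
import os
import sys


def effective_ctype_locale(env):
    """Return the locale name that governs character encoding.

    POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    Empty values are treated as unset; the fallback is the "C" locale.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


# What Python decided at startup for file names.
print("filesystem encoding:", sys.getfilesystemencoding())
# What the environment of this process asks for.
print("requested locale:", effective_ctype_locale(os.environ))
```

If the requested locale names UTF-8 but the filesystem encoding is still ascii, the locale is probably missing on the system or the variables were changed after the process started.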

woutwoot commented 5 years ago

Thanks a lot for helping me figure this out. It did indeed return ascii. Although the env vars seemed to be set correctly, Python reported utf-8 only after I ran these commands:

dpkg-reconfigure locales (select and set as default en_US.UTF-8)
locale-gen

This is probably related to my using an LXC container (they tend to have locale issues after installation).

woutwoot commented 5 years ago

FYI, I now know the exact reason this was not working. The shell session running my script had been started before I fixed the locale settings on the container. As a result, the script still ran with the wrong locale, while the settings looked fine when I checked in my own session. Again, thanks for the help!
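
The behaviour described here follows from how environments are inherited: a process receives a copy of its parent's environment when it starts, and later changes elsewhere do not reach it. A minimal sketch, assuming a hypothetical helper `locale_seen_by_child` and illustrative values for `LC_ALL`:

```python
import os
import subprocess
import sys


def locale_seen_by_child(parent_env):
    """Start a child Python process with `parent_env` and return the LC_ALL it sees."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('LC_ALL', ''))"],
        env=parent_env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


# Environment of a session started before the locale was fixed (illustrative).
stale_env = dict(os.environ, LC_ALL="C")
# Environment of a session started after the fix (illustrative).
fixed_env = dict(os.environ, LC_ALL="en_US.UTF-8")

print(locale_seen_by_child(stale_env))  # C
print(locale_seen_by_child(fixed_env))  # en_US.UTF-8
```

A script launched from the older session behaves like the first call, which is why starting a new session (or restarting the service that launches the script) is needed after changing locale settings.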

cenkalti commented 5 years ago

No problem. I'm glad that it is resolved.