DownloadTicketService / dl

Download Ticket Service
https://www.thregr.org/~wavexx/software/dl/
GNU General Public License v2.0

support non-ascii filenames #72

Closed mjg closed 3 years ago

mjg commented 6 years ago

Currently, dl-cli bails out on filenames that cannot be encoded by the ascii codec.

Infer the encoding from the current locale and convert the path and filename using that.

Note that this assumes that when the filename is passed in as an extra parameter, it uses the current encoding as well. (Currently, no caller does that, and the filename is inferred from the path, which comes from the argument.)
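For illustration, the conversion described above might look roughly like the following. This is only a sketch, not the actual patch: the helper name `encode_upload_names` and its `encoding` override are hypothetical, and the fallback to `locale.getpreferredencoding(False)` is an assumption added for the case where `getlocale()[1]` is `None`.

```python
import locale
import os


def encode_upload_names(path, filename=None, encoding=None):
    """Return (path_bytes, filename_bytes) encoded for the current locale.

    Hypothetical helper: the name and the `encoding` override are
    illustrative and not part of dl-cli.
    """
    if encoding is None:
        # getlocale()[1] can be None (for example in the C locale),
        # so fall back to the preferred encoding (an assumption).
        encoding = locale.getlocale()[1] or locale.getpreferredencoding(False)
    if filename is None:
        # Mirrors the current behaviour: the name is inferred from the path.
        filename = os.path.basename(path)
    return path.encode(encoding), filename.encode(encoding)
```

The two byte strings would then be what gets handed to pycurl in place of the plain `str` values.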

wavexx commented 6 years ago

I guess only FORM_FILENAME needs to be encoded, and here I would always use utf8 as the network encoding.
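A minimal sketch of that split, assuming the local path and the name sent over the network are encoded independently. Both function names are hypothetical, and using `os.fsencode` for the local path is an assumption rather than something the thread settles on.

```python
import os


def network_filename(path):
    # Name as sent to the server: always UTF-8, independent of the locale.
    return os.path.basename(path).encode("utf-8")


def local_path_bytes(path):
    # Path as opened on the client: encoded with Python's filesystem encoding.
    return os.fsencode(path)
```

With this split, only the client-side bytes depend on the user's environment, while the server always receives UTF-8.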

Does curl actually fail to read the source file if the path is not encoded?

mjg commented 6 years ago

Full error msg on a file name containing ö:

Traceback (most recent call last):
  File "/home/mjg/bin/dl-cli.py", line 228, in <module>
    main()
  File "/home/mjg/bin/dl-cli.py", line 217, in main
    answ = newticket(args.file[0], cfg)
  File "/home/mjg/bin/dl-cli.py", line 73, in newticket
    ("msg", json.dumps({}))])
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 8: ordinal not in range(128)

The message is misleading, though: it's not about json.dumps but the line before.
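The same kind of error can be reproduced without pycurl by encoding a non-ASCII string with the ascii codec directly. The helper name `ascii_failure` is hypothetical and only serves to expose the fields of the exception.

```python
def ascii_failure(text):
    """Return (codec, position, reason) if text is not ASCII-encodable, else None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that failed to encode.
        return exc.encoding, exc.start, exc.reason
    return None
```

The position in the message is an index into the string being encoded, which can help identify which argument was the culprit.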

FORM_FILE needs to be encoded so that curl can actually read the correct file. FORM_FILENAME needs to be encoded so that it is labelled with the correct filename in the web interface on download. (I've tried with and without to check this.)

I also tested this with different locale settings: de_DE.utf8 and de_DE.iso88591. The patch works with both. If I encode using 'utf-8' instead of getlocale()[1] then things work for de_DE.utf8 only, but not for de_DE.iso88591 (curl cannot open the file). Yes, encode encodes from string to byte string.

Encoding/decoding can be confusing, but this is the right direction, see http://pycurl.io/docs/latest/unicode.html :)
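To see why the choice of encoding matters for FORM_FILE, it may help to compare the bytes that each encoding produces for the same name. This is an illustrative sketch with a made-up filename; `encoded_variants` is a hypothetical helper.

```python
def encoded_variants(name, encodings=("utf-8", "iso-8859-1")):
    # Map each encoding to the byte string it produces for the same name.
    return {enc: name.encode(enc) for enc in encodings}


# Example filename containing the character U+00F6 (o with diaeresis).
variants = encoded_variants("t\xf6st.txt")
```

Under an ISO 8859-1 locale the file on disk is named with the single byte `0xf6`, so a path encoded as UTF-8 (two bytes, `0xc3 0xb6`) refers to a name that does not exist there.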

wavexx commented 6 years ago

On Wed, Apr 04 2018, Michael J. Gruber wrote:

> FORM_FILE needs to be encoded so that curl can actually read the correct file. FORM_FILENAME needs to be encoded so that it is labelled with the correct filename in the web interface on download. (I've tried with and without to check this.)

Just to be clear, is this python2 or python3?

> Encoding/decoding can be confusing, but this is the right direction, see http://pycurl.io/docs/latest/unicode.html :)

The most annoying thing here is how paths are handled by the various modules.

mjg commented 6 years ago

This is py3, which is requested explicitly by the shebang line.

py2 would probably need to set the locale with locale.setlocale(locale.LC_ALL, '') first (set locale from the environment)... and then fail. Maybe py2 wouldn't even need the recoding. Really, this can be pretty confusing because py modules tend to expect different types of parameters in their py2 and py3 versions, and unicode_literals typically makes things worse (because it changes the type of default strings).
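A rough sketch of the "set locale from the environment" step mentioned above. The function name `environment_encoding` is hypothetical, and swallowing `locale.Error` is an assumption about how one might want to handle an environment that names an unavailable locale.

```python
import locale


def environment_encoding():
    """Adopt the locale from the environment and return its encoding (may be None)."""
    try:
        # An empty string means: take the settings from LANG / LC_* variables.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not installed; keep the current one.
        pass
    return locale.getlocale()[1]
```

Since the result can be `None` (for example in the C locale), a caller would still need to pick a fallback before passing it to `str.encode`.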