aajanki / yle-dl

Download videos from Yle servers
https://aajanki.github.io/yle-dl/index-en.html
GNU General Public License v3.0

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' #62

Closed: tigert closed this issue 9 years ago

tigert commented 9 years ago

Hi. Many thanks for the great tool :-)

Now for a bug report: accented characters in Areena filenames seem to cause problems. For example, "Ryhmä Hau" (http://areena.yle.fi/1-2213828; note the 'ä') fails with an error when yle-dl tries to save the file.

osmc@osmc:/media/Tikku/Areena/Ryhma Hau$ yle-dl --protocol hds:youtubedl http://areena.yle.fi/1-2213828
yle-dl 2.7.0: Download media files from Yle Areena and El\xe4v\xe4 Arkisto
Copyright (C) 2009-2015 Antti Ajanki <antti.ajanki@iki.fi>, license: GPLv2
Subtitles saved to Ryhm\xe4 Hau: Lunta ja j\xe4\xe4t\xe4-2015-04-30T07:55:59+03:00.fin.srt
Output file: Ryhm\xe4 Hau: Lunta ja j\xe4\xe4t\xe4-2015-04-30T07:55:59+03:00.flv
[download] Downloading f4m manifest
[download] Destination: Ryhm Hau: Lunta ja jt-2015-04-30T07:55:59+03:00.flv
[download]   0.4% of 325.97MiB at     ---b/s ETA 00:00Traceback (most recent call last):
  File "/home/osmc/bin/yle-dl", line 2010, in <module>
    main()
  File "/home/osmc/bin/yle-dl", line 2006, in main
    sys.exit(dl.download_episodes(url, sfilt, rtmpdumpargs, destdir))
  File "/home/osmc/bin/yle-dl", line 1603, in download_episodes
    return self._retry_call('download_episodes', *args, **kwargs)
  File "/home/osmc/bin/yle-dl", line 1590, in _retry_call
    res = method(*args, **kwargs)
  File "/home/osmc/bin/yle-dl", line 1037, in download_episodes
    return self.process(download_clip, url, filters)
  File "/home/osmc/bin/yle-dl", line 1070, in process
    res = self.process_single_episode(clipfunc, clipurl, filters)
  File "/home/osmc/bin/yle-dl", line 1106, in process_single_episode
    return clipfunc(clip)
  File "/home/osmc/bin/yle-dl", line 1035, in download_clip
    return downloader.save_stream()
  File "/home/osmc/bin/yle-dl", line 1862, in save_stream
    if not f4mdl.download(outputfile, info):
  File "/usr/lib/python2.7/dist-packages/youtube_dl/downloader/common.py", line 291, in download
    return self.real_download(filename, info_dict)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/downloader/f4m.py", line 297, in real_download
    with open(frag_filename, 'rb') as down:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 4: ordinal not in range(128)
osmc@osmc:/media/Tikku/Areena/Ryhma Hau$ 

This is on OSMC, a media center distribution that runs Kodi on top of Debian Lenny (Raspbian) on a Raspberry Pi. Filenames without accented characters work fine. The filesystem is ext4.

aajanki commented 9 years ago

I can reproduce this with an ASCII locale. I'm not sure yet whether the bug is in yle-dl or in youtube-dl.
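To illustrate what goes wrong under an ASCII locale, here is a minimal sketch that encodes the episode title from the log with two codecs. The helper name `try_encode` is made up for this example; it is not part of yle-dl or youtube-dl, and the snippet only imitates the encoding step instead of running the downloader.

```python
# Sketch: simulate the encoding step that fails for the episode title.
# The title is taken from the log above; try_encode is a hypothetical helper.
title = u"Ryhm\xe4 Hau: Lunta ja j\xe4\xe4t\xe4"


def try_encode(name, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return name.encode(encoding)
    except UnicodeEncodeError as err:
        return str(err)


# With the ASCII codec the first 'ä' (index 4) cannot be represented.
ascii_result = try_encode(title, "ascii")
# With UTF-8 every character in the title can be encoded.
utf8_result = try_encode(title, "utf-8")

print(ascii_result)
print(utf8_result)
```

The ASCII attempt fails at the same character position that the traceback reports, which is consistent with the filename encoding being the trigger.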

As a potential workaround, try a UTF-8 locale:

export LANG=fi_FI.UTF-8
export LC_CTYPE=fi_FI.UTF-8
yle-dl --protocol hds:youtubedl http://areena.yle.fi/1-2213828

If the fi_FI locale doesn't exist, try en_US.UTF-8 or C.UTF-8, or generate a UTF-8 locale if your system doesn't have one.
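To check whether the locale change took effect, it may help to ask Python which encodings it derives from the environment. The following is a rough sketch using only the standard library; the function names `describe_encodings` and `is_ascii_only` are invented for this example.

```python
import codecs
import locale
import sys


def describe_encodings():
    """Collect the encodings Python derives from the current environment."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


def is_ascii_only(encoding_name):
    """True if the name is an alias of the ASCII codec, e.g. 'ANSI_X3.4-1968'."""
    try:
        return codecs.lookup(encoding_name).name == "ascii"
    except (LookupError, TypeError):
        # Unknown or missing encoding name: treat it as "not ASCII".
        return False


for key, value in describe_encodings().items():
    note = "(ASCII only, accented filenames may fail)" if is_ascii_only(value) else ""
    print(key, value, note)
```

Run it in the same shell session as yle-dl, after the `export` lines. If any entry is still reported as ASCII only, the UTF-8 locale is probably not installed or not active.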

aajanki commented 9 years ago

After some analysis, I found that this is actually a youtube-dl bug. I opened a pull request for youtube-dl (https://github.com/rg3/youtube-dl/pull/5588), so this should get fixed eventually.
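The pull request itself is not reproduced here. As a rough illustration of the general kind of fix for this class of bug, the sketch below encodes a filename explicitly before it would be handed to `open()`, falling back to UTF-8 when the filesystem encoding cannot represent it. The helper `encode_filename` and the fragment filename are assumptions made for this example, not code from youtube-dl.

```python
import sys


def encode_filename(name, fs_encoding=None):
    """Hypothetical helper: turn a text filename into bytes for open().

    Uses the given (or detected) filesystem encoding and falls back to
    UTF-8 when that encoding cannot represent the name.
    """
    encoding = fs_encoding or sys.getfilesystemencoding() or "utf-8"
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        return name.encode("utf-8")


# Made-up fragment filename containing the problematic character.
frag_filename = u"Ryhm\xe4 Hau-Frag1"

# Force the ASCII case to show the fallback path.
encoded = encode_filename(frag_filename, "ascii")
print(encoded)
```

Falling back to UTF-8 is only one possible policy. Another is to drop or replace the unencodable characters, at the cost of a filename that no longer matches the title.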