beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

web: `UnicodeEncodeError` on non-latin1 characters in filename #2815

Open UniIsland opened 6 years ago

UniIsland commented 6 years ago

Problem

I'm using beets with Tomahawk. If I try to play a song with CJK characters in its name, the web server throws a `UnicodeEncodeError`:

127.0.0.1 - - [22/Feb/2018 15:28:19] "GET /item/10933/file HTTP/1.0" 200 -
Error on request:
Traceback (most recent call last):
  File "/usr/local/Cellar/pyenv/1.2.1/versions/3.6.4/lib/python3.6/site-packages/werkzeug/serving.py", line 270, in run_wsgi
    execute(self.server.app)
  File "/usr/local/Cellar/pyenv/1.2.1/versions/3.6.4/lib/python3.6/site-packages/werkzeug/serving.py", line 261, in execute
    write(data)
  File "/usr/local/Cellar/pyenv/1.2.1/versions/3.6.4/lib/python3.6/site-packages/werkzeug/serving.py", line 227, in write
    self.send_header(key, value)
  File "/usr/local/Cellar/pyenv/1.2.1/versions/3.6.4/lib/python3.6/http/server.py", line 508, in send_header
    ("%s: %s\r\n" % (keyword, value)).encode('latin-1', 'strict'))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 46-50: ordinal not in range(256)
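The last frame shows where it breaks: `http.server` formats each header line with `"%s: %s\r\n"` and encodes it as strict Latin-1. The sketch below reproduces only that step. It assumes the offending header is `Content-Disposition` carrying a quoted, hypothetical filename `01 太平洋的風.mp3`; `encode_header_line` and `try_header` are illustrative names, not part of Werkzeug or `http.server`.

```python
# Reproduces only the failing step from http.server's send_header() in the
# traceback: the header line is formatted, then encoded as strict Latin-1.
# ASSUMPTION: the header is Content-Disposition with a hypothetical quoted filename.


def encode_header_line(keyword: str, value: str) -> bytes:
    """Format and encode one header line the way the traceback's send_header does."""
    return ("%s: %s\r\n" % (keyword, value)).encode("latin-1", "strict")


def try_header(keyword: str, value: str) -> str:
    """Return "ok" if the header line encodes, else the UnicodeEncodeError message."""
    try:
        encode_header_line(keyword, value)
    except UnicodeEncodeError as exc:
        return str(exc)
    return "ok"


print(try_header("Content-Disposition", 'attachment; filename="01 Track.mp3"'))
# ok
print(try_header("Content-Disposition", 'attachment; filename="01 太平洋的風.mp3"'))
# 'latin-1' codec can't encode characters in position 46-50: ordinal not in range(256)
```

With that hypothetical filename the error reports positions 46-50, the same range as in the traceback, which is consistent with (though not proof of) a five-character CJK name ending up in a filename header.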

Here's a song that can cause this problem (fields dumped with the export plugin):

    {
        "acoustid_fingerprint": null,
        "acoustid_id": null,
        "album": "匆匆",
        "albumartist": "胡德夫",
        "albumartist_credit": "胡德夫",
        "albumartist_sort": "Ara Kimbo",
        "albumdisambig": null,
        "albumstatus": "Official",
        "albumtype": "album",
        "arranger": "",
        "art": false,
        "artist": "胡德夫",
        "artist_credit": "胡德夫",
        "artist_sort": "Ara Kimbo",
        "asin": null,
        "bitdepth": 0,
        "bitrate": 128000,
        "bpm": 0,
        "catalognum": "WFM05001",
        "channels": 2,
        "comments": null,
        "comp": false,
        "composer": null,
        "composer_sort": null,
        "country": "TW",
        "date": "2005-04-01",
        "day": null,
        "disc": 1,
        "disctitle": null,
        "disctotal": 1,
        "encoder": null,
        "format": "MP3",
        "genre": "Folk",
        "genres": [
            "Folk"
        ],
        "grouping": null,
        "initial_key": null,
        "label": "野火樂集",
        "language": "zho",
        "length": 316.6040625,
        "lyricist": null,
        "lyrics": "",
        "mb_albumartistid": "46dfef42-826d-4cb1-8d28-940d30aa3bf9",
        "mb_albumid": "de95a0cb-87c0-4d64-b753-f5c98bde3271",
        "mb_artistid": "46dfef42-826d-4cb1-8d28-940d30aa3bf9",
        "mb_releasegroupid": "0fe19e52-b54b-4a6a-946d-a23b58766e7c",
        "mb_trackid": "b69fce9c-373b-46a3-b060-4ffdc4800430",
        "media": "CD",
        "month": 4,
        "original_date": "2005-04-01",
        "original_day": null,
        "original_month": 4,
        "original_year": 2005,
        "r128_album_gain": 0,
        "r128_track_gain": 0,
        "rg_album_gain": -4.44,
        "rg_album_peak": 1.088344,
        "rg_track_gain": -5.09,
        "rg_track_peak": 1.032934,
        "samplerate": 44100,
        "script": "Hant",
        "title": "太平洋的風",
        "track": 1,
        "tracktotal": 12,
        "year": 2005
    }

Setup

sampsyo commented 6 years ago

Hello! Thanks for the details. Because this error happens when sending the headers, I suspect that the problem only occurs because there are non-Latin1 characters in the filename (not just in the metadata). Can you confirm that the filename has CJK characters?
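Latin-1 covers only code points U+0000 to U+00FF, so whether a header fails depends on *which* non-ASCII characters it contains, not on non-ASCII text in general. A minimal sketch (the helper `non_latin1_chars` is hypothetical) that lists the characters a header value could not carry:

```python
from typing import List, Tuple


def non_latin1_chars(text: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for characters outside Latin-1 (above U+00FF)."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFF]


# Accented Western European letters are inside Latin-1, so nothing is reported.
print(non_latin1_chars("Café.mp3"))
# []

# CJK characters are outside Latin-1, so each one is reported.
print(non_latin1_chars("太平洋的風.mp3"))
# [(0, '太'), (1, '平'), (2, '洋'), (3, '的'), (4, '風')]
```

This is why a filename such as `Café.mp3` can be served while a CJK title cannot, even though both are non-ASCII.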

waweic commented 6 years ago

This seems like a Python 3-specific issue. It works fine for me with Python 2.7 and Chromium. In Python 3, it even occurs on characters like the right single quotation mark (U+2019), which appears in filenames fairly often. This could possibly be prevented by ASCIIfying the `attachment_filename` or by using a fallback filename. Is that an option?
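One way to provide a fallback filename is the RFC 6266 pattern: send a plain ASCII `filename=` for older clients alongside a UTF-8, percent-encoded `filename*=` (RFC 8187 syntax). The sketch below is an assumption about how that could look, not what beets or Flask actually do; `content_disposition` is a hypothetical helper and the `_` replacement character is an arbitrary choice.

```python
from urllib.parse import quote


def content_disposition(filename: str, replacement: str = "_") -> str:
    """Build an attachment Content-Disposition value that is pure ASCII.

    Sketch following RFC 6266 / RFC 8187; not beets' or Flask's actual code.
    """
    # ASCII fallback for clients that ignore filename*: keep printable ASCII,
    # replace everything else, plus '"' and '\\', which would break the quoted string.
    fallback = "".join(
        ch if 0x20 <= ord(ch) < 0x7F and ch not in '"\\' else replacement
        for ch in filename
    )
    # Percent-encoded UTF-8 copy for clients that understand filename*.
    encoded = quote(filename, safe="", encoding="utf-8")
    return "attachment; filename=\"%s\"; filename*=UTF-8''%s" % (fallback, encoded)


value = content_disposition("It\u2019s.mp3")
print(value)
# attachment; filename="It_s.mp3"; filename*=UTF-8''It%E2%80%99s.mp3
value.encode("latin-1", "strict")  # no UnicodeEncodeError: every character is ASCII
```

Because both parameters are pure ASCII, the header survives the strict Latin-1 encoding in `send_header`, while clients that support `filename*` can still recover the original name.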

sampsyo commented 6 years ago

Thanks! Yeah, it seems like the right thing to do is to ASCIIfy the filename. (For clues about how to do this, see the uses of `unidecode` elsewhere in the codebase.)
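Following the suggestion to ASCIIfy the filename, here is a minimal sketch that transliterates with `unidecode` when it is installed and otherwise falls back to a lossier stdlib approach (NFKD normalization, then dropping anything still non-ASCII). It assumes the path has already been decoded to `str`; `ascii_attachment_filename` and the `"download"` default are hypothetical names, not beets API.

```python
import os
import unicodedata

try:
    # Third-party transliterator; the thread notes beets already uses it elsewhere.
    from unidecode import unidecode
except ImportError:
    unidecode = None


def _to_ascii(text: str) -> str:
    """Transliterate to ASCII, preferring unidecode when available."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback: strip accents via NFKD and drop anything still non-ASCII.
    # Unlike unidecode, this loses CJK characters entirely.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def ascii_attachment_filename(path: str, default: str = "download") -> str:
    """Return an ASCII-only basename suitable for a Latin-1 header.

    `default` is a hypothetical placeholder used when nothing survives.
    """
    stem, ext = os.path.splitext(os.path.basename(path))
    stem = _to_ascii(stem).strip() or default
    return stem + _to_ascii(ext)


print(ascii_attachment_filename("Caf\u00e9.mp3"))
# Cafe.mp3
print(ascii_attachment_filename("/music/胡德夫/匆匆/01 太平洋的風.mp3"))
```

Transliteration keeps CJK titles recognizable (`unidecode` maps them to romanized syllables), whereas the stdlib fallback would reduce a name like `太平洋的風.mp3` to just its extension, which is why the sketch substitutes a default stem.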