beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

convert plugin crashed when filename has some unicode character (♥) UnicodeEncodeError #3526

Open mudssky opened 4 years ago

mudssky commented 4 years ago

Problem

This is the filename, as printed by beet ls title:マジヤバもーそうLOVE -f '$path':

D:\MusicLibrary\AnimeSongs\マジヤバもーそうLOVE♥ (オリジナル・バーシオン) - 合田彩, 柏山奈々美, 積田かよ子, 村井理沙子, 福原由莉奈, 小松真奈, 村上まどか & 月宮みど.flac

When I use a query to select this file for conversion, it causes the error: UnicodeEncodeError: 'gbk' codec can't encode character '\u2665' in position 39: illegal multibyte sequence. My system language is Chinese, and it seems that the convert plugin encodes the command string as gbk.

Entering the Unicode string in IPython, I found that '\u2665' is ♥, but this character cannot be represented in gbk.
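The encoding failure can be reproduced outside beets. The following is a minimal sketch, not beets code; the helper name can_encode is made up for illustration. It checks whether a string can be encoded with a given codec.

```python
def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


heart = "\u2665"  # the heart character from the filename

gbk_ok = can_encode(heart, "gbk")      # Python's gbk codec has no mapping for U+2665
utf8_ok = can_encode(heart, "utf-8")   # UTF-8 covers every non-surrogate code point
print("gbk:", gbk_ok, "utf-8:", utf8_ok)
```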

I don't think encoding to gbk is necessary on my system, so I edited the Python file to return utf-8 directly, and then it worked properly. The change is in C:\users\caichengtao05\appdata\local\programs\python\python37-32\lib\site-packages\beets\util\__init__.py, line 321:

def arg_encoding():
    """Get the encoding for command-line arguments (and other OS
    locale-sensitive strings).
    """
    try:
        # Changed: return utf-8 directly instead of the locale's encoding.
        # return locale.getdefaultlocale()[1] or 'utf-8'
        return 'utf-8'
    except ValueError:
        # Invalid locale environment variable setting. To avoid
        # failing entirely for no good reason, assume UTF-8.
        return 'utf-8'

The conversion then succeeds:

C:\Users\caichengtao05>beet convert -d E:\MusicLibrary    title:マジヤバもーそうLOVE
合田彩, 柏山奈々美, 積田かよ子, 村井理沙子, 福原由莉奈, 小松真奈, 村上まどか & 月宮みどり - マジヤバもーそうLOVE - マジヤバもーそうLOVE♥ (オリジナル・バーシオン)
Convert? (Y/n) y
convert: Encoding D:\MusicLibrary\AnimeSongs\マジヤバもーそうLOVE♥ (オリジナル・バーシオン) - 合田彩, 柏山奈々美, 積田
かよ子, 村井理沙子, 福原由莉奈, 小松真奈, 村上まどか & 月宮みど.flac
convert: Finished encoding D:\MusicLibrary\AnimeSongs\マジヤバもーそうLOVE♥ (オリジナル・バーシオン) - 合田彩, 柏山奈々美, 積田かよ子, 村井理沙子, 福原由莉奈, 小松真奈, 村上まどか & 月宮みど.flac

Running this command in verbose (-vv) mode:

$ beet -vv  convert -d E:\MusicLibrary  -p  title:マジヤバもーそうLOVE

Led to this problem:

user configuration: C:\Users\caichengtao05\AppData\Roaming\beets\config.yaml
data directory: C:\Users\caichengtao05\AppData\Roaming\beets
plugin paths:
Sending event: pluginload
lyrics: Disabling google source: no API key configured.
inline: adding item field has_date
library database: D:\MusicLibrary\musiclibrary.db
library directory: D:\MusicLibrary
Sending event: library_opened
Traceback (most recent call last):
  File "C:\Users\caichengtao05\AppData\Local\Programs\Python\Python37-32\Scripts\beet-script.py", line 11, in <module>
    load_entry_point('beets==1.4.9', 'console_scripts', 'beet')()
  File "c:\users\caichengtao05\appdata\local\programs\python\python37-32\lib\site-packages\beets\ui\__init__.py", line 1266, in main
    _raw_main(args)
  File "c:\users\caichengtao05\appdata\local\programs\python\python37-32\lib\site-packages\beets\ui\__init__.py", line 1253, in _raw_main
    subcommand.func(lib, suboptions, subargs)
  File "c:\users\caichengtao05\appdata\local\programs\python\python37-32\lib\site-packages\beetsplug\convert.py", line 458, in convert_func
    pipe.run_parallel()
  File "c:\users\caichengtao05\appdata\local\programs\python\python37-32\lib\site-packages\beets\util\pipeline.py", line 445, in run_parallel
    six.reraise(exc_info[0], exc_info[1], exc_info[2])
  File "C:\Users\caichengtao05\AppData\Roaming\Python\Python37\site-packages\six.py", line 693, in reraise
    raise value
  File "c:\users\caichengtao05\appdata\local\programs\python\python37-32\lib\site-packages\beets\util\pipeline.py", line 358, in run
    self.coro.send(msg)
  File "c:\users\caichengtao05\appdata\local\programs\python\python37-32\lib\site-packages\beetsplug\convert.py", line 303, in convert_item
    self.encode(command, original, converted, pretend)
  File "c:\users\caichengtao05\appdata\local\programs\python\python37-32\lib\site-packages\beetsplug\convert.py", line 223, in encode
    encode_cmd.append(args[i].encode(util.arg_encoding()))
UnicodeEncodeError: 'gbk' codec can't encode character '\u2665' in position 39: illegal multibyte sequence

Here's a link to the music files that trigger the bug (if relevant):

Setup

lyrics:
    bing_lang_from: []
    auto: no
    bing_client_secret: REDACTED
    bing_lang_to:
    google_API_key: REDACTED
    google_engine_ID: REDACTED
    genius_api_key: REDACTED
    fallback:
    force: no
    local: no
    sources:
    - google
    - lyricwiki
    - musixmatch
    - genius
library: D:/MusicLibrary/musiclibrary.db
directory: D:/MusicLibrary

import:
    write: yes
    copy: yes
    move: no
    link: no
    hardlink: no
    delete: no
    resume: ask
    incremental: no
    incremental_skip_later: no
    from_scratch: no
    quiet_fallback: skip
    none_rec_action: ask
    timid: no
    log: musiclibrary.log
    autotag: no
    quiet: no
    singletons: yes
    default_action: apply
    languages: []
    detail: yes
    flat: no
    group_albums: no
    pretend: no
    search_ids: []
    duplicate_action: ask
    bell: no
    set_fields: {}

clutter: [Thumbs.DB, .DS_Store]
ignore:
- .*
- '*~'
- System Volume Information
- lost+found
ignore_hidden: yes

replace:
    '[\\/]': _
    ^\.: _
    '[\x00-\x1f]': _
    '[<>:"\?\*\|]': _
    \.$: _
    \s+$: ''
    ^\s+: ''
    ^-: _
path_sep_replace: _
asciify_paths: no
art_filename: cover
max_filename_length: 0

aunique:
    keys: albumartist album
    disambiguators: albumtype year label catalognum albumdisambig releasegroupdisambig
    bracket: '[]'

overwrite_null:
    album: []
    track: []

plugins:
- chroma
- convert
- fetchart
- fromfilename
- inline
- lastgenre
- lyrics
- web
pluginpath: []
threaded: yes
timeout: 5.0
per_disc_numbering: no
verbose: 0
terminal_encoding:
original_date: no
artist_credit: no
id3v23: no
va_name: Various Artists
chroma:
    auto: yes
acoustid:
    apikey: REDACTED
convert:
    dest: none
    format: aac
    formats:
        aac:
            command: qaac64  --rate keep -v320 -q2 --copy-artwork -o $dest  $source
            extension: m4a
        wav: ffmpeg -i $source -y -acodec pcm_s16le $dest
        alac:
            command: ffmpeg -i $source -y -vn -acodec alac $dest
            extension: m4a
        flac: ffmpeg -i $source -y -vn -acodec flac $dest
        mp3: ffmpeg -i $source -y -vn -aq 2 $dest
        opus: ffmpeg -i $source -y -vn -acodec libopus -ab 96k $dest
        ogg: ffmpeg -i $source -y -vn -acodec libvorbis -aq 3 $dest
        wma: ffmpeg -i $source -y -vn -acodec wmav2 -vn $dest
    never_convert_lossy_files: yes
    auto: no
    tmpdir: none
    copy_album_art: no
    embed: yes
    id3v23: inherit
    quiet: no
    pretend: no
    threads: 4
    max_bitrate: 500

    paths: {}
    no_convert: ''
    album_art_maxwidth: 0

ui:
    terminal_width: 80
    length_diff_thresh: 10.0
    color: yes
    colors:
        text_success: green
        text_warning: yellow
        text_error: red
        text_highlight: red
        text_highlight_minor: lightgray
        action_default: turquoise
        action: blue

format_item: $artist - $album - $title
format_album: $albumartist - $album
time_format: '%Y-%m-%d %H:%M:%S'
format_raw_length: no

sort_album: albumartist+ album+
sort_item: artist+ album+ disc+ track+
sort_case_insensitive: yes
item_fields:
    has_date: 1 if len(str(year))==6 else 0

paths:
    playlist:AnimeSongs: AnimeSongs/$title%if{$artist, - $artist,}
    playlist:GalgameSongs: GalgameSongs/$title%if{$artist, - $artist,}
    playlist:AnimeBgm: AnimeBgm/$title
    playlist:GalgameBgm: GalgameBgm/$title
    playlist:VocaloidBgm: VocaloidBgm/$title
    playlist:ChineseSongs: ChineseSongs/$title%if{$artist, - $artist,}
    playlist:EnglishSongs: EnglishSongs/$title%if{$artist, - $artist,}
    playlist:JapaneseSongs: JapaneseSongs/$title%if{$artist, - $artist,}
    playlist:KoreanSongs: KoreanSongs/$title%if{$artist, - $artist,}
    playlist:VocaloidCN: VocaloidCN/$title%if{$artist, - $artist,}
    playlist:VocaloidJP: VocaloidJP/$title%if{$artist, - $artist,}
    playlist:AbsoluteMusic: AbsoluteMusic/$title%if{$artist, - $artist,}
    playlist:Karaoke: Karaoke/$title - $length
    playlist:Vtuber: Vtuber/$vtubername/$album/$title%if{$has_date,[$year],}
    playlist:singer: Singer/$singername/$album/$title
    playlist:LoveLive: LoveLive/$llgroupname/$album/$title
    default: Album/$album%aunique{}/$track $title
    singleton: singleton/$title
    comp: Compilations/$album%aunique{}/$track $title

statefile: state.pickle

musicbrainz:
    host: musicbrainz.org
    ratelimit: 1
    ratelimit_interval: 1.0
    searchlimit: 5

match:
    strong_rec_thresh: 0.04
    medium_rec_thresh: 0.25
    rec_gap_thresh: 0.25
    max_rec:
        missing_tracks: medium
        unmatched_tracks: medium
    distance_weights:
        source: 2.0
        artist: 3.0
        album: 3.0
        media: 1.0
        mediums: 1.0
        year: 1.0
        country: 0.5
        label: 0.5
        catalognum: 0.5
        albumdisambig: 0.5
        album_id: 5.0
        tracks: 2.0
        missing_tracks: 0.9
        unmatched_tracks: 0.6
        track_title: 3.0
        track_artist: 2.0
        track_index: 1.0
        track_length: 2.0
        track_id: 5.0
    preferred:
        countries: []
        media: []
        original_year: no
    ignored: []
    required: []
    ignored_media: []
    ignore_data_tracks: yes
    ignore_video_tracks: yes
    track_length_grace: 10
    track_length_max: 30
fetchart:
    auto: yes
    minwidth: 0
    maxwidth: 0
    enforce_ratio: no
    cautious: no
    cover_names:
    - cover
    - front
    - art
    - album
    - folder
    sources:
    - filesystem
    - coverart
    - itunes
    - amazon
    - albumart
    google_key: REDACTED
    google_engine: 001442825323518660753:hrh5ch1gjzm
    fanarttv_key: REDACTED
    store_source: no
lastgenre:
    whitelist: yes
    min_weight: 10
    count: 1
    fallback:
    canonical: no
    source: album
    force: yes
    auto: yes
    separator: ', '
    prefer_specific: no
web:
    host: 127.0.0.1
    port: 8337
    cors: ''
    cors_supports_credentials: no
    reverse_proxy: no
    include_paths: no
pathfields: {}
album_fields: {}
mudssky commented 4 years ago

Maybe we need a configuration option to force UTF-8 encoding.
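A rough sketch of what such an option could look like is below. It is only an illustration: the override parameter is hypothetical and, as far as this thread shows, is not an existing beets setting. It also uses locale.getpreferredencoding in place of the locale.getdefaultlocale call in the original function.

```python
import locale


def arg_encoding(override=None):
    """Pick the encoding for command-line arguments.

    `override` stands in for a hypothetical user setting; it is an
    assumption made for this sketch, not part of beets.
    """
    if override:
        return override
    # Fall back to the locale's preferred encoding, then to UTF-8.
    return locale.getpreferredencoding(False) or "utf-8"


print(arg_encoding("utf-8"))
```

With no override the function behaves like a locale lookup, so existing setups would keep their current behavior.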

sampsyo commented 4 years ago

Hi! There is little that beets can do about this itself… your system configuration has specified that you're using gbk for argument encodings, but you are trying to use a filename that does not fit that encoding as an argument. Perhaps you want to consider changing your terminal's locale settings to a UTF-8 locale?

For example, export LC_ALL=<lang>_<country>.UTF-8.

Even so, we should probably avoid a crash here, although I'm not entirely sure how. Maybe by just catching the error and printing a more useful message?
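One way to catch the error and report it more usefully is sketched below. This is an assumption about how it might be done, not the actual beets code: ArgumentEncodingError and encode_args are hypothetical names introduced only for this example.

```python
class ArgumentEncodingError(Exception):
    """Hypothetical error type for this sketch; not part of beets."""


def encode_args(args, encoding):
    """Encode each command argument, raising a readable error on failure."""
    encoded = []
    for arg in args:
        try:
            encoded.append(arg.encode(encoding))
        except UnicodeEncodeError as exc:
            # exc.start and exc.end delimit the characters that failed.
            bad = arg[exc.start:exc.end]
            raise ArgumentEncodingError(
                f"cannot encode argument {arg!r} as {encoding}: "
                f"{bad!r} at position {exc.start} is not representable; "
                "consider switching to a UTF-8 locale"
            ) from exc
    return encoded


try:
    encode_args(["ffmpeg", "-i", "LOVE\u2665.flac"], "gbk")
except ArgumentEncodingError as err:
    print(err)
```

The original exception is chained with "from exc", so the full traceback stays available in verbose mode.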

ghost commented 4 years ago

I received a similar error.

Problem

Running beet import on a directory containing a file with a special/unrecognized character causes the import to fail when the convert plugin is enabled.

The error does not occur if the file is renamed to something else (like "abc.flac") before importing.

Sending event: write
zero: images:  -> None
Sending event: after_write
Sending event: database_change
convert: Encoding /root/temp/112 - Soleil de Nuit (mit Pierre Maubouch).flac
Traceback (most recent call last):
  File "/usr/local/bin/beet", line 11, in <module>
    load_entry_point('beets==1.4.9', 'console_scripts', 'beet')()
  File "/usr/local/lib/python3.7/site-packages/beets/ui/__init__.py", line 1266, in main
    _raw_main(args)
  File "/usr/local/lib/python3.7/site-packages/beets/ui/__init__.py", line 1253, in _raw_main
    subcommand.func(lib, suboptions, subargs)
  File "/usr/local/lib/python3.7/site-packages/beets/ui/commands.py", line 955, in import_func
    import_files(lib, paths, query)
  File "/usr/local/lib/python3.7/site-packages/beets/ui/commands.py", line 925, in import_files
    session.run()
  File "/usr/local/lib/python3.7/site-packages/beets/importer.py", line 329, in run
    pl.run_parallel(QUEUE_SIZE)
  File "/usr/local/lib/python3.7/site-packages/beets/util/pipeline.py", line 445, in run_parallel
    six.reraise(exc_info[0], exc_info[1], exc_info[2])
  File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/beets/util/pipeline.py", line 312, in run
    out = self.coro.send(msg)
  File "/usr/local/lib/python3.7/site-packages/beets/util/pipeline.py", line 194, in coro
    func(*(args + (task,)))
  File "/usr/local/lib/python3.7/site-packages/beets/importer.py", line 1511, in plugin_stage
    func(session, task)
  File "/usr/local/lib/python3.7/site-packages/beets/plugins.py", line 143, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/beetsplug/convert.py", line 177, in auto_convert
    self.convert_on_import(config.lib, item)
  File "/usr/local/lib/python3.7/site-packages/beetsplug/convert.py", line 479, in convert_on_import
    self.encode(command, item.path, dest)
  File "/usr/local/lib/python3.7/site-packages/beetsplug/convert.py", line 223, in encode
    encode_cmd.append(args[i].encode(util.arg_encoding()))
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce9' in position 145: surrogates not allowed

An archive with the file in question inside is here

Setup

Output of locale command is:

root@music:~ # locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=en_US.UTF-8

My configuration (output of beet config) is:

directory: /root/music/managed
art_filename: folder

ui:
    color: yes

import:
    move: yes
    timid: yes
    languages: en
    bell: yes

match:
    strong_rec_thresh: 0.02

plugins: chroma convert discogs embedart ftintitle info replaygain scrub zero
chroma:
    auto: yes
convert:
    auto: yes
    album_art_maxwidth: 800
    dest: /root/music/conversions_backup
    embed: no
    never_convert_lossy_files: yes
    format: flac
    formats:
        flac:
            command: ffmpeg -i $source -y -vn -acodec flac -compression_level 12 $dest
            extension: flac
        aac:
            command: ffmpeg -i $source -y -vn -acodec aac -aq 1 $dest
            extension: m4a
        alac:
            command: ffmpeg -i $source -y -vn -acodec alac $dest
            extension: m4a
        mp3: ffmpeg -i $source -y -vn -aq 2 $dest
        opus: ffmpeg -i $source -y -vn -acodec libopus -ab 96k $dest
        ogg: ffmpeg -i $source -y -vn -acodec libvorbis -aq 3 $dest
        wma: ffmpeg -i $source -y -vn -acodec wmav2 -vn $dest
    pretend: no
    threads: 4
    id3v23: inherit
    max_bitrate: 500
    tmpdir:
    quiet: no

    paths: {}
    no_convert: ''
    copy_album_art: no
embedart:
    auto: no
    maxwidth: 0
    compare_threshold: 0
    ifempty: no
    remove_art_file: no
ftintitle:
    auto: yes
    format: (feat. {0})
    drop: no
replaygain:
    auto: yes
    backend: gstreamer
    overwrite: yes
    noclip: yes
    targetlevel: 89
    r128: [Opus]
scrub:
    auto: yes
zero:
    auto: yes
    fields: images
    keep_fields: []
    update_database: no
discogs:
    apikey: REDACTED
    apisecret: REDACTED
    tokenfile: discogs_token.json
    source_weight: 0.5
    user_token: REDACTED
jackwilsdon commented 4 years ago

U+DCE9 is a lone surrogate rather than a character that can be encoded as UTF-8, so I don't think there's much we can really do here; the filename needs amending manually to remove this character.
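For context, a lone surrogate like this typically appears when a filename contains a byte that is not valid UTF-8. The sketch below uses a made-up filename (caf\xe9.flac, not the file from the report) to show how byte 0xE9 becomes U+DCE9 and why strict UTF-8 encoding then fails.

```python
# Hypothetical filename stored as Latin-1 bytes: 0xE9 is "é" in Latin-1
# but is not valid UTF-8 on its own.
raw = b"caf\xe9.flac"

# On POSIX systems Python decodes undecodable filename bytes with the
# "surrogateescape" handler, which maps byte 0xE9 to the lone surrogate U+DCE9.
name = raw.decode("utf-8", errors="surrogateescape")

try:
    name.encode("utf-8")  # strict encoding, as in the traceback
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Encoding with the same handler restores the original bytes.
restored = name.encode("utf-8", errors="surrogateescape")
print(repr(name), strict_ok, restored == raw)
```

Encoding with the surrogateescape handler round-trips the original bytes, which is what os.fsencode does on POSIX systems with a UTF-8 filesystem encoding.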

Kidsnd274 commented 7 months ago

Hi, is there any solution to this? I am having the same problem when importing a file with Unicode characters.

O:\Test\りりあ。\import test>beet import "01 - 貴方の側に。.flac"
error: no such file or directory: 01 - ??????.flac
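The question marks are consistent with the filename having passed through a legacy code page before beets looked it up. That is an assumption about the cause, not something confirmed in this thread; cp1252 is used in the sketch below only because it is the codec named in the traceback that follows.

```python
title = "01 - 貴方の側に。.flac"

# errors="replace" substitutes "?" for every character the codec cannot represent.
lossy = title.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)
```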

And when running this command to import all files in the directory, I get this error:

O:\Test\りりあ。\import test>beet import .

O:\Test\りりあ。\import test (1 items)
Tagging:
    りりあ。 - 貴方の側に。
URL:
    https://musicbrainz.org/release/2c62c778-211d-478f-b674-efd5701939b1
(Similarity: 100.0%) (Digital Media, 2023, JP, TOY’S FACTORY, TFDS-00923)
convert: Encoding O:\Lidarr Test\りりあ。\import test\01 - 貴方の側に。.flac
Traceback (most recent call last):
  File "C:\Users\User\AppData\Roaming\Python\Python38\Scripts\beet-script.py", line 11, in <module>
    load_entry_point('beets==1.6.0', 'console_scripts', 'beet')()
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beets\ui\__init__.py", line 1285, in main
    _raw_main(args)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beets\ui\__init__.py", line 1272, in _raw_main
    subcommand.func(lib, suboptions, subargs)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beets\ui\commands.py", line 973, in import_func
    import_files(lib, paths, query)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beets\ui\commands.py", line 943, in import_files
    session.run()
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beets\importer.py", line 340, in run
    pl.run_parallel(QUEUE_SIZE)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beets\util\pipeline.py", line 446, in run_parallel
    raise exc_info[1].with_traceback(exc_info[2])
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beets\util\pipeline.py", line 311, in run
    out = self.coro.send(msg)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beets\util\pipeline.py", line 193, in coro
    func(*(args + (task,)))
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beets\importer.py", line 1535, in plugin_stage
    func(session, task)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beets\plugins.py", line 145, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beetsplug\convert.py", line 183, in auto_convert
    par_map(lambda item: self.convert_on_import(config.lib, item),
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beets\util\__init__.py", line 1061, in par_map
    pool.map(transform, items)
  File "c:\program files\python38\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\program files\python38\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
  File "c:\program files\python38\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\program files\python38\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beetsplug\convert.py", line 183, in <lambda>
    par_map(lambda item: self.convert_on_import(config.lib, item),
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beetsplug\convert.py", line 513, in convert_on_import
    self.encode(command, item.path, dest)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\beetsplug\convert.py", line 216, in encode
    encode_cmd.append(args[i].encode(util.arg_encoding()))  #            encode_cmd.append(args[i].encode(util.arg_encoding()))
  File "c:\program files\python38\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 15-18: character maps to <undefined>