beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

convert: Avoid an encoding error on Windows when in a non-Unicode locale #3128

Open RollingStar opened 5 years ago

RollingStar commented 5 years ago

Problem

arg_encoding() returns a non-Unicode encoding on my system. The paths contain Japanese characters, which that encoding cannot represent, so encoding the command arguments fails.

https://github.com/beetbox/beets/blob/master/beetsplug/convert.py#L222

        for i, arg in enumerate(args):
            args[i] = Template(arg).safe_substitute({
                'source': source,
                'dest': dest,
            })
            if six.PY2:
                encode_cmd.append(args[i])
            else:
                encode_cmd.append(args[i].encode(util.arg_encoding()))

https://github.com/beetbox/beets/blob/master/beets/util/__init__.py#L319

When I hardcode the try: block in arg_encoding() to return "utf-8", the problem is fixed.

2019-10-28 edit: Hardcode arg_encoding to always return 'utf-8'.

https://github.com/beetbox/beets/blob/1b187fbf5345727e0dfdaea958a714f19e917a4e/beets/util/__init__.py#L328

I haven't set my default code page to UTF-8. Changing it has side effects on Windows, so I don't know whether users should be expected to do that.

My terminal can print special characters just fine, and beets has been able to write paths with special characters as well. This leads me to conclude that calling arg_encoding() isn't the right move here.

Running this command in verbose (-vv) mode:

λ  beet -vv convert --format mp3 -y albumartist:2814
[snip]
convert: Encoding e:\Music\2814 - 2015 - 新しい日の誕生 (2016)\01. 恢复.flac                                           
convert: Encoding e:\Music\2814 - 2015 - 新しい日の誕生 (2016)\02. 遠くの愛好家.flac                                       
the: "2814" -> "2814"                                                                                         
convert: Encoding e:\Music\2814 - 2015 - 新しい日の誕生 (2016)\03. 新宿ゴールデン街.flac                                     
convert: Encoding e:\Music\2814 - 2015 - 新しい日の誕生 (2016)\04. ふわっと.flac                                         
Traceback (most recent call last):                                                                            
  File "\AppData\Local\Programs\Python\Python37\Scripts\beet-script.py", line 11, in <module>    
    load_entry_point('beets==1.4.8', 'console_scripts', 'beet')()                                             
  File "c:\apps\cmder_mini\src\beets\beets\ui\__init__.py", line 1262, in main                                
    _raw_main(args)                                                                                           
  File "c:\apps\cmder_mini\src\beets\beets\ui\__init__.py", line 1249, in _raw_main                           
    subcommand.func(lib, suboptions, subargs)                                                                 
  File "c:\apps\cmder_mini\src\beets\beetsplug\convert.py", line 453, in convert_func                         
    pipe.run_parallel()                                                                                       
  File "c:\apps\cmder_mini\src\beets\beets\util\pipeline.py", line 445, in run_parallel                       
    six.reraise(exc_info[0], exc_info[1], exc_info[2])                                                        
  File "\AppData\Local\Programs\Python\Python37\lib\site-packages\six.py", line 693, in reraise  
    raise value                                                                                               
  File "c:\apps\cmder_mini\src\beets\beets\util\pipeline.py", line 358, in run                                
    self.coro.send(msg)                                                                                       
  File "c:\apps\cmder_mini\src\beets\beetsplug\convert.py", line 302, in convert_item                         
    self.encode(command, original, converted, pretend)                                                        
  File "c:\apps\cmder_mini\src\beets\beetsplug\convert.py", line 222, in encode                               
    encode_cmd.append(args[i].encode(util.arg_encoding()))                                                    
  File "\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 12, in encode     
    return codecs.charmap_encode(input,errors,encoding_table)                                                 
UnicodeEncodeError: 'charmap' codec can't encode characters in position 23-29: character maps to <undefined>  
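
The failure in the traceback can be reproduced outside beets. The following is a minimal sketch, not beets code: `try_encode` is a hypothetical helper, and the path is copied from the log above. It only shows that cp1252 cannot represent the Japanese characters while UTF-8 can.

```python
# Path taken from the verbose log above; a raw string keeps the backslashes.
path = r"e:\Music\2814 - 2015 - 新しい日の誕生 (2016)\04. ふわっと.flac"


def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


result_cp1252 = try_encode(path, "cp1252")  # the encoding named in the traceback
result_utf8 = try_encode(path, "utf-8")

print(type(result_cp1252).__name__)
print(type(result_utf8).__name__)
```

The first unencodable run starts at index 23, which matches the "position 23-29" reported in the traceback.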

Setup

sampsyo commented 5 years ago

Hmm... I’m not sure what to do here. We do need to use arg_encoding or something like it---this is the bit of the convert plugin that constructs the command to invoke, so we need to use the encoding that the system expects for command-line arguments.

On Unix, the right thing to do here would be to interpolate the raw bytes from the filename string. On Windows, filenames are proper Unicode---so, as far as I know, Windows should be reporting a Unicode argument encoding. I'm not sure why your system isn't (encodings on Windows are a great mystery to me!). Do you know what encoding it's trying to use and---while this is a long shot---why it's doing that?
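
To illustrate the Unix half of this point: Python's `os.fsdecode` and `os.fsencode` are designed so that arbitrary filename bytes survive a trip through `str` on POSIX systems. This is only a sketch of that standard-library behaviour, not of what beets does; `round_trip` and the sample filename are hypothetical.

```python
import os


def round_trip(raw_name):
    """Decode filename bytes to str and encode them back again."""
    as_text = os.fsdecode(raw_name)
    return os.fsencode(as_text)


# Hypothetical filename whose bytes are Latin-1, not valid UTF-8.
raw_name = b"caf\xe9.flac"

if os.name == "posix":
    # On POSIX the undecodable byte is carried through via surrogateescape.
    print(ascii(os.fsdecode(raw_name)))
    print(round_trip(raw_name) == raw_name)
```

On Windows the same bytes are not guaranteed to decode, which is one reason the two platforms need different handling.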

jackwilsdon commented 5 years ago

@sampsyo looks to me like util.arg_encoding() is returning cp1252 (see third line from the bottom in the stack trace).

I can confirm that on my Windows 10 machine, arg_encoding() returns cp1252 (the second item in the array returned by locale.getdefaultlocale() as of time of writing), which I assume isn't what we want/expected.


It seems that sys.getfilesystemencoding() returns utf-8 for me, but I feel like that's different to argument encoding.
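
For anyone who wants to compare these values on their own machine, here is a small sketch using only the standard library. `report_encodings` is a hypothetical helper; `locale.getdefaultlocale()` is deprecated in newer Python versions, so the sketch guards the call and suppresses the deprecation warning.

```python
import locale
import sys
import warnings


def report_encodings():
    """Collect the encodings Python reports for this system."""
    report = {
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
    }
    # getdefaultlocale() is deprecated, so only call it if it still exists.
    getdefaultlocale = getattr(locale, "getdefaultlocale", None)
    if getdefaultlocale is not None:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            try:
                report["default_locale"] = getdefaultlocale()[1]
            except ValueError:
                # Raised when the locale settings cannot be parsed.
                report["default_locale"] = None
    return report


for name, value in report_encodings().items():
    print(f"{name}: {value}")
```

The values differ between systems; a mismatch between the filesystem and locale entries is the situation described above.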

sampsyo commented 5 years ago

Got it; thanks! That's not really what I expected, but I suppose what we need to do is to "pass through" filenames without trying to use the argument encoding for them at all. That will be tricky to do while still using template formatting… but maybe we can figure out a strategy?
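
One possible shape for such a strategy, as a rough sketch only: do the template substitution entirely in text, then convert each finished argument in a single platform-specific step instead of calling arg_encoding(). Everything here is an assumption for illustration. `build_command` is a hypothetical function, the ffmpeg-style command is a made-up example, and the sketch assumes paths are available as `str`.

```python
import os
import sys
from string import Template


def build_command(args, source, dest):
    """Substitute $source and $dest, then convert per platform.

    On Windows the arguments stay as str, since subprocess passes str
    arguments to the wide-character process API. Elsewhere they are
    converted with os.fsencode, which uses the filesystem encoding.
    """
    cmd = []
    for arg in args:
        text = Template(arg).safe_substitute(source=source, dest=dest)
        if sys.platform == "win32":
            cmd.append(text)
        else:
            cmd.append(os.fsencode(text))
    return cmd


# Hypothetical command template, already split into arguments.
cmd = build_command(["ffmpeg", "-i", "$source", "-y", "$dest"],
                    "song.flac", "song.mp3")
print(len(cmd))
```

The resulting list could then be handed to `subprocess` as-is. Whether this fits the plugin's existing handling of paths is an open question.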

RollingStar commented 5 years ago

> I can confirm that on my Windows 10 machine, arg_encoding() returns cp1252 (the second item in the array returned by locale.getdefaultlocale() as of time of writing)

Same here.

stale[bot] commented 4 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

RollingStar commented 4 years ago

Yes

stale[bot] commented 3 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

RollingStar commented 3 years ago

> Got it; thanks! That's not really what I expected, but I suppose what we need to do is to "pass through" filenames without trying to use the argument encoding for them at all. That will be tricky to do while still using template formatting… but maybe we can figure out a strategy?

sampsyo commented 3 years ago

I'll mark this as a bug.