beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License
12.78k stars 1.82k forks source link

convert: Character encoding/path issue on Windows #1166

Open mluds opened 9 years ago

mluds commented 9 years ago

The convert plugin seems to have issues with certain characters. Here is the cmd.exe output when importing:

C:\Users\Michael\Desktop\blink-182\blink-182>beet import .

C:\Users\Michael\Desktop\blinkΓÇÉ182\blinkΓÇÉ182 (14 items)
Tagging:
    blinkΓÇÉ182 - blinkΓÇÉ182
URL:
    http://musicbrainz.org/release/f0359d59-315e-45e6-bf7b-170754bac198
(Similarity: 100.0%) (CD, 2003, US, Geffen Records)

Everything works fine here but notice how the - becomes ΓÇÉ. Then when I try to convert one of the tracks:

C:\Users\Michael>beet -v convert -d . stockholm
user configuration: C:\Users\Michael\AppData\Roaming\beets\config.yaml
data directory: C:\Users\Michael\AppData\Roaming\beets
Sending event: pluginload
library database: C:\Users\Michael\Music\Library.blb
library directory: C:\Users\Michael\Music\Files
Sending event: library_opened
blinkΓÇÉ182 - blinkΓÇÉ182 - Stockholm Syndrome
Convert? (Y/n) y
Encoding C:\Users\Michael\Music\Files\blinkΓÇÉ182\blinkΓÇÉ182\05 Stockholm Syndrome.flac
Encoding C:\Users\Michael\Music\Files\blinkΓÇÉ182\blinkΓÇÉ182\05 Stockholm Syndrome.flac failed. Cleaning up...
Command ffmpeg -i C:\Users\Michael\Music\Files\blinkΓÇÉ182\blinkΓÇÉ182\05 Stockholm Syndrome.flac -y -vn -aq 2 C:\Users\Michael\blinkΓÇÉ182\blinkΓÇÉ182\05 Stock
holm Syndrome.mp3 exited with status 1
ffmpeg version 2.2.3 Copyright (c) 2000-2014 the FFmpeg developers
  built on Jun 19 2014 20:36:23 with gcc 4.8.3 (GCC)
  configuration: --disable-static --enable-shared --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --ena
ble-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libi
lbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enabl
e-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-aacenc --enable-libvo-amrwbenc --
enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-decklink --en
able-zlib
  libavutil      52. 66.100 / 52. 66.100
  libavcodec     55. 52.102 / 55. 52.102
  libavformat    55. 33.100 / 55. 33.100
  libavdevice    55. 10.100 / 55. 10.100
  libavfilter     4.  2.100 /  4.  2.100
  libswscale      2.  5.102 /  2.  5.102
  libswresample   0. 18.100 /  0. 18.100
  libpostproc    52.  3.100 / 52.  3.100
C:\Users\Michael\Music\Files\blink‐182\blink‐182\05 Stockholm Syndrome.flac: No such file or directory
Conversion failed!

Sending event: cli_exit

Notice how the - is ‐ in the ffmpeg command but ‐ in the output. This makes me think it has something to do with character encoding and subprocess.popen.

P.S. I only modified the source to display the ffmpeg output. Here's the git diff:

diff --git a/beets/util/__init__.py b/beets/util/__init__.py
index 38cecd7..0f8285c 100644
--- a/beets/util/__init__.py
+++ b/beets/util/__init__.py
@@ -665,6 +665,7 @@ def command_output(cmd, shell=False):
         raise subprocess.CalledProcessError(
             returncode=proc.returncode,
             cmd=' '.join(cmd),
+            output=stderr
         )
     return stdout

diff --git a/beetsplug/convert.py b/beetsplug/convert.py
index 0b87fb7..73a4664 100644
--- a/beetsplug/convert.py
+++ b/beetsplug/convert.py
@@ -116,6 +116,7 @@ def encode(command, source, dest, pretend=False):
             exc.cmd.decode('utf8', 'ignore'),
             exc.returncode,
         ))
+        log.debug(u'{0}'.format(exc.output.decode('utf-8')))
         util.remove(dest)
         util.prune_dirs(os.path.dirname(dest))
         raise
sampsyo commented 9 years ago

Thanks for reporting. There is indeed an issue with Unicode paths and command invocation on Windows. It appears to be impossible to pass Unicode-string arguments to command invocations in Python 2.x: http://bugs.python.org/issue1759845

I don't know a reliable way around this (specifically, I have not found a way to encode to bytestrings so that Windows recognizes the path). Hence this sad line in acoustid: https://github.com/sampsyo/pyacoustid/blob/master/acoustid.py#L285

Additional insight would of course be welcome!

It also looks like there might be some manner of terminal encoding issue in your shell (e.g., that first "blinkΓÇÉ182" in the output). Any Windows expertise for addressing that would be helpful.

mluds commented 9 years ago

Yep, it looks like since Python 2.x doesn't provide an interface to the CreateProcessW function, there's no way to pass Unicode characters to a new Windows process without a C extension.

As for the ΓÇÉ, it looks like it comes from the UTF-8 Hyphen bytes E2 80 90 when decoded as extended ASCII or cp437.

sampsyo commented 9 years ago

Thanks for looking into it. I think we're out of luck for now, sadly, although I'll leave this ticket open in case anyone finds something we're missing or for when we move to Python 3.

On the console output: it sounds like beets is outputting UTF-8 but cmd.exe (or the Windows terminal application? Not sure how this works) is interpreting it as ASCII. I also don't know how to fix this, since some people seem to have success with UTF-8 on Windows. Maybe they're using PowerShell?

vaab commented 7 years ago

Here's a workaround for python 2.7 on Windows to use CreateProcessW. Comments welcome.

http://vaab.blog.kal.fr/2017/03/16/fixing-windows-python-2-7-unicode-issue-with-subprocesss-popen/