DocMarty84 / koozic

Self-hosted media streaming server
https://koozic.net

Scanning folders fails with non-ASCII filenames #22

Closed DavidVentura closed 5 years ago

DavidVentura commented 5 years ago

When scanning a folder that contains non-ASCII filenames (I'm not sure which ones; I deleted some files with Japanese names, but it still breaks), the scan fails immediately:

Feb 05 21:59:06 koozic start.sh[496]: 2019-02-05 21:59:06,543 500 ERROR koozic odoo.sql_db: bad query: INSERT INTO "oomusic_track" ("id", "create_uid", "create_date", "write_uid", "write_date", "album_artist_id", "album_id", "artist_id", "bitrate", "composer", "contact", "copyright", "description", "disc", "dummy_field", "duration", "duration_min", "encoded_by", "folder_id", "genre_id", "last_modification", "name", "path", "performer_id", "root_folder_id", "size", "track_number", "track_number_int", "track_total", "user_id", "year") VALUES (nextval(%s), %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s) RETURNING id
Feb 05 21:59:06 koozic start.sh[496]: ERROR: 'utf-8' codec can't encode character '\udcd0' in position 161: surrogates not allowed

You can get all the filenames from here (keep in mind the link will expire in 7 days).

DavidVentura commented 5 years ago

It dies exactly at this file:

2019-02-06 10:09:26,409 3896 DEBUG koozic odoo.addons.oomusic.models.oomusic_folder_scan: Scanning file "/storage/Media/Music/Arcade Fire - Discography 2001-2013 (By Jamal The Moroccan)/Albums/2004 - Funeral  [Japanese Special Limited Edition]/02. Neighborhood 2 (La\udcd0\udcbfka).mp3"

which, when I run ls, shows as

02. Neighborhood 2 (Laпka).mp3

(Notice the Cyrillic character п instead of the Latin letter n.)
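The escaped name in the log is consistent with how Python handles filename bytes it cannot decode: п is the two bytes D0 BF in UTF-8, and when the filesystem encoding is ASCII, the surrogateescape error handler maps each of those bytes to a lone surrogate (\udcd0, \udcbf). A minimal sketch of that behaviour, assuming the process was running under an ASCII locale (the thread suggests this but does not confirm it):

```python
# Bytes of the name fragment as stored on disk; "\u043f" is the Cyrillic п.
raw = "La\u043fka".encode("utf-8")  # b'La\xd0\xbfka'

# With an ASCII filesystem encoding, Python decodes filenames using the
# "surrogateescape" handler: every undecodable byte becomes a lone surrogate.
decoded = raw.decode("ascii", errors="surrogateescape")
print(ascii(decoded))  # 'La\udcd0\udcbfka'

# A strict UTF-8 encode of that string (what happens on the way to the
# database) is refused, because lone surrogates are not valid in UTF-8.
try:
    decoded.encode("utf-8")
    error = None
except UnicodeEncodeError as exc:
    error = exc
print(error)
```

The second print shows a "surrogates not allowed" error of the same kind as the one in the log above.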

DavidVentura commented 5 years ago

This gets resolved by export LC_ALL="en_US.UTF-8", but there should be a way to detect it.
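One way such a check could look at startup. This is only a sketch: the helper names is_utf8 and check_filename_encoding are hypothetical and not part of Koozic.

```python
import codecs
import locale
import sys


def is_utf8(codec_name):
    """Return True if codec_name is UTF-8 or one of its aliases."""
    try:
        return codecs.lookup(codec_name).name == "utf-8"
    except LookupError:
        # Unknown codec name: certainly not UTF-8.
        return False


def check_filename_encoding():
    """Return a warning message if filenames are not decoded as UTF-8, else None."""
    fs_encoding = sys.getfilesystemencoding()
    if is_utf8(fs_encoding):
        return None
    return (
        "Filesystem encoding is %r (locale encoding: %r); non-ASCII filenames "
        "may fail to import. Consider setting LC_ALL to a UTF-8 locale."
        % (fs_encoding, locale.getpreferredencoding(False))
    )


warning = check_filename_encoding()
if warning:
    print(warning)
```

Comparing normalized codec names avoids false alarms from spelling variants such as "UTF-8" and "utf8".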

DocMarty84 commented 5 years ago

Hmm, that's most probably specific to your OS configuration and to the way PostgreSQL uses environment variables. I had to do exactly the same in the Dockerfile [1]:

https://github.com/DocMarty84/koozic/blob/4a583e36db0566725463f413abda7aecc2cf183e/extra/docker/Dockerfile#L14

I'll have a look, but on a standard Ubuntu Server (16.04 and 18.04) this definitely works with the default configuration. Based on your Ansible file, I guess you are running Debian 9?

[1] I was using Debian 9 when I wrote it, and I didn't update that part when I switched to Ubuntu 18.04.

DavidVentura commented 5 years ago

The Postgres DB is on another host, so it's unlikely to be related. I guess this script is reading filenames incorrectly and feeding them to pg.

DocMarty84 commented 5 years ago

Which OS are you using?

From https://stackoverflow.com/a/51833146

In Python3 all strings are unicode, so the problem you're having is likely due to your locale settings not being correct. The Python3 interpreter looks to use the locale environment variables and if it cannot find them it emulates basic ASCII

DavidVentura commented 5 years ago

I'm on Debian. It would be nice not to let this fall back to ASCII, though, and to enforce UTF-8 instead.
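A sketch of what enforcing UTF-8 could mean in practice, assuming the scanner is free to list directories with bytes paths. The functions decode_name and list_names_as_utf8 are hypothetical and do not describe how Koozic actually scans folders. Passing a bytes path to os.listdir makes it return raw bytes names, which can then be decoded as UTF-8 whatever the locale is:

```python
import os


def decode_name(raw_name):
    """Decode a raw filename as UTF-8, independent of the process locale."""
    # Bytes that are not valid UTF-8 become U+FFFD instead of lone
    # surrogates, so the result can always be encoded for the database.
    return raw_name.decode("utf-8", errors="replace")


def list_names_as_utf8(folder):
    """List the entries of folder with names decoded as UTF-8."""
    raw_names = os.listdir(os.fsencode(folder))  # bytes path in, bytes names out
    return [decode_name(raw) for raw in raw_names]


print(decode_name(b"02. Neighborhood 2 (La\xd0\xbfka).mp3"))
```

The trade-off is that errors="replace" is lossy: a name that really is not UTF-8 can no longer be mapped back to the file on disk, so the original bytes path would have to be kept alongside the decoded name.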