AzuraCast / AzuraCast

A self-hosted web radio management suite, including turnkey installer tools for the full radio software stack and a modern, easy-to-use web app to manage your stations.
https://www.azuracast.com/
GNU Affero General Public License v3.0

Problems with Non English characters #2010

Closed eranwo closed 4 years ago

eranwo commented 4 years ago

Using Docker installation method Yes

Host Operating System Ubuntu 18.04

Describe the bug: When a song with non-English characters is playing, its metadata is not displayed properly in the API: "text": "\u05dc\u05d4\u05e7\u05ea \u05d4\u05e0\u05d7\u05f4\u05dc - \u05d4\u05e8\u05e2\u05d5\u05ea", "artist": "\u05dc\u05d4\u05e7\u05ea \u05d4\u05e0\u05d7\u05f4\u05dc", "title": "\u05d4\u05e8\u05e2\u05d5\u05ea", "album": "\u05d4\u05dc\u05d4\u05d9\u05d8\u05d9\u05dd \u05d4\u05d2\u05d3\u05d5\u05dc\u05d9\u05dd 1963-1972"

The same happens in the song playback timeline CSV report:

26/09/2019 5:21pm 0 0 ֳ—ֲ”ֳ—ֲ¨ֳ—ֲ¢ֳ—ֲ•ֳ—ֲ× ֳ—œֳ—ֲ”ֳ—ֲ§ֳ—ֲ× ֳ—ֲ”ֳ—ֲ ֳ—ֲ—ֳ—ֲ´ֳ—œ
26/09/2019 5:24pm 0 0 ׳”׳¨׳¢׳•׳× ׳œ׳”׳§׳× ׳”׳ ׳—׳´׳œ
26/09/2019 5:28pm 0 0 Who Do You Love The Chainsmokers & 5 Seconds of Summer

But on web pages it displays correctly (including the Icecast admin page).

I've tried changing "Advanced Character Set Encoding" under "Edit Profile", but it did not fix the problem.

Vaalyn commented 4 years ago

Would you mind providing us with a file that causes this issue?

I tried to replicate the problem by taking the artist and title that you posted (\u05dc\u05d4\u05e7\u05ea \u05d4\u05e0\u05d7\u05f4\u05dc - \u05d4\u05e8\u05e2\u05d5\u05ea), converting the escaped Unicode sequences back to their original characters, and updating a song's metadata with those characters. Once that song was playing, the characters were displayed correctly everywhere I looked (even in the API).

I suspect that the encoding of the ID3 tags of your file/files is the cause of this problem but I'd like to verify that first before jumping to conclusions.

eranwo commented 4 years ago

@Vaalyn Hi, thank you for the reply. Please see the attached MP3 as an example of a file that causes the problem with non-English characters. https://we.tl/t-Yo27pOTHgg

        "text": "\u05dc\u05d4\u05e7\u05ea \u05d4\u05e0\u05d7\u05f4\u05dc - \u05d4\u05e8\u05e2\u05d5\u05ea",
        "artist": "\u05dc\u05d4\u05e7\u05ea \u05d4\u05e0\u05d7\u05f4\u05dc",
        "title": "\u05d4\u05e8\u05e2\u05d5\u05ea",
        "album": "\u05d4\u05dc\u05d4\u05d9\u05d8\u05d9\u05dd \u05d4\u05d2\u05d3\u05d5\u05dc\u05d9\u05dd 1963-1972", 
׳”׳¨׳¢׳•׳× ׳œ׳”׳§׳× ׳”׳ ׳—׳´׳œ default
eranwo commented 4 years ago

@Vaalyn Have you had a chance to look into it?

Vaalyn commented 4 years ago

@eranwo I've looked at the file's %_id3v2_character_encoding% to see what encoding was used for the tags and found that they are UTF-16. That encoding is definitely valid for ID3 tags in ID3 v2.3 and higher.

If I'm not mistaken, the library that we use to read/write ID3 tags in AzuraCast supports UTF-16 too, but when reading tags from a file it currently always uses UTF-8, as seen in this line here. I think this is the root of the issue.

Until that is fixed you can try changing the encoding to UTF-8.
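
To illustrate why a fixed UTF-8 assumption garbles UTF-16 tags: in ID3v2, each text frame starts with one byte that declares its encoding. The following is a minimal Python sketch of honoring that byte. It is not AzuraCast's or GetID3's code, and `decode_text_frame` is a hypothetical helper name used only for illustration.

```python
# Encoding byte values defined by the ID3v2 specification for text frames.
ID3_TEXT_ENCODINGS = {
    0: "latin-1",    # ISO-8859-1
    1: "utf-16",     # UTF-16 with a byte order mark (BOM)
    2: "utf-16-be",  # UTF-16 big-endian without BOM (ID3v2.4 only)
    3: "utf-8",      # UTF-8 (ID3v2.4 only)
}


def decode_text_frame(body: bytes) -> str:
    """Decode an ID3v2 text frame body: one encoding byte, then the text.

    Hypothetical helper for illustration; real tag readers also handle
    frame headers, flags, and multi-value frames.
    """
    codec = ID3_TEXT_ENCODINGS[body[0]]
    text = body[1:].decode(codec)
    # Frames may carry a trailing null terminator; drop it.
    return text.rstrip("\x00")


title = "\u05d4\u05e8\u05e2\u05d5\u05ea"  # the Hebrew title from this report

# A frame body as a tagger writing UTF-16 would store it.
utf16_frame = b"\x01" + title.encode("utf-16")

# Honoring the encoding byte recovers the title.
print(decode_text_frame(utf16_frame) == title)

# Ignoring it and assuming UTF-8 produces garbage instead.
print(utf16_frame[1:].decode("utf-8", errors="replace") == title)
```

The first comparison is true and the second is false, which matches the symptom described above: the tags themselves are valid, but they are read with the wrong codec.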

eranwo commented 4 years ago

@Vaalyn Hi, how do you check the id3v2_character_encoding metadata? With which tool? I've switched the MP3 to UTF-8 but still have the same issue. https://we.tl/t-LvhuxxjRNO

BusterNeece commented 4 years ago

@Vaalyn @eranwo This one's a little tricky, since the GetID3 library doesn't appear to have any way to try to auto-detect the encoding of the characters by itself, so you have to specify one encoding type for any file it processes.

We could theoretically set that as a per-station setting, but I strongly hesitate to add new advanced, rarely-used per-station settings to an already complex station profile form.

I wonder if there's any workaround we could apply to that situation.

eranwo commented 4 years ago

maybe this can help ? https://github.com/neitanod/forceutf8

BusterNeece commented 4 years ago

@eranwo Thank you, that put me on the right track actually. There's another project that's something of a "successor" to that, called portableutf8 which includes very handy cleanup and uniform encoding tools.

I've added the library into AzuraCast, and my local tests show that it does a much better job of handling the ID3 metadata in your test file. The API is also correctly returning the right text.

Please verify this on your end by updating and uploading a new track.

eranwo commented 4 years ago

I checked it (after updating and uploading a new song), but I'm afraid it didn't make any difference.

BusterNeece commented 4 years ago

@eranwo Note that any fixes will only apply to newly uploaded music, so just to confirm...the new file you uploaded isn't showing up correctly locally?

Can you attach the new song?

eranwo commented 4 years ago

Yes, I removed the song and uploaded it again. I'm using this song: https://we.tl/t-LvhuxxjRNO

BusterNeece commented 4 years ago

2019-10-03 01_11_12-https___azuracast local_api_nowplaying

@eranwo It's processing it perfectly normally on my end after the latest changes.

eranwo commented 4 years ago

Strange, not in my case. I even uploaded it to the demo site and it seems to be the same issue: 1 2

BusterNeece commented 4 years ago

@eranwo The "file name" component is normal; in order to improve compatibility with numerous filesystems, we strip out non-ASCII characters in filenames and replace them with ASCII substitutions.

As for the NowPlaying API response, what you're looking at is the raw escaped value of the JSON, which correctly escapes non-ASCII characters as \uXXXX sequences. Your JSON viewer should be able to properly interpret those strings and convert them back into the original characters.
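
A short Python sketch of this point, using the title value from the report (the one-key JSON document is made up for illustration): the `\uXXXX` form and the readable form are two spellings of the same string, and any JSON parser converts one into the other.

```python
import json

# Raw JSON text as it travels over the wire, with the title escaped.
raw = r'{"title": "\u05d4\u05e8\u05e2\u05d5\u05ea"}'

# Parsing turns the escape sequences back into the real characters.
decoded = json.loads(raw)
print(decoded["title"])

# By default, Python's encoder escapes non-ASCII characters again...
escaped_again = json.dumps(decoded)

# ...while ensure_ascii=False emits the characters themselves.
readable = json.dumps(decoded, ensure_ascii=False)
print(readable)
```

Both outputs are valid JSON for the same data, so the escaped API response on its own does not indicate an encoding problem.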

eranwo commented 4 years ago

The main issue is the csv report

03/10/19 4:40am 1 -1 ׳”׳¨׳¢׳•׳× ׳œ׳”׳§׳× ׳”׳ ׳—׳´׳œ  
03/10/19 5:27am 0 0 ׳”׳¨׳¢׳•׳× Lehakat Hanachal  
03/10/19 5:27am 0 0 ׳”׳¨׳¢׳•׳× ׳œ׳”׳§׳× ׳”׳ ׳—׳´׳œ default
03/10/19 5:31am 0 0 ׳”׳¨׳¢׳•׳× Lehakat Hanachal default
03/10/19 5:34am 0 0 AzuraCast is Live! AzuraCast.com  
03/10/19 6:26am 0 0 ׳”׳¨׳¢׳•׳× ׳œ׳”׳§׳× ׳”׳ ׳—׳´׳œ default
             
Vaalyn commented 4 years ago

I can't replicate that issue with the song history csv export after updating to the newest version and then uploading that file and playing it.

What software are you using to view the csv file?

eranwo commented 4 years ago

I tried again just now on the latest version by creating a new station and adding the song (attached). The problem still exists.

Vaalyn commented 4 years ago

I don't have a copy of MS Excel, so I'm using LibreOffice to view the file. I'm also on a Mac (might be relevant, I don't know). This is what it looks like when opening the CSV with UTF-8 selected on the import screen:

image

~Can you send me your csv file? I'd like to try opening it in LibreOffice.~

Didn't notice you already attached it.

Vaalyn commented 4 years ago

Tried opening your attached csv with LibreOffice like this: image

And this is how it looks there: image

Vaalyn commented 4 years ago

I've done a brief Google search regarding utf-8 csv files and Excel. This seems to be an issue with Excel not recognizing file encodings so you have to explicitly import the file in Excel and select the utf-8 encoding. Can you try that?
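
The garbled rows quoted earlier in this thread are consistent with correct UTF-8 bytes being interpreted in a legacy single-byte code page. The Python sketch below reproduces the effect under the assumption that the code page is Windows-1255 (Hebrew); the thread does not confirm which one Excel actually picked.

```python
original = "\u05d4\u05e8\u05e2\u05d5\u05ea"  # the Hebrew title from the report

# The CSV on disk holds these bytes: two per Hebrew letter in UTF-8.
utf8_bytes = original.encode("utf-8")

# Assumption: the viewer reads the bytes as Windows-1255 instead of UTF-8,
# so every letter turns into two unrelated characters.
misread = utf8_bytes.decode("cp1255")
print(misread)

# Nothing is lost in this case: reversing the wrong step restores the text.
repaired = misread.encode("cp1255").decode("utf-8")
print(repaired == original)
```

If this is what is happening, the file itself is intact and only the encoding chosen at open time needs to change.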

eranwo commented 4 years ago

It looks perfect in yours (though the Custom Fields are sorely missing). Maybe the solution, in order to make the reports more "standard", is to add a PDF export option? I have no "import" in my Excel 2013. I'll try installing LibreOffice.

Vaalyn commented 4 years ago

I've searched for how to import CSV files in Excel 2013, and it seems that you have to go to the Data tab and then use the From Text option. In the import wizard there should be an option called File origin where you can select the encoding.
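
A possible alternative to the manual import, not something proposed in this thread: Excel generally detects UTF-8 when a file begins with a byte order mark (BOM), so re-saving the export with one may let it open correctly by double-click. A minimal Python sketch, where `add_utf8_bom` is a hypothetical helper name:

```python
UTF8_BOM = b"\xef\xbb\xbf"


def add_utf8_bom(csv_bytes: bytes) -> bytes:
    """Return UTF-8 CSV content with exactly one leading BOM.

    Hypothetical helper for illustration. The "utf-8-sig" codec strips a
    BOM on decode if one is present and always writes one on encode, so
    calling this twice does not stack BOMs.
    """
    text = csv_bytes.decode("utf-8-sig")
    return text.encode("utf-8-sig")


# A made-up two-column row using the title from this report.
row = "03/10/19 4:40am,\u05d4\u05e8\u05e2\u05d5\u05ea\n".encode("utf-8")

with_bom = add_utf8_bom(row)
print(with_bom.startswith(UTF8_BOM))
```

To apply this to a downloaded report, read the file in binary mode, pass its contents through the helper, and write the result back in binary mode.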

eranwo commented 4 years ago

image

eranwo commented 4 years ago

Nice .. Thank you !

Do you think it's possible to expand the report fields according to the ID3 tags? The data is already there; it just needs more options added to the selection box.

Vaalyn commented 4 years ago

If the import is working for you now I'll close this issue. Can you confirm that it is working correctly?

> Do you think it's possible to expand the report fields according to the ID3 tags? The data is already there; it just needs more options added to the selection box.

Please refer to issue #2028 for further status reports on that topic. It has been tagged as an Enhancement so I think it will be possible in the future to add those to the report.

eranwo commented 4 years ago

Yes, it works. Not perfect, but it does the job. Thanks.