creeperyang / id3-parser

A pure JavaScript id3 tag parser.

encoding type 0 is ISO-8859, not UTF8 #9

Closed Satyam closed 8 years ago

Satyam commented 8 years ago

Decoding ISO-8859-1 characters as UTF-8 works fine for characters below 0x80 but fails on the bottom part of the table (0x80-0xFF): https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout. It fails because the decoder assumes that a complete single-byte ISO-8859-1 character is just the first byte of a multi-byte UTF-8 sequence and tries to merge its bits with those of the next character. ISO-8859-1 characters just need to be converted via fromCharCode.
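A minimal sketch of the conversion described above, assuming the tag text is available as an array of byte values. The name `decodeLatin1` is hypothetical and is not taken from id3-parser's source. It relies on the fact that every ISO-8859-1 byte value equals the Unicode code point of the character it encodes, so each byte can be passed straight to `String.fromCharCode`.

```javascript
// Hypothetical helper for illustration; not id3-parser's actual code.
// Each ISO-8859-1 byte (0x00-0xFF) maps to the Unicode code point with
// the same numeric value, so no multi-byte merging is needed.
function decodeLatin1(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// "Núñez" encoded as ISO-8859-1: ú is 0xFA and ñ is 0xF1.
const sample = Uint8Array.from([0x4e, 0xfa, 0xf1, 0x65, 0x7a]);
console.log(decodeLatin1(sample)); // prints "Núñez"
```

Appending one character at a time keeps the sketch simple. A real parser would probably also stop at the terminating null byte.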

creeperyang commented 8 years ago

LGTM.

I will have a look soon. It would be really nice if you could add a test case.

Satyam commented 8 years ago

The problem with such a test suite is that real-life MP3s are copyrighted, so I cannot include them. I tried making short recordings and adding tags, but I was unable to choose the encoding and they ended up being UTF8. Building a fake test file would prove nothing, as I would naturally make it match.

I've run it over my music library (a few thousand items), comparing the results of ffmpeg/ffprobe and id3-parser. There were mismatches with some artists, such as Carlos Núñez and many other Europeans whose names contain diacritical marks, and now they match. As good a test as that is, I'm afraid I cannot put the sample data online.

creeperyang commented 8 years ago

Umm, you're right. It's hard to supply a test case.

When the encoding is ISO-8859-1, converting it as UTF-8 is not correct (bytes 0x80-0xFF will be handled incorrectly).
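The mismatch can be reproduced with a short sketch. It uses the standard `TextDecoder` as a stand-in for a UTF-8 decoding path, not id3-parser's own decoder, so the exact wrong output may differ from what the library produced. The variable names are illustrative only.

```javascript
// Illustration only: TextDecoder stands in for "decode as UTF-8";
// it is not the code path id3-parser itself used.
const latin1Bytes = Uint8Array.from([0x4e, 0xfa, 0xf1, 0x65, 0x7a]); // "Núñez" in ISO-8859-1

// Wrong: bytes 0xFA and 0xF1 do not form valid UTF-8 sequences here.
const asUtf8 = new TextDecoder('utf-8').decode(latin1Bytes);

// Right: each byte is already the code point of its character.
const asLatin1 = Array.from(latin1Bytes, (b) => String.fromCharCode(b)).join('');

console.log(asUtf8 === 'Núñez');   // false
console.log(asLatin1 === 'Núñez'); // true
```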

And thanks for your PR.