dhowden / tag

ID3, MP4 and OGG/FLAC metadata parsing in Go
BSD 2-Clause "Simplified" License

Tag not recognized #20

Closed by deluan 8 years ago

deluan commented 8 years ago

This is another file from my library whose tag is not recognized: https://goo.gl/ov75zy

$ tag 04\ The\ Man\ With\ X-Ray\ Eyes.mp3
error reading file: invalid encoding byte 20

This file is recognized normally in a number of other tag tools. I didn't have time to dig into tag's code to figure out what is going on, but I hope the file sample can help. Thanks for this awesome library.

My environment: Mac OS X 10.11.1, Go 1.6

wader commented 8 years ago

It fails on the TCON frame, which looks a bit strange:

T  C  O  N                       (  1  2  )  O  t  h  e  r
54 43 4F 4E 00 00 00 0A 00 00 20 28 31 32 29 4F 74 68 65 72

len 10, flags 0, encoding 32 (0x20, hmm, a space character?)
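For reference, the breakdown above can be reproduced with a small Go sketch. It is not code from the tag library: the `frame` type and `parseFrame` helper are hypothetical names, and it assumes a 10-byte frame header with a plain big-endian 32-bit size (as in ID3v2.3; the synchsafe size used by ID3v2.4 gives the same value for this frame).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sample is the TCON frame from the hex dump above.
var sample = []byte{
	0x54, 0x43, 0x4F, 0x4E, // "TCON"
	0x00, 0x00, 0x00, 0x0A, // size: 10
	0x00, 0x00, // flags
	0x20,                   // encoding byte (expected 0..3)
	0x28, 0x31, 0x32, 0x29, // "(12)"
	0x4F, 0x74, 0x68, 0x65, 0x72, // "Other"
}

// frame is a hypothetical type for illustration, not the tag library's API.
type frame struct {
	ID       string
	Size     uint32
	Flags    uint16
	Encoding byte
	Text     []byte
}

// parseFrame splits one text frame into header fields, encoding byte and raw text.
// Assumption: the size is a plain big-endian uint32 (ID3v2.3 style).
func parseFrame(b []byte) (frame, error) {
	if len(b) < 10 {
		return frame{}, fmt.Errorf("short header: %d bytes", len(b))
	}
	f := frame{
		ID:    string(b[0:4]),
		Size:  binary.BigEndian.Uint32(b[4:8]),
		Flags: binary.BigEndian.Uint16(b[8:10]),
	}
	body := b[10:]
	if f.Size < 1 || uint64(f.Size) > uint64(len(body)) {
		return frame{}, fmt.Errorf("bad frame size %d", f.Size)
	}
	f.Encoding = body[0]
	f.Text = body[1:f.Size]
	return f, nil
}

func main() {
	f, err := parseFrame(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s len %d flags %d encoding %d (0x%02X) text %q\n",
		f.ID, f.Size, f.Flags, f.Encoding, f.Encoding, f.Text)
}
```

The size of 10 covers the encoding byte plus the nine text bytes, so the 0x20 really does sit where the encoding byte belongs.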

@deluan Which other tools can parse it? It would be nice to see how they handle it... do they just fall back to UTF-8, or is there something else to this?

deluan commented 8 years ago

iTunes 12, kid3 and Subsonic (this one uses the Jaudiotagger Java library)

wader commented 8 years ago

@deluan Thanks. Do you know which program created the tag?

deluan commented 8 years ago

I do not, unfortunately....

wader commented 8 years ago

Did some experiments:

OS X Finder -> "Other"

mutagen (old version) -> " (12)Other"; it checks whether the encoding byte is < 16 and otherwise falls back to Latin-1: https://github.com/Davideddu/mutagen/blob/master/id3.py#L695

mutagen (new version) -> throws a junk frame exception and skips the frame

VLC -> shows up as "⠱㈩佴桥" in the media info UI ... looking at the taglib code (which I think VLC uses), it seems to fall back to UTF-16, which it just copies (it uses UTF-16 internally)

https://github.com/taglib/taglib/blob/master/taglib/toolkit/tstring.cpp#L353
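The odd characters are consistent with the text bytes being read as UTF-16. A minimal Go sketch of that, assuming big-endian byte order with no BOM (my assumption, not something confirmed from the taglib source); `decodeUTF16BE` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodeUTF16BE interprets b as big-endian UTF-16 code units.
// A trailing odd byte is silently dropped.
func decodeUTF16BE(b []byte) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, uint16(b[i])<<8|uint16(b[i+1]))
	}
	return string(utf16.Decode(units))
}

func main() {
	// The nine text bytes that follow the 0x20 encoding byte.
	text := []byte("(12)Other")
	s := decodeUTF16BE(text)
	fmt.Println(s)
	for _, r := range s {
		fmt.Printf("U+%04X ", r)
	}
	fmt.Println()
}
```

The byte pairs 28 31, 32 29, 4F 74 and 68 65 become the code points U+2831, U+3229, U+4F74 and U+6865, and the final 72 ("r") has no partner, which would explain why only four characters appear.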

My proposal is to fall back to ISO-8859-1.
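A rough sketch of what such a fallback could look like in Go. This is not the tag library's code: `decodeTextLenient` and `decodeLatin1` are hypothetical names, the UTF-16 encodings (1 and 2) are left out for brevity, and the sketch assumes the unknown byte is still consumed as the encoding byte rather than kept as text.

```go
package main

import "fmt"

// decodeLatin1 converts ISO-8859-1 bytes to a Go string.
// Each byte maps to the Unicode code point with the same value.
func decodeLatin1(b []byte) string {
	r := make([]rune, len(b))
	for i, c := range b {
		r[i] = rune(c)
	}
	return string(r)
}

// decodeTextLenient decodes the body of a text frame (encoding byte + text).
// Hypothetical helper: unknown encoding bytes fall back to ISO-8859-1
// instead of returning an "invalid encoding byte" error.
func decodeTextLenient(body []byte) (string, error) {
	if len(body) == 0 {
		return "", nil
	}
	enc, data := body[0], body[1:]
	switch enc {
	case 0: // ISO-8859-1
		return decodeLatin1(data), nil
	case 3: // UTF-8
		return string(data), nil
	case 1, 2: // UTF-16 variants, omitted to keep the sketch short
		return "", fmt.Errorf("encoding %d not handled in this sketch", enc)
	default:
		// Unknown encoding byte (such as 0x20): fall back to ISO-8859-1.
		return decodeLatin1(data), nil
	}
}

func main() {
	body := append([]byte{0x20}, []byte("(12)Other")...)
	s, err := decodeTextLenient(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", s)
}
```

Keeping the unknown byte as part of the text instead would give " (12)Other" with a leading space, which is what the old mutagen version showed.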

dhowden commented 8 years ago

Fixed by #25