Closed sublipri closed 8 months ago
Thanks!
This definitely isn't the only place the ID3v2 spec falls short, so I'm not surprised something like this came up.
The fix should just be a simple:
if let Some(stripped) = text.strip_suffix(encoding.null_terminator()) {
text.truncate(stripped.len())
}
anytime we parse a text frame.
Looked into this some more and discovered a lot of Latin1 encoded frames with null terminators as well, including quite a few that have multiple trailing nulls. The fields with multiple nulls were mostly iTunes-specific, but it would be nice to handle them too.
Would it be appropriate to add something like this to line 130 of the decode_text function (and change read_string to be mutable)?
if read_string.ends_with('\0') {
let new_len = read_string.trim_end_matches('\0').len();
read_string.truncate(new_len);
That seems to fix the issue for me. I could try putting together a pull request including a test if you like. I couldn't find reference to encoding.null_terminator
anywhere - would just using the NUL char be acceptable?
Would it be appropriate to add something like this to line 130 of the decode_text function
I think the check should only be done in some helper function in id3::v2
, and used in TextInformationFrame::parse
/LanguageFrame::parse
/etc, since this is only an ID3v2 issue (I'd imagine).
I could try putting together a pull request including a test if you like
That'd be great, otherwise I'll need a few days to get to it at least.
I couldn't find reference to encoding.null_terminator anywhere
Oops, that was something I added in a local branch awhile ago. It was never added here 😄. Since there can be multiple terminators it doesn't matter anyway. trim_end_matches('\0')
works better in this case.
I think the check should only be done in some helper function in
id3::v2
, and used inTextInformationFrame::parse
/LanguageFrame::parse
/etc, since this is only an ID3v2 issue (I'd imagine).
I found a few FLAC files with VorbisComments containing trailing nulls too (most seem to be written by old versions of JRiver). Not sure if that really matters, but I think it'd make sense to put the helper function in utils::text
in case it is useful elsewhere. That would also fit with how things are currently structured - thedecode_text
function is only used by id3::v2
.
I'll try and get it done in the next couple of days.
Oh yeah, if this is an issue in other formats then it should be in utils::text
. I've never seen terminators in Vorbis Comments, but it wouldnt hurt to skip them if they exist in the wild.
Thanks a lot for working on this as well!
iTunes and applications that use Mutagen will add null terminators to UTF-8 encoded frames due to ambiguities in the spec. See this issue for some background:
https://github.com/quodlibet/mutagen/issues/379
This causes lofty to create an extra empty value when splitting multiple values.
Given how widespread a practise this is, it seems necessary to handle it when reading the content of a frame. I'm not sure exactly which frames can have null terminators, but I've noticed it in Text, ExtendedText, and Comment frames.
Here's an example file:
https://archive.org/download/The_Four_Seasons_Vivaldi-10361/John_Harrison_with_the_Wichita_State_University_Chamber_Players_-_05_-_Summer_Mvt_2_Adagio.mp3