Serial-ATA / lofty-rs

Audio metadata library
Apache License 2.0
187 stars 35 forks source link

ID3v2: Handle genre IDs when parsing TCON frames #281

Closed sublipri closed 11 months ago

sublipri commented 11 months ago

The ID3v2.2.0 and 2.3.0 standards allow for ID3v1 style genre IDs to be stored alongside of textual genres by surrounding them with brackets.

https://web.archive.org/web/20220609015114/https://id3.org/id3v2.3.0#line-282

References to the ID3v1 genres can be made by, as first byte, enter "(" followed by a number from the genres list (appendix A) and ended with a ")" character. This is optionally followed by a refinement, e.g. "(21)" or "(4)Eurodisco". Several references can be made in the same frame, e.g. "(51)(39)". If the refinement should begin with a "(" character it should be replaced with "((", e.g. "((I can figure out any genre)" or "(55)((I think...)".

They also define two textual IDs: RX => Remix and CR => Cover.

I've also encountered ID3v2.4.0 tags that store genres like this, including some that just have a numeric ID without brackets.

Ideally we should handle all theses cases. Not sure of the best approach - perhaps we could add a case for TCON frames in https://github.com/Serial-ATA/lofty-rs/blob/486cba1f040ec48bcba11f70fac18fb030356c35/src/id3/v2/frame/content.rs#L16s and use a function that would e.g. turn the value (4)Eurodisco into Disco\0Eurodisco?

"TCON" => handle_tcon(TextInformationFrame::parse(reader, version)?).map(FrameValue::Text),

handle_tcon would create a new frame if necessary, otherwise return the original frame.

Here's the logic from my application code that handles all the cases I've encountered without issue. Something using regex might be more robust but requires adding a dependency. If it seems like a reasonable approach I'd be happy to write some tests and adapt it to work with lofty.

fn handle_tcon(tcon: &str) -> String {
    let mut genres = Vec::new();
    if tcon.starts_with('(') && tcon.contains(")((") {
        todo!();
    }
    else if tcon.starts_with('(') && tcon.contains(')') {
        for value in tcon.split(')') {
            let value = value.strip_prefix('(');
            let genre = match value {
                "RX" => "Remix",
                "CR" => "Cover",
                "" => continue,
                _ => parse_genre(&value),
            };
            genres.push(genre)
        }
    } else {
        genres.push(parse_genre(tcon));
    }
    genres.join("\0")
}

fn parse_genre(genre: &str) -> &str {
    if let Ok(id) = genre.parse::<usize>() {
        if id < GENRES.len() {
            GENRES[id]
        } else {
            genre
        }
    } else {
        genre
    }
}
Serial-ATA commented 11 months ago

Ah, yeah. Kept forgetting to implement this.

I think the best way to go about this is to continue parsing TCON as a normal text frame and instead add a method to Id3v2Tag (Id3v2Tag::genres()?) to parse it into a Vec<String>, since one long null-concatinated string is likely to be split up anyway.

I've also encountered ID3v2.4.0 tags that store genres like this, including some that just have a numeric ID without brackets.

Pretty sure ID3v2.4 allows you to just have a number, so that'll probably add some complexity if we need to check for both cases. That's unfortunate.

sublipri commented 11 months ago

The reason I thought to normalize values to a null-concatenated string was that it might play nice with stuff like get_texts and converting an Id3v2Tag into a Tag. I haven't fully wrapped my head around how that all works though so an Id3v2Tag::genres() method sounds good if you think that's a better approach. Would you like me to try implementing it or should I leave it for you?

Serial-ATA commented 11 months ago

I likely won't get around to implementing this soon. I'd like to start focusing on some of the older issues and my draft PRs.

If you'd like to give this a shot, be my guest! 🙂