NCrusher74 / SwiftTaggerID3

A Swift library for reading and editing ID3 tags
Apache License 2.0
5 stars 2 forks source link

Exporting #22

Open NCrusher74 opened 3 years ago

NCrusher74 commented 3 years ago

One of my tests is failing. This is the message I'm getting:

XCTAssertNoThrow failed: throwing "*** -[NSRegularExpression enumerateMatchesInString:options:range:usingBlock:]: Range or index out of bounds"

I assume it refers to one of these two extensions of string, the second of which is an all-caps variation on this.

    func convertCamelCase() -> String {
        return self
            .replacingOccurrences(of: " ", with: "")
            .replacingOccurrences(of: "([A-Z])",
                                  with: " $1",
                                  options: .regularExpression,
                                  range: range(of: self))
            .capitalized
    }

    func convertCamelToUpperCase() -> String {
        return self
            .replacingOccurrences(of: " ", with: "")
            .replacingOccurrences(of: "([A-Z])",
                                  with: " $1",
                                  options: .regularExpression,
                                  range: range(of: self))
            .uppercased()
    }

The context for this is that I'm trying to convert my frame keys to a human-readable string that can be exported to a text file to display all the metadata, but a string which can be easily parsed if I import metadata from that same text file, like this:

output.txt

The parts on the right are the values of the actual frames (some of which have been massaged via CustomStringConvertible to make them readable.

The parts to the left side are a version of the frameKey that is also massaged to be human-readable. Since FrameKey cannot conform to String (on account of some frame keys having associated values) I implemented a stringValue property for them:

    var stringValue: String {
        switch self {
            case .album: return "album"
            case .albumSort: return "albumSort"
            case .albumArtist: return "albumArtist"
            // ...

And then I convert that stringValue to all caps and pretty it up a bit:

    var upperCasedStringValue: String {
        self.stringValue.convertCamelToUpperCase()
        .replacingOccurrences(of: " I D", with: " ID")
        .replacingOccurrences(of: "U R L", with: "URL")
        .replacingOccurrences(of: "  ", with: " ")
    }

And combine it with the frame identifier to create the string seen in the file above:

    func keyString(format: MetadataExportFormat) -> String {
        let idString: String
        let id = FrameIdentifier(frameKey: self)

        if let string = Version.v2_4.idString(id) {
            idString = string
        } else if let string = Version.v2_3.idString(id) {
            idString = string
        } else if let string = Version.v2_2.idString(id) {
            idString = string
        } else {
            idString = "UNKN"
        }

        switch format {
            case .text: return "{\(idString)} " + upperCasedStringValue
            default: return idString
        }
    }

Upon import of a text file formatted like this, the part in the brackets is really all I'm concerned with, except when it comes to frames that have an associated language and/or description as part of their frame key. Which is where my problem is.

            case .comments(language: let language, description: let description):
                if description != "" && language != .und {
                    return "[\(language.rawValue)] \(description.camelCased)"
                } else if description != "" && language == .und {
                    return description.camelCased
                } else if description == "" && language != .und {
                    return "[\(language.rawValue)]"
                } else {
                    return "Comment"
                }

Because the language and description values are optional in the frame itself, but I don't want to make them optional in the FrameKey enum, I assign them a default value of .und and "". So when trying to create a human-readable string out of these, I have four different possibilities:

1) There's no language or description, so I don't need a human-readable string for them, and I just substitute in the default "Comment" descriptor. 2) There's a language but no description 3) There's a description but no language 4) Both are present

In order to parse this information from an imported file, I have to have some way of differentiating whether the string is a language raw value, or whether it's a description, so I put the language in square brackets.

I suspect those are what is breaking my extension, but I'm not sure what I should do about it.

instead of converting the stringValue for that case to upper case after the fact, I guess I could do it in a case-by-case basis to avoid running the square brackets through my extension, but that seems really inefficient:

            case .bpm: return "bpm".convertCamelToUpperCase()
            case .comments(language: let language, description: let description):
                if description != "" && language != .und {
                    return "[\(language.rawValue.uppercased())] \(description.uppercased())"
                } else if description != "" && language == .und {
                    return description.uppercased()
                } else if description == "" && language != .und {
                    return "[\(language.rawValue.uppercased())]"
                } else {
                    return "Comment".uppercased()
                }
            case .compilation: return "compilation".convertCamelToUpperCase()

There has to be a more efficient way than what I'm doing?

SDGGiesbrecht commented 3 years ago

XCTAssertNoThrow failed: throwing "*** -[NSRegularExpression enumerateMatchesInString:options:range:usingBlock:]: Range or index out of bounds"

“Index out of bounds” means you have fed it a range that does not entirely exist (or else the platforms own method has a bug).

If you have correctly identified where it is happening, then range(of: self) is producing an invalid range, which you are then using as the bounds of the regular expression search, causing it to reach for things that do not exist.

Note that self is unlikely to be equal to self.replacingOccurrences(of: " ", with: "")—their lengths will differ according to the number of spaces. It looks like you are asking for the range from the first but applying it to the second. Since the first is likely longer than the second, it should be no surprise that doing so ends up reaching out of the second’s bounds.

NCrusher74 commented 3 years ago

Ah, that makes sense. Since the "description" strings for the Comment/Lyrics/User-Defined frames are unlikely to be in camel case already, I was trying to remove any spaces and get it at least part way to camel case before running it through that function, but I didn't think about where in the process I was removing those spaces.

SDGGiesbrecht commented 3 years ago

Ergo:

let withoutSpaces = self.replacingOccurrences(of: " ", with: "")
return withoutSpaces // *
  .replacingOccurrences(of: "([A-Z])",
  with: " $1",
  options: .regularExpression,
  range: range(of: withoutSpaces) // *
).capitalized
SDGGiesbrecht commented 3 years ago

Actually no, while it was the simplest fix for the index error, it would still be nonsensical, due to eliminating the spaces the next step looks for. So you’re right, you will have to rearrange or rethink it.

SDGGiesbrecht commented 3 years ago

Never mind, the space is in the replacement, not the search pattern. I am clearly not thinking straight right now.

NCrusher74 commented 3 years ago

LOL, no problem. You've seen me have plenty of days like that.

While I'm at it, am I wrong in thinking this extension won't work if the letters are Roman-alphabet characters? If a letter is a Unicode character not in "([A-Z])" it won't get handled.

The thread I got that extension from had some other solutions which seemed to involve handling Unicode characters, but it seemed like there was some debate -- which went over my head, though they seemed to be centered around which approach was faster? -- as to which was the right approach.

NCrusher74 commented 3 years ago

While I haven't decided what to do about the possibility of Unicode characters yet, I did decide to go with a slightly more flexible approach that can be used whether or not I want to remove the spaces from a string before converting it.

    var condensed: String {
        self.replacingOccurrences(of: " ", with: "")
    }

    func titleCased(condensingSpaces: Bool = true) -> String {
        var string =  self
        if condensingSpaces {
            string =  self.condensed
        }

        return string
            .replacingOccurrences(of: "([A-Z])",
                                  with: " $1",
                                  options: .regularExpression,
                                  range: range(of: self))
            .capitalized
    }

    func camelToUpperCase(condensingSpaces: Bool = true) -> String {
        self.titleCased(condensingSpaces: condensingSpaces).uppercased()
    }
SDGGiesbrecht commented 3 years ago

It is not so much a question of getting it “right”, as getting it to consistently do what you want. Unicode transcends language barriers, but you don’t have to leave English in order to see that casing conversions are impossible for a machine to get perfect: “iPhone” → “I Phone” or “iPhone”?

NCrusher74 commented 3 years ago

Yeah, that is why I have to keep putting in these little case-by-case fixes as they pop up, and I'm sure I'll miss one somewhere.

    var upperCasedStringValue: String {
        self.stringValue.titleCased().uppercased()
        .replacingOccurrences(of: " I D", with: " ID")
        .replacingOccurrences(of: "U R L", with: "URL")
        .replacingOccurrences(of: "  ", with: " ")
    }