NCrusher74 / SwiftTaggerID3

A Swift library for reading and editing ID3 tags
Apache License 2.0
5 stars 2 forks source link

Question #4

Closed NCrusher74 closed 3 years ago

NCrusher74 commented 4 years ago

When taking breaks from trying to puzzle out the stuff that isn't working on this project, I've been working on making a mp4-tagging complement to this one. Frankly, AVFoundation does 95% of what I need to do for MP4 files, and I just want to streamline it so that it integrates with this ID3 tagging library with minimal effort. That way, I don't have to use MP42Foundation, which is bulky and has a lot of stuff I don't need.

But I'm encountering the weirdest issue. Code that works in a playground (where I was running my initial trial efforts) isn't working when I try to run it from the test module, even though as far as I can tell, it's the exact same code, accessing the exact same file.

Playground version:

let mp4url = URL(fileURLWithPath: "/Users/nolainecrusher/Downloads/audiobook_tools/sampleaax/mp4-full-meta.m4a")
let asset = AVAsset(url: mp4url)
let formatsKey = "availableMetadataFormats"

asset.loadValuesAsynchronously(forKeys: [formatsKey]) {
    var error: NSError? = nil
    let status = asset.statusOfValue(forKey: formatsKey, error: &error)
    if status == .loaded {
        for format in asset.availableMetadataFormats {
            let metadata = asset.metadata(forFormat: format)
            print(metadata)
        }
    }
}

This successfully prints an array of all the metadata items in the file.

This, however, does not:

    func testExample() throws {
       let mp4Url = URL(fileURLWithPath: "/Users/nolainecrusher/Downloads/audiobook_tools/sampleaax/mp4-full-meta.m4a")
        let asset = AVAsset(url: mp4Url)
        let formatsKey = "availableMetadataFormats"

        asset.loadValuesAsynchronously(forKeys: [formatsKey]) {
            var error: NSError? = nil
            let status = asset.statusOfValue(forKey: formatsKey, error: &error)
            if status == .loaded {
                for format in asset.availableMetadataFormats {
                    let metadata = asset.metadata(forFormat: format)
                    print(metadata)
                }
            }
        }
    }

In fact, if I cut and paste the textExample() function INTO a playground and call it, it will work. But it's not working when used in the test module and I don't understand why.

SDGGiesbrecht commented 4 years ago

Sorry. I’m falling behind on where all this is at. I’m very busy at the moment, but I’ll get back to you when I have the chance.

NCrusher74 commented 4 years ago

No problem. I understand completely. Good luck!

NCrusher74 commented 4 years ago

Nevermind, I managed to find a way around this one. While I still don't understand why the exact same code was working in one place but not in the other, I found a workaround:

        let mp4Url = URL(fileURLWithPath: "/Users/nolainecrusher/Downloads/audiobook_tools/sampleaax/mp4-full-meta.m4a")
        let asset = AVAsset(url: mp4Url)
        let metadata = asset.metadata(forFormat: .iTunesMetadata)
        print(metadata)
NCrusher74 commented 3 years ago

Question:

I'm reworking some things in the ID3 tagger because I'm a better coder now after writing (and rewriting) the MP4 tagger than I was when I wrote this.

For initializers that would require a lot of complex calculations or other code that I don't really want to keep IN the initializer, as far as I know, I have two options:

One would be to make it a private, static function in the struct or class itself, like this one you wrote long ago (which actually isn't that complex, but I have others that are far more convoluted.)

    static func extractDescriptionAndContent(
        from frameData: inout Data.SubSequence,
        encoding: StringEncoding
    ) throws -> (description: String?, content: String) {
        let description = frameData.extractPrefixAsStringUntilNullTermination(encoding)
        let content = frameData.extractPrefixAsStringUntilNullTermination(encoding) ?? ""
        return (description: description, content: content)
    }

The other is making an extension to handle them:

extension Data {
    mutating func extractDescriptionAndContent(encoding: String.Encoding) -> (description: String?, content: String) {
        let description = self.extractNullTerminatedString(encoding)
        let content = self.extractNullTerminatedString(encoding) ?? ""
        return (description: description, content: content)
    }
}

Is there any particular reason why one approach is preferable to another, or why either or both aren't a good idea?

SDGGiesbrecht commented 3 years ago

Both methods should result in the exact same machine instructions after compilation. The choice of which way to write a particular method has more to do with organizing the code so you can find it again later.

NCrusher74 commented 3 years ago

I find myself preferring to use extensions, because it just seems a slightly less laborious way to write things in-line:

LocalizedFrame.extractDescriptionAndContent(from: &parsing, encoding: encoding) vs. data.extractDescriptionAndContent(encoding: encoding)

But the locating them after the fact is a good point. I need to figure out what the best way to approach it on that angle would be.

NCrusher74 commented 3 years ago

Out of curiosity, and apropos of absolutely nothing, how are we feeling about the new Xcode?

SDGGiesbrecht commented 3 years ago

Don’t know. The tech know‐it‐alls like to brag about bigger screens, but then they just make the fonts bigger so you see less. Several annoying bugs and crashes have been resolved, but new ones have replaced them. Tabs seem to be permanently loaded and never freed from memory, even if after restarting, but the overall pinwheel time seems to have been reduced.

Mostly I’ve been preoccupied with getting Swift 5.3 to work on Windows, which has a tendency to remind me that Xcode isn’t all that bad after all.

NCrusher74 commented 3 years ago

Yeah, the jury is still out for me too, though of course I don't have the tech knowledge you do to analyze it so my concerns tend to be more aesthetic.

A week ago I would have sworn I'd pay anything for XCode's auto-complete and completion suggestions to be more responsive when working with SPM packages, but now they keep popping up in my way and they look different so they distract me from what I'm looking at, so I'm like "but I didn't want it to be THIS responsive!"

And why on earth did they get rid of the upper-right button that would enable you to see the console with a click?

And ugh. Yeah. When I finish working on my audiobooks suite for MacOS and iOS, I'm hoping to create a Windows/Android version, and while I'm happy that Swift CAN be used in Windows, I'm also dreading it because I'm sure it will come with another learning curve.

I'm eyeballing RemObject's Water for the IDE on that side of things, but I'm not sure yet.

Meanwhile, I'm also cursing Google for not having any useful suggestions on how to create an XCode theme that will automatically shift from light to dark mode with the rest of the system. I think some of XCode's native themes will do it, but of course those are the themes I don't like and something about the new look on XCode has made me need the customized theme I've been using even more.

NCrusher74 commented 3 years ago

Okay, new question:

I'm trying to set things up so that my libraries will prepend the copyright/sound recording/production copyright (P in a circle) symbols to the strings entered for those items. My MP4 library seems to be handling it okay, but ID3 isn't and I'm not sure why. I assume it has something to do with the encoding, possibly?

this passes: XCTAssertEqual(output.copyright, "\u{00A9} 2020 Copyright")

but this does not: XCTAssertEqual(output.producedNotice, "\u{2117} 2020 Produced Notice") (XCTAssertEqual failed: ("Optional("")") is not equal to ("Optional("℗ 2020 Produced Notice")"))

These are my helper functions and accessors:

    /// Instantiates parsing operation to retrieve a frame's contents from a `Tag`
    /// Parameter frameKey: The unique identifier of the frame
    /// Returns: The frame's contents as a human-readable, unterminated string
    func get(_ identifier: FrameIdentifier) -> String? {
        // check that the frame is a String Frame
        if let frame = self.frames[identifier.frameKey] as? StringFrame {
            // get the contentString from the frame data
            return frame.stringValue
        } else {
            return nil
        }
    }

    /// Adds the contents of a frame to a `Tag` instance
    /// Parameters:
    ///   layout: The frame's layout identifier, necessary to initialize the frame
    ///   frameKey: The frame's unique identifier
    ///   string: The string content input by the user
    mutating func set(_ identifier: FrameIdentifier,
                               stringValue: String?) {
        let frameKey = identifier.frameKey
        if let stringValue = stringValue {
            let frame = StringFrame(identifier,
                                    version: self.version,
                                    stringValue: stringValue)
            self.frames[frameKey] = frame
        } else {
            self.frames[frameKey] = nil
        }
    }

    /// Copyright getter-setter. ID3 Identifier: `TCR`/`TCOP`
    public var copyright: String? {
        get { get(.copyright) }
        set {
            if let new = newValue {
                set(.copyright, stringValue: "\u{00A9} \(new)")
            } else {
                set(.copyright, stringValue: nil)
            }
        }
    }

    /// ProducedNotice getter-setter. ID3 Identifier: `TPRO`.
    public var producedNotice: String? {
        get { get(.producedNotice) }
        set {
            if let new = newValue {
                set(.producedNotice, stringValue: "\u{2117} \(new)")
            } else {
                set(.producedNotice, stringValue: nil)
            }
        }
    }

And this is the initializer and content data code from the frame itself:

    init(_ identifier: FrameIdentifier,
         version: Version,
         stringValue: String) {
        // initialize the contentString property
        self.stringValue = stringValue

        let size: Int
        if identifier.parseAs == .url {
            size = stringValue.encodedASCII.count
        } else {
            size = stringValue.encodedISOLatin1.count + 1 // encoding byte
        }

        // use the default flags
        let flags = version.defaultFlags
        super.init(identifier: identifier,
                   version: version,
                   size: size,
                   flags: flags)
    }

    override var contentData: Data {
        var data = Data()
        if self.identifier.parseAs == .url {
            data.append(stringValue.encodedASCII)
        } else {
            let encoding = String.Encoding.isoLatin1
            data.append(encoding.encodingByte)
            data.append(stringValue.encodedISOLatin1)
        }
        return data
    }

I can't see that I've missed anything, particular since, as I said, the Copyright symbol is working. Which probably means there's something different about the other symbol that I'm not aware of. Is it not compatible with isoLatin1 encoding? I tried the other encoding options permitted by the ID3 spec (utf8, utf16, and utf16BigEndian)

    override var contentData: Data {
        var data = Data()
        if self.identifier.parseAs == .url {
            data.append(stringValue.encodedASCII)
        } else {
            let encoding: String.Encoding
            if self.identifier == .producedNotice {
                encoding = .utf16
            } else {
                encoding = .isoLatin1
            }

            data.append(encoding.encodingByte)
              if self.identifier == .producedNotice {
                data.append(stringValue.encoded(.utf16))
            } else {
                data.append(stringValue.encodedISOLatin1)
            }
        }
        return data
    }

but those didn't work either. It comes close, but the last character is truncated: XCTAssertEqual failed: ("Optional("℗ 2020 Produced Notic")") is not equal to ("Optional("℗ 2020 Produced Notice")") even when attempt to account for the possibility of a different byte count in the frame size:

        if identifier.parseAs == .url {
            size = stringValue.encodedASCII.count
        } else {
            if identifier == .producedNotice {
                size = stringValue.encoded(.utf16).count + 1
            } else {
                size = stringValue.encodedISOLatin1.count + 1 // encoding byte
            }
        }
SDGGiesbrecht commented 3 years ago

Which probably means there's something different about the other symbol that I'm not aware of. Is it not compatible with isoLatin1 encoding?

Correct. Latin 1 does not contain , or thousands of other symbols in Unicode. This may be revealing a deeper bug in your encoding handling that could also be triggered by thousands of other characters.

It comes close, but the last character is truncated: XCTAssertEqual failed: ("Optional("℗ 2020 Produced Notic")") is not equal to ("Optional("℗ 2020 Produced Notice")") even when attempt to account for the possibility of a different byte count in the frame size

Not sure exactly what is going on here, but you might be counting the wrong thing at some point. has 1 scalar and is 3 bytes in UTF‐8 or 2 bytes in UTF‐16.. That is a lot of opportunities for something to be slightly shorter or longer than you expect by accidentally counting in the wrong units.

NCrusher74 commented 3 years ago

Well, I am well and truly befuddled.

I'm going to go ahead and open a pull request on this one, because it's just not making sense to me and I'm going to need to talk it through in greater detail.

NCrusher74 commented 3 years ago

Nevermind. In the process of walking through what I'd been doing to solve it, I finally saw what I was missing.

I was puzzled because other apps were reading the frame properly, so the data was being written fine, but I was parsing it wrong somehow.

Turns out, I was using the extractNullTerminatedString function you wrote back when we were trying to make ID3TagEditor work for our purposes to decode the string's data, which was interpreting the final byte as a null terminator when the encoding was .utf16.

In this particular case, that is actually an easy fix. The end of the string data also happens to be the end of the frame's data, so I could just decode the string without using this particular function. However, in the even that other frames with multiple string properties (thus requiring me to see for a null terminator) should encounter .utf16 encoding, I should probably figure out how to account for the possibility in here.

    mutating func extractNullTerminatedString(_ encoding: String.Encoding) -> String? {
        var remainder = self
        search: while let null = remainder.firstIndex(of: 0) {
            remainder = self[null...].dropFirst()
            if encoding.terminationCount == 2 {
                if remainder.first == 0,
                    self.distance(from: startIndex, to: remainder.startIndex) % 2 != 0 {
                    // Found double‐byte null.
                    remainder = remainder.dropFirst()
                    break search
                } else {
                    // Single only, keep looking.
                    continue search
                }
            } else {
                // Found single‐byte null.
                break search
            }
        }

        var stringBytes: Data.SubSequence
        if remainder.startIndex == self.startIndex {
            // No null found.
            stringBytes = self
            self = self[self.endIndex...]
        } else {
            // Null found.
            stringBytes = self[..<remainder.startIndex]
            self = remainder
            stringBytes = stringBytes.dropLast()
            if encoding.terminationCount == 2 {
                stringBytes = stringBytes.dropLast()
            }
        }
        return String(data: Data(stringBytes), encoding: encoding)
    }
SDGGiesbrecht commented 3 years ago

That method is supposed to handle UTF‐16, although it is possible I made a mistake somewhere. For UTF‐16, when the null character appears, it should land in the branch where the comment says “Found double‐byte null”, and then in the branch at the bottom where it has if encoding.terminationCount == 2.

Try it on some basic data to see what is actually happening:

var data = Data([])
XCTAssertEqual(data.extractNullTerminatedString(.utf16), "")
XCTAssertEqual(data, Data([]))

data = Data([0x00, 0x61, 0x00, 0x00])
XCTAssertEqual(data.extractNullTerminatedString(.utf16), "a")
XCTAssertEqual(data, Data([]))

data = Data([0x00, 0x61, 0x00, 0x62, 0x00, 0x00])
XCTAssertEqual(data.extractNullTerminatedString(.utf16), "ab")
XCTAssertEqual(data, Data([]))