NCrusher74 / SwiftTaggerID3

A Swift library for reading and editing ID3 tags
Apache License 2.0

Restructuring frame handling #2

Closed NCrusher74 closed 4 years ago

NCrusher74 commented 4 years ago

Moving the restructuring to its own branch, since this isn't about parsing at the moment.

SDGGiesbrecht commented 4 years ago

But if I'm not mistaken, what the user will see on autocomplete when entering the languageString for the frame is the actual 3-letter code, right?

What I'd like them to see is either the rawValue, and/or the nativeName. And I'm not sure how to accomplish that. Is it even possible?

Autocomplete will help them find anything that is the name of a case or static var. rawValue is designed so that it can be mostly hidden from Swift. It exists for having a controllable interface that can be interacted with in type‐unsafe languages that need to interact with Swift code from outside it, such as in JSON, a property list, XML, Objective C, etc.

NCrusher74 commented 4 years ago

Okay, looks like I'm trying to micromanage Swift too much again, lol.

So, probably the best thing to do, then, would be to make the isoName a static var rather than a rawValue? Then auto-complete will help users find it?

NCrusher74 commented 4 years ago

Never mind that last question. I forgot I can't use static var in this situation. But that's okay. I forgot that RawValue is a type alias for String, so I can use public init(language: ISO6392Codes.RawValue).

But then I have a problem on the flip side when parsing.

        let languageCode = contents.stringASCII ?? "und"
            if ISO6392Codes.allCases.contains(languageCode) {
        }

Xcode isn't very helpful because it's giving me bad advice here, I think. It correctly informs me that languageCode is a String when, of course, what we want is an ISO6392Codes value. However, it then prompts me to try ISO6392Codes(rawValue: und), which I'm pretty sure is wrong.

If my enum is:

enum ISO6392Codes: String, CaseIterable {
    case eng = "English"
    case und = "Undetermined"
}

...then und is no longer the rawValue. The rawValue is now "Undetermined", right?

Is there a way to find und in ISO6392Codes.allCases in this situation?

SDGGiesbrecht commented 4 years ago

Are you looking for one of these?

let language: ISO6392Codes = contents.stringASCII.flatMap { ISO6392Codes(rawValue: $0) } ?? .und
let languageCode: String = contents.stringASCII ?? ISO6392Codes.und.code
if ISO6392Codes.allCases.contains(where: { $0.code == languageCode }) {
}

NCrusher74 commented 4 years ago

Yep, that's the one I need, thank you!

Wait, no.

I don't have a code variable in ISO6392Codes. The code is the language case itself, like .und

But I guess I could add that?
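A minimal sketch of adding it, assuming the enum is trimmed to the two cases above. The code property spells out each three-letter code explicitly, and fromCode(_:) is a hypothetical helper that does the allCases lookup with a .und fallback. It takes an optional so the result of contents.stringASCII can be passed straight in.

```swift
// Hypothetical sketch: ISO6392Codes trimmed to two cases for illustration.
enum ISO6392Codes: String, CaseIterable {
    case eng = "English"
    case und = "Undetermined"

    /// The three-letter ISO 639-2 code as stored in the frame.
    /// Spelled out explicitly rather than relying on reflection of the case name.
    var code: String {
        switch self {
        case .eng: return "eng"
        case .und: return "und"
        }
    }

    /// Hypothetical lookup by three-letter code, falling back to `.und`
    /// when the code is missing or unknown.
    static func fromCode(_ code: String?) -> ISO6392Codes {
        guard let code = code else { return .und }
        return allCases.first(where: { $0.code == code }) ?? .und
    }
}

let parsed = ISO6392Codes.fromCode("eng")
print(parsed.rawValue) // English
```

The rawValue stays the human-readable name, so autocomplete still offers .eng, .und and so on, while the on-disk code lives in one place.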

NCrusher74 commented 4 years ago

Okay, I'm finally at a point where I have all the errors chased down and can build and thus I can begin writing tests that will guide me when handling the more complex frame types.

However, the new way of handling files and data slice extraction that we've implemented since my initial efforts has broken the tests I wrote at the beginning, particularly one of the TagValidator tests, and I'm having a hard time pinning down where it's gone wrong.

I suspect the problem is with me mishandling how the extraction methods work in this context. Which may have implications for the rest of my code...

This is the test:

    func testValidation() throws {
        //  snip error throwing tests for invalid and corrupted files, which appear to be working

        // test v2.2
        let v22Validator = TagValidator(for: try Mp3File(location: Bundle.writtenV22))
        XCTAssertTrue(try v22Validator.isValidMp3())
        XCTAssertTrue(try v22Validator.hasValidVersionBytes()) // this one fails...
        XCTAssertTrue(try v22Validator.hasValidTagSize())
        XCTAssertTrue(try v22Validator.hasValidTag()) //... which causes this one to fail

        // same for v2.3 and 2.4 files
        // they all fail and throw the "InvalidTagData" error even though the tag data is valid for the files
    }

In the previous test of TagProperties, I ran print(propertiesV22.extractVersionData(data: v22Data)) and the result was exactly what it should be. The version data IS valid. So either the test or the TagValidator function it's testing is incorrect.

This is the function it's testing:

    // check that first five bytes are "ID3<version><null>"
    func hasValidVersionBytes() throws -> Bool {
        if try self.isValidMp3() {

            let versionData = tagProperties.extractVersionData(data: self.mp3Data)
            let versionUInt8 = [versionData.uint8]

            let versionBytes = tagProperties.versionBytes
            if versionBytes.contains(versionUInt8) {
                return true
            } else {
                throw Mp3File.Error.InvalidTagData
            }
        }; return false
    }

Looking at it now, maybe the problem is the conversion of the Data to UInt8? This is the extension I use for that:

extension Data {

    var uint8: UInt8 {
        get {
            var number: UInt8 = 0
            self.copyBytes(to:&number, count: MemoryLayout<UInt8>.size)
            return number
        }
    }
}

I don't think the problem is the TagProperties.extractVersionData() method itself, because like I said, the test for that (and a visual check of the data being extracted) are fine. But maybe it's the way I'm handling its return?

NCrusher74 commented 4 years ago

Okay, so, I managed to fix the test by changing this:

    var versionBytes: [[UInt8]] {
        return [v2_2Bytes, v2_3Bytes, v2_4Bytes]
    }

to this:

    var versionData: [Data] {
        let v22Data = Data(bytes: v2_2Bytes, count: 5)
        let v23Data = Data(bytes: v2_3Bytes, count: 5)
        let v24Data = Data(bytes: v2_4Bytes, count: 5)

        return [v22Data, v23Data, v24Data]
    }

And then rewriting the function being tested to this:

    // check that first five bytes are "ID3<version><null>"
    func hasValidVersionData() throws -> Bool {
        if try self.isValidMp3() {

            let versionData = tagProperties.extractVersionData(data: self.mp3Data)
//            let versionUInt8 = [versionData.uint8]

            let knownVersionData = tagProperties.versionData
            if knownVersionData.contains(versionData) {
                return true
            } else {
                throw Mp3File.Error.InvalidTagData
            }
        }; return false
    }

Now, however, the test is failing on the error-handling portions instead.

    func testValidation() throws {
        // test error handling
        let notMp3File = try Mp3File(location: Bundle.notMp3)
        XCTAssertThrowsError(try TagValidator(for: notMp3File).isValidMp3())

        let mp3Corrupted = try Mp3File(location: Bundle.corruptedV23)
        let validatorCorrupted = TagValidator(for: mp3Corrupted)
        XCTAssertTrue(try validatorCorrupted.isValidMp3())
        XCTAssertTrue(try validatorCorrupted.hasValidTagSize())
        XCTAssertThrowsError(try validatorCorrupted.hasValidVersionData()) // this should be throwing and it's not
        XCTAssertThrowsError(try validatorCorrupted.hasValidTag()) // so should this
 // snip
    }

headdesk

SDGGiesbrecht commented 4 years ago

Looking at it now, maybe the problem is the conversion of the Data to UInt8?

That extension only puts the first byte into the UInt8 and returns it. The entire thing is no different than data.first ?? 0x00. If you are trying to convert back and forth between Data and [UInt8], just do Data(byteArray) or [UInt8](data).
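To make the difference concrete, here is a sketch that reuses the uint8 extension from the thread on the five v2.3 header bytes (49 44 33 03 00). The knownVersions array is a hypothetical stand-in for versionBytes. Wrapping the single byte in an array, as hasValidVersionBytes() did, produces a one-element array that can never equal any five-element entry, which would explain why the valid files failed.

```swift
import Foundation

// The extension from the thread: it copies only MemoryLayout<UInt8>.size (1) byte.
extension Data {
    var uint8: UInt8 {
        var number: UInt8 = 0
        self.copyBytes(to: &number, count: MemoryLayout<UInt8>.size)
        return number
    }
}

// "ID3" + major version 3 + revision 0.
let header = Data([0x49, 0x44, 0x33, 0x03, 0x00])

// Hypothetical stand-in for TagProperties.versionBytes.
let knownVersions: [[UInt8]] = [
    [0x49, 0x44, 0x33, 0x02, 0x00],
    [0x49, 0x44, 0x33, 0x03, 0x00],
    [0x49, 0x44, 0x33, 0x04, 0x00],
]

let firstByteOnly = header.uint8                      // 0x49, same as header.first ?? 0x00
let wrapped = knownVersions.contains([header.uint8])  // false: [0x49] never equals a 5-byte entry
let fullBytes = [UInt8](header)                       // all five bytes
let matched = knownVersions.contains(fullBytes)       // true
let roundTrip = Data(fullBytes)                       // back to Data, equal to header
```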

The other snippets of code don’t make much sense to me out of context. Try testing each little piece on its own before trying to load an entire file. For example, if we have just the version bytes x, y, and z (or whatever the proper number of them is), and we call extract, etc. to get the version, do we end up with the version we expect? If all of that appears to be working, move up to the frame level, then to the tag level, before finally trying it on a whole file. Through all of that we are only concerned with reading. Once we get to the file level and try to load something created by another application, we might discover we misunderstood some things. If that is the case, we go back and fix those before we switch gears to testing the writing side (working from the bottom up again).

NCrusher74 commented 4 years ago

Sorry, I'll try to give more context.

Right now, I haven't even gotten to the point of parsing anything at the frame level. This is just reworking tests on the incoming file to make sure it's valid and has valid tag data before beginning to parse it.

So, I've got my Mp3File type, which has a function to read() the file:

    public func read() throws {
        let tag = Tag()
        try tag.parseFramesFromTag(file: self)
    }

which calls this:

    func parseFramesFromTag(file: Mp3File) throws {
        let fileData: Data = file.data

        var remainder: Data.SubSequence = fileData[fileData.startIndex..<fileData.endIndex]

        let tagProperties = TagProperties(for: file)
        let tagValidator = TagValidator(for: file)

        // parse version from tag header
        var version: Version = .v2_4
        if try tagValidator.hasValidTag() != true {
            throw Mp3File.Error.InvalidTagData
        } else {
            // the first five bytes of a valid ID3 Tag are "ID3"+ the version number in UInt8
            let versionData = tagProperties.extractVersionData(data: fileData)
            version = try tagProperties.version(data: versionData)

            // parse flags from tag header
            _ = tagProperties.extractFlagData(data: fileData)

            // parse size from tag header
            let tagSizeData = tagProperties.extractTagSizeData(data: fileData)
            _ = tagProperties.size(tagSizeData: tagSizeData)
        }
    // ... and then we start parsing the frame data out
}

So so far, we haven't touched the frames. We're simply checking the incoming file to make sure we can work with it.

So the first thing we do is run it through a series of pretty simple checks in TagValidator to make sure the file itself can be worked with:

    // Check that mp3 has a valid file extension
    // test that

    // Check if MP3 is too small for a tag
    // test that too

    // confirm valid MP3 or throw error
    func isValidMp3() throws -> Bool {
        if self.hasValidExtension {
            if self.isValidSize {
                return true
            } else {
                throw Mp3File.Error.FileTooSmall
            }
        } else {
            throw Mp3File.Error.InvalidFileFormat
        }
    }

then we check if the tag data can be worked with:

    // check that first five bytes are "ID3<version><null>"
    func hasValidVersionData() throws -> Bool {
        if try self.isValidMp3() {

            let versionData = tagProperties.extractVersionData(data: self.mp3Data)
//            let versionUInt8 = [versionData.uint8]

            let knownVersionData = tagProperties.versionData
            if knownVersionData.contains(versionData) {
                return true
            } else {
                throw Mp3File.Error.InvalidTagData
            }
        }; return false
    }

    // check that tag size does not exceed file size
    func hasValidTagSize() throws -> Bool {
        let byteOffset = tagProperties.tagSizeDeclarationOffset
        let endOfRelevantBytes = byteOffset + tagProperties.tagSizeDeclarationLength
        let tagSizeDataRange = byteOffset ..< endOfRelevantBytes
        let tagSizeData = mp3Data.subdata(in: tagSizeDataRange)

        let sizeInt = Int(tagProperties.size(tagSizeData: tagSizeData))
        let headerSize = tagProperties.tagHeaderLength
        let tagSize =  sizeInt + headerSize

        if mp3Data.count < tagSize {
            throw Mp3File.Error.CorruptedFile
        }; return true
    }

    // confirm valid tag data
    func hasValidTag() throws -> Bool {
        if try self.hasValidVersionData() && self.hasValidTagSize() {
            return true
        } else {
            throw Mp3File.Error.InvalidTagData
        }
    }

All these checks daisy-chain together, so all parseFramesFromTag(file: Mp3File) has to do is call hasValidTag and it will run the whole series (because hasValidVersionData calls isValidMp3 at the beginning).

After testing to make sure the file is valid, parseFramesFromTag(file: Mp3File) starts plucking out the header data and getting any information we're going to need from it, which happens in TagProperties:

            let versionData = tagProperties.extractVersionData(data: fileData)
            version = try tagProperties.version(data: versionData)

            // parse flags from tag header
            _ = tagProperties.extractFlagData(data: fileData)

            // parse size from tag header
            let tagSizeData = tagProperties.extractTagSizeData(data: fileData)
            _ = tagProperties.size(tagSizeData: tagSizeData)

So this is as far as I've gotten. I haven't even gotten to the frame handling stuff yet.

It should be noted, TagProperties.version(data: versionData) is working. When I run a test on it, it passes. But when I run a test on a function that uses the same extraction method that provides the Data (TagProperties.extractVersionData(data: Data)) it fails.

So the problem is somewhere either in TagValidator.hasValidVersionData() (which I snippeted above) or in the way I'm getting the data into that function.

When I converted the extracted version Data to UInt8 and compared it to the [UInt8] of the known version bytes, the tests on valid files failed and threw errors.

When I converted the [UInt8] of the known versions bytes to Data (using Data(byteArray)) and compared it to the Data of the extracted version, the tests on the valid files passed, but the test making sure the invalid files would throw errors failed.

And...I can't perform a simpler test. This is as simple as it gets: I'm just slicing off the first five bytes of the file, comparing them to a known value, and making sure that if they don't match, I get an error.
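For reference, here is a small, self-contained sketch of the five-byte check being described. The helper id3MajorVersion(of:) is hypothetical, not part of the project. It copies the first five bytes into an array before indexing, so it behaves the same whether it receives the whole file or a Data slice whose startIndex isn't 0. As the thread finds further down, the corrupted test file's first five bytes (49 44 33 03 00) are perfectly valid, so a check like this passes it.

```swift
import Foundation

/// Hypothetical helper: returns the ID3v2 major version (2, 3 or 4) if `fileData`
/// starts with "ID3", a supported major version, and a zero revision byte.
func id3MajorVersion(of fileData: Data) -> UInt8? {
    // Copy to an array so indices start at 0 even if `fileData` is a slice.
    let header = [UInt8](fileData.prefix(5))
    guard header.count == 5,
          header.starts(with: [0x49, 0x44, 0x33]) else { return nil } // "ID3"
    let major = header[3]
    let revision = header[4]
    guard [2, 3, 4].contains(major), revision == 0 else { return nil }
    return major
}

let corruptedFileStart = Data([0x49, 0x44, 0x33, 0x03, 0x00])
let detected = id3MajorVersion(of: corruptedFileStart) // 3: these bytes alone look valid
```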

The closest I can come to simplifying it any further is making sure the method extracting the version data is behaving properly. print(TagProperties.extractVersionData(data:Data)) That worked. The values printed out were exactly what they should be.

TagProperties.extractVersionData(data:Data) is used in both TagProperties.version(data: Data) throws -> Version (which passes when tested) and in hasValidVersionData() throws -> Bool, which fails when tested, but only when testing an invalid file (comparing extractedData to knownData) or only when testing a valid file (comparing extractedBytes to knownBytes.)

Which is...very odd.

It should be noted that I'm using the "corrupted" file from ID3TagEditor and it could be that the file is only "corrupted" when using upstream's bytes-to-bytes method of checking?

Hmmm... does converting the bytes to Data come with any sort of...error correction? Could that be why I'm not getting any errors thrown when testing it using a Data-to-Data comparison?

SDGGiesbrecht commented 4 years ago

So when you print the result of each line, which is the first one to not match your expectation? (Don’t actually print all of them at once. Start in the middle. If it looks right, try later on, if it looks wrong, try earlier. If you keep adding the new print statement halfway between the last one that was right and the first one that was wrong, you will narrow in closer to the culprit with each try.)

NCrusher74 commented 4 years ago

That's easy and basically the first thing I did. Sorry if I wasn't clear on that. There's not much earlier or later to work with, since this is pretty much the very first step in handling things, but what I just spent all those words on was that I can pinpoint exactly where things are not behaving as they should and the fact that they aren't is very bizarre.

        XCTAssertTrue(try validatorCorrupted.isValidMp3())
//        print(try validatorCorrupted.isValidMp3()) // true, behaves as expected
        XCTAssertTrue(try validatorCorrupted.hasValidTagSize())
//        print(try validatorCorrupted.hasValidTagSize()) // true, behaves as expected
        XCTAssertThrowsError(try validatorCorrupted.hasValidVersionData())
//        print(try validatorCorrupted.hasValidVersionData()) // true, but should be FALSE
        let propertiesCorrupted = TagProperties(for: mp3Corrupted)
        let dataCorrupted = mp3Corrupted.data
        print(propertiesCorrupted.extractVersionData(data: dataCorrupted).hexadecimal())
        // 49 44 33 3 0, which is exactly what it should be...for a **valid** file, but this is an invalid file

I think maybe I'm on to something with the last two paragraphs of that long post. I suspect my "corrupted" file is only "corrupted" if using the [UInt8] to [UInt8] comparison method from ID3TagEditor. Either the process of handling it as Data corrects whatever was supposedly corrupted, or... I don't know. I wouldn't even know how to make a deliberately "corrupted" mp3, except for maybe intentionally writing some bad bytes to it? Which may very well be what upstream did.

If so, then the problem isn't my code or the test, but the file, and since I have ample proof that my error-throwing is working, I suppose I can just omit that test?

NCrusher74 commented 4 years ago

Hm. Maybe not. I just tried bytes-to-bytes comparison using the correct method of data-to-UInt8 conversion (as opposed to what I was doing initially) and I'm getting the same result. The test passes with the valid files but errors aren't being thrown where they should be with the invalid file.

And when I tried to open the invalid file with Yate, it said that the tag size is invalid, so maybe the problem begins in the hasValidTagSize() check instead of hasValidVersionData().

SDGGiesbrecht commented 4 years ago

XCTAssertThrowsError(try validatorCorrupted.hasValidVersionData())

If that is the line you’ve narrowed it to, then keep narrowing it by going inside hasValidVersionData() and doing the same print strategy. Once you’ve narrowed that to a particular method, go inside that one. Keep zooming in until the cause becomes apparent.

Since I haven’t been able to spot anything obviously wrong, that is really the only strategy left for either of us to try. I’m recommending you be the one to do it, because you are more familiar than I am with what it is supposed to be doing.

NCrusher74 commented 4 years ago

Okay, I think I see what is happening now.

My initial assumption that the problem began with the test of the version data was incorrect. I zoomed in on that one because it was the one that was failing. It turns out there shouldn't actually be an error thrown there; the next check, the one for the tag size, should throw. It wasn't throwing because I modeled it after similar checks in ID3TagEditor, and those checks weren't looking for this particular problem with a file.

    // check that tag size does not exceed file size
    func hasValidTagSize() throws -> Bool {
        let tagSizeData = tagProperties.extractTagSizeData(data: self.mp3Data)
//         print(tagSizeData) - 1 bytes (this is a correct tag size byte)
//        print(tagSizeData.uint8) - 73 - I don't know what this means
        let sizeInt = Int(tagProperties.size(tagSizeData: tagSizeData))
//         print(sizeInt) - 0 - I assume this should be greater than 0? 0 would mean no content in the tag.
        let headerSize = tagProperties.tagHeaderLength
//         print(headerSize) - 10 (this is the correct size for a tag header)
        let tagSize =  sizeInt + headerSize
//        print(tagSize) - 10 - probably should be larger
//        print(mp3Data.count) - 236
        if mp3Data.count < tagSize {
            throw Mp3File.Error.CorruptedFile
        }; return true
    }

The file itself is only 236 bytes. Which is large enough to pass the checks making sure the file is large enough to hold a tag, and this particular check here making sure the tag size doesn't exceed the file size.

But the tag itself is just a 10-byte header. There's no other data, no content. And I wasn't checking for the size of the content.

NCrusher74 commented 4 years ago

Though I think also I might have an issue with this line:

let sizeInt = Int(tagProperties.size(tagSizeData: tagSizeData))

which, again, is modeled after a similar check in ID3TagEditor, but now that I've changed the test to make sure a valid tag is larger than 10 bytes, all the files are failing. That tells me something is off with how I'm calculating things.

    /// the size of the ID3 tag
    func size(tagSizeData: Data) -> UInt32 {
        let tagSizeNSData = tagSizeData as NSData
        let tagDataBytes = tagSizeNSData.bytes + tagSizeDeclarationOffset
        let tagSize = tagDataBytes.assumingMemoryBound(
            to: UInt32.self).pointee.bigEndian
        let decodedTagSize = tagSize.decodingSynchsafe()
        return decodedTagSize
    }
NCrusher74 commented 4 years ago

I think I need some help interpreting these numbers. I've tried using https://onlinehextools.com/ but I don't know precisely which calculator I'm supposed to be using.

First, I want to make sure I'm using the extractFirst chaining correctly.

    func extractTagSizeData(data: Data) -> Data {
        var tagSizeData = data.dropFirst(versionDeclarationLength + tagFlagsLength)
        return tagSizeData.extractFirst(tagSizeDeclarationLength)
    }

The tag size data is after the version declaration data and the tag flags data, so this is the way I'd go about isolating the proper data, right? Or do I need to be converting things to [UInt8] for this to work properly?
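extractFirst isn't a standard-library method, so I'm assuming it's the project's own helper that removes and returns the first n elements. Expressed with the standard dropFirst and prefix, the same chain looks like this sketch. The HeaderLayout lengths mirror the TagProperties values shown later in the thread, and the ten-byte header is a synthetic example. Both calls count from the start of whatever they are given, so the chain is right when handed the whole file, but yields empty data when handed the four size bytes a second time.

```swift
import Foundation

// Lengths mirroring the thread's TagProperties computed properties.
enum HeaderLayout {
    static let versionDeclarationLength = 5 // "ID3" + major version + revision
    static let tagFlagsLength = 1
    static let tagSizeDeclarationLength = 4
}

/// Returns the four size bytes, assuming `data` starts at the beginning of the tag header.
func extractTagSizeBytes(from data: Data) -> Data {
    return data
        .dropFirst(HeaderLayout.versionDeclarationLength + HeaderLayout.tagFlagsLength)
        .prefix(HeaderLayout.tagSizeDeclarationLength)
}

// Synthetic v2.2 header: "ID3", 02 00, flags 00, size 00 00 18 3E.
let header = Data([0x49, 0x44, 0x33, 0x02, 0x00, 0x00, 0x00, 0x00, 0x18, 0x3E])
let sizeBytes = extractTagSizeBytes(from: header)      // 00 00 18 3E
let slicedTwice = extractTagSizeBytes(from: sizeBytes) // empty: fewer than 6 bytes to drop
```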

From there, I'm using a variation of the func declaredSize(file: Data, frameStart: Data.Index, version: Version ) -> Int that you recommended to me a couple weeks ago to convert the tag size data to an integer. But since this is the TAG size data (rather than frame size) I'm not sure that's the proper way to go about it.

    /// the size of the ID3 tag
    func size(data: Data) throws -> Int {
        let tagSizeData = extractTagSizeData(data: data)
//        print(tagSizeData.hexadecimal()) // 0 0 18 3e - I don't know how to interpret this
        let raw = UInt32(parsing: tagSizeData, .bigEndian)
//        print(raw) - 0 - this probably isn't what it should be?
        switch try version(data: tagSizeData) {
            case .v2_2, .v2_3:
                return Int(raw)
            case .v2_4:
                return Int(raw.decodingSynchsafe())
        }
    }

And then I run that through my validation check:

    // check that tag size does not exceed file size
    func hasValidTagSize() throws -> Bool {
        let tagSizeData = tagProperties.extractTagSizeData(data: self.mp3Data)
        //        print(tagSizeData.hexadecimal()) // 0 0 18 3e 
        let sizeInt = try tagProperties.size(data: tagSizeData)
        //        print(sizeInt) - this doesn't print at all
        let headerSize = tagProperties.tagHeaderLength
        let tagSize =  sizeInt + headerSize
        if tagSize <= headerSize {
            throw Mp3File.Error.TagTooSmall
        } else if tagSize > mp3Data.count {
            throw Mp3File.Error.TagTooBig
        }   else {
            return true
        }
    }

So I guess my first task here is figuring out what 0 0 18 3e means and whether it's somewhere in the area of where it should be.

Thanks to a handy little tool in Yate, I know what the size of the tag for my files actually is:

File size: 159095
Format: MPEG-1, Layer 3
Channel mode: Joint stereo
Sample rate: 48000 Hz
Duration: 9767 ms
Bit rate: 128 kbits/sec
ID3 Tag version 2.2.0
Tag at offset: 0 size = 3144
Padding: 2048 bytes
Audio base: 3144 size = 155951

(for the v22 file)

So I have something to compare against. I just need to get a value to compare. And to do that, I need to make sure the value I'm using is being derived the proper way.

SDGGiesbrecht commented 4 years ago

// 0 0 18 3e - I don't know how to interpret this

Use hexadecimal → decimal, adding any missing leading zeroes first, and removing spaces if the converter doesn’t ignore them on its own.

00 00 18 3E converted to decimal is 6206.

But if it’s version 2.4, that hasn’t had the synchsafe decoded yet.
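A worked version of that conversion, plus the synchsafe decoding. Per the ID3v2.2, v2.3 and v2.4 specifications, the size in the ten-byte tag header is synchsafe in every version (only frame sizes differ between versions), and it excludes the header itself. Assuming the 00 00 18 3E bytes come from the v2.2 test file (they were printed from the v2.2 validator test), decoding them as synchsafe and adding the 10-byte header gives exactly the 3144 Yate reports. If that holds, the v2.2/v2.3 branch of size(data:) would also need decodingSynchsafe() for the tag size.

```swift
/// Decodes a 28-bit synchsafe integer: each of the four bytes carries 7 bits,
/// with bit 7 always zero.
func decodeSynchsafe(_ bytes: [UInt8]) -> UInt32 {
    precondition(bytes.count == 4, "a synchsafe tag size is four bytes")
    return bytes.reduce(UInt32(0)) { ($0 << 7) | UInt32($1 & 0x7F) }
}

/// Plain big-endian interpretation of the same four bytes.
func decodeBigEndian(_ bytes: [UInt8]) -> UInt32 {
    return bytes.reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
}

let sizeBytes: [UInt8] = [0x00, 0x00, 0x18, 0x3E]
print(decodeBigEndian(sizeBytes))      // 6206 = 0x18 * 256 + 0x3E
print(decodeSynchsafe(sizeBytes))      // 3134 = 0x18 * 128 + 0x3E
print(decodeSynchsafe(sizeBytes) + 10) // 3144, adding the 10-byte header
```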

print(raw) - 0 - this probably isn't what it should be?

Yup. Something wrong. I’ll take a look.

let tagSize =  sizeInt + headerSize
if tagSize <= headerSize {

That doesn’t check anything. tagSize will necessarily be at least as large as headerSize unless tagSize is negative, which is unrepresentable for an unsigned integer.

NCrusher74 commented 4 years ago

But if it’s version 2.4, that hasn’t had the synchsafe decoded yet.

I ran the test on all three of my valid files, and they are all coming back with an invalid tag size, regardless of synchsafe.

That doesn’t check anything. tagSize will necessarily be at least as large as headerSize unless tagSize is negative, which is unrepresentable for an unsigned integer.

Yeah that one was from late last night, when for some reason I was getting a tag size that seemed to be only the size of the header (turns out I was using the extract method wrong) and I figured that was an error I needed to catch, if the tag has no usable data other than the header? idk. I'm not sure what I was thinking last night.
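Because sizeInt comes from an unsigned UInt32, it can never be negative, so tagSize <= headerSize can only be true when the declared size is exactly zero, which is the empty-tag case being described. A sketch that states that intent directly: TagSizeError and validateTagSize are hypothetical names standing in for Mp3File.Error.TagTooSmall/TagTooBig and hasValidTagSize(), and declaredSize is assumed to be the already-decoded size, which excludes the header.

```swift
/// Hypothetical errors mirroring the thread's TagTooSmall / TagTooBig.
enum TagSizeError: Error, Equatable {
    case tagTooSmall
    case tagTooBig
}

/// `declaredSize` is the decoded size from the tag header (header excluded).
func validateTagSize(declaredSize: Int, fileByteCount: Int, headerLength: Int = 10) throws {
    // A declared size of zero means the tag is a bare header with no frames.
    guard declaredSize > 0 else { throw TagSizeError.tagTooSmall }
    // The header plus the declared tag body must fit inside the file.
    guard headerLength + declaredSize <= fileByteCount else { throw TagSizeError.tagTooBig }
}

// The 236-byte corrupted file whose header declares a size of 0:
do {
    try validateTagSize(declaredSize: 0, fileByteCount: 236)
} catch {
    print(error)
}
```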

SDGGiesbrecht commented 4 years ago

print(raw) - 0 - this probably isn't what it should be?

Actually that prints 6206 for me, as expected:

let tagSizeData: [UInt8] = [0x00, 0x00, 0x18, 0x3E]
let raw = UInt32(parsing: tagSizeData, .bigEndian)
print(raw)

Tag at offset: 0 size = 3144

Is that the number you’re looking for? And without synchsafe? In big‐endian hexadecimal that is 0C 48, so those are the two bytes you are looking for (with whatever number of leading 00 bytes). You could run a search for their real position to figure out if you are just expecting them to be in the wrong spot. Either print the whole thing, do ⌘F, and look at what’s around it, or use one of the collection searching APIs to find the real index.
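One collection-searching option is Foundation's Data.range(of:), which returns the range of the first match or nil. In this sketch the offset(of:in:) helper is hypothetical, and the synthetic ten-byte header stands in for mp3File.data.

```swift
import Foundation

/// Hypothetical helper: offset of the first occurrence of `needle`,
/// measured from the start of `haystack`, or nil if it never occurs.
func offset(of needle: [UInt8], in haystack: Data) -> Int? {
    guard let match = haystack.range(of: Data(needle)) else { return nil }
    return haystack.distance(from: haystack.startIndex, to: match.lowerBound)
}

// Stand-in for mp3File.data: "ID3", 02 00, flags 00, size 00 00 18 3E.
let header = Data([0x49, 0x44, 0x33, 0x02, 0x00, 0x00, 0x00, 0x00, 0x18, 0x3E])
let synchsafeOffset = offset(of: [0x18, 0x3E], in: header) // 8: inside the size field
let plainOffset = offset(of: [0x0C, 0x48], in: header)     // nil: not present
```

Comparing the returned offset with the offset extractTagSizeData reads from shows whether the slice is pointed at the right place.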

NCrusher74 commented 4 years ago

... weird. It's printing 6206 for me now, too. It wasn't earlier this morning, though.

Is that the number you’re looking for? And without synchsafe?

I... believe so?

In big‐endian hexadecimal that is 0C 48, so those are the two bytes you are looking for (with whatever number of leading 00 bytes).

I have that information now for all three files (all of which except for v22 were written by known compliant tagging apps; the v22 test file was written by ID3TagEditor because I can't find an app that writes v22).

The same info for v23 is: Tag at offset: 0 size = 3909 and for v24: Tag at offset: 0 size = 2752 (this one is with synchsafe)

But I'm not sure what finding the particular bytes in the file is going to accomplish. (nevermind, you told me what it would accomplish, sorry)

What I'm testing right now are three functions:

TagProperties.extractTagSizeData(data: Data)

TagProperties.size(data: Data) (where data: is the output of extractTagSizeData)

TagValidator.hasValidTagSize (which also gets the data it's testing from extractTagSizeData)

One of these isn't working. I suspect it's TagProperties.size(data: Data). I'm just trying to narrow down how it isn't working.

SDGGiesbrecht commented 4 years ago

But I'm not sure what finding the particular bytes in the file is going to accomplish.

It will tell you if you are even pointing extractTagSizeData at the right place. If you compare the index where extractTagSizeData is looking against the index where the expected number actually occurs, you might find out there is more (or less) data in there before the size declaration than you thought. If so, fixing extractTagSizeData might really be a matter of fixing tagFlagsLength instead or something.

NCrusher74 commented 4 years ago

It looks like that number from Yate maybe isn't coming from the tag size declaration itself; Yate may calculate it independently, because I can't find it in a dump from print(mp3v22File.data.hexadecimal()). Hrm.
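One possible explanation, offered as an assumption rather than a confirmed finding: Yate's 3144 appears to count the 10-byte header, while the header's size field stores the remaining 3134 bytes in synchsafe form, so the bytes in the file would be 00 00 18 3E rather than 0C 48. The hypothetical encodeSynchsafe(_:) below computes which bytes to search for, starting from a total reported by Yate.

```swift
/// Hypothetical helper: encodes a value below 2^28 as four synchsafe bytes
/// (7 bits per byte, bit 7 clear), most significant byte first.
func encodeSynchsafe(_ value: UInt32) -> [UInt8] {
    precondition(value < (1 << 28), "synchsafe integers hold at most 28 bits")
    return [21, 14, 7, 0].map { shift in UInt8((value >> shift) & 0x7F) }
}

let yateTagSize: UInt32 = 3144 // Yate's report for the v2.2 file
let headerLength: UInt32 = 10  // the declared size excludes the header
let bytesToFind = encodeSynchsafe(yateTagSize - headerLength) // 00 00 18 3E
```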

NCrusher74 commented 4 years ago

This is weird... Looking at the failing test, it's failing because it's throwing an "InvalidTagData" error.

Screen Shot 2020-04-21 at 4 23 56 PM

That error message is only printed from the InvalidTagData error (I double-checked to make sure.)

But I can't find anyplace in the process flow where that error would be thrown.

XCTAssertTrue(try v22Validator.hasValidTagSize()) is testing hasValidTagSize, which doesn't throw that error.

    // check that tag size does not exceed file size
    func hasValidTagSize() throws -> Bool {
        let tagSizeData = tagProperties.extractTagSizeData(data: self.mp3Data)
//        print(tagSizeData.hexadecimal()) // 0 0 18 3e
        let sizeInt = try tagProperties.size(data: tagSizeData)
//        print(sizeInt)
        let headerSize = tagProperties.tagHeaderLength
        let tagSize =  sizeInt + headerSize
        if tagSize <= headerSize {
            throw Mp3File.Error.TagTooSmall
        } else if tagSize > mp3Data.count {
            throw Mp3File.Error.TagTooBig
        }   else {
            return true
        }
    }

hasValidTagSize gets the data it's checking using tagProperties.extractTagSizeData(data: self.mp3Data), which doesn't throw that error:

    func extractTagSizeData(data: Data) -> Data {
        var tagSizeData = data.dropFirst(versionDeclarationLength + tagFlagsLength)
        return tagSizeData.extractFirst(tagSizeDeclarationLength)
    }

hasValidTagSize gets the sizeInt using try tagProperties.size(data: tagSizeData), which doesn't throw that error:

    /// the size of the ID3 tag
    func size(data: Data) throws -> Int {
        let tagSizeData = extractTagSizeData(data: data)
//        print(tagSizeData.hexadecimal()) // 0 0 18 3e
        let raw = UInt32(parsing: tagSizeData, .bigEndian)
//        print(raw) - 6206
        switch try version(data: tagSizeData) {
            case .v2_2, .v2_3:
                return Int(raw)

            case .v2_4:
                return Int(raw.decodingSynchsafe())
        }
    }
}

It gets the headerSize variable from tagProperties.tagHeaderLength, which doesn't throw any errors at all.

    var tagHeaderLength: Int {
        return 10
    }

tagProperties.extractTagSizeData(data: self.mp3Data) uses a few computed properties, none of which throw an error:

    /// the byte-count of the ID3 version declaration
    var versionDeclarationLength: Int {
        return 5
    }

    /// the byte-count of the tag's flags byte
    var tagFlagsLength: Int {
        return 1
    }

    /// the byte-count of the tag's UInt32 size declaration
    var tagSizeDeclarationLength: Int {
        return 4
    }

Beyond that, the data being passed is coming directly from the Mp3File (by way of mp3File.data), which doesn't throw that error anywhere in the initializer, either.

    public init(location: URL) throws {
        self.location = location
        do {
            self.data = try Data(contentsOf: location)
        } catch {
            throw Mp3File.Error.CannotReadFile
        }
    }

That error absolutely should not be appearing anywhere in this particular process, yet it is. I've traced back every step of the process four times, and nothing between the Mp3File and the test throwing the error touches anything that throws that particular error. I've cleaned my build folder, wiped my DerivedData folder, everything I can think of.

I'm so confused.

SDGGiesbrecht commented 4 years ago

I’ll pull it and try myself.

SDGGiesbrecht commented 4 years ago

Activating this breakpoint revealed where the error is originating:

Screen Shot 2020-04-21 at 19 43 45

Screen Shot 2020-04-21 at 19 44 51

It looks to me like in hasValidTagSize() you’re slicing out the tiny section of data that contains the size (00 02 0B 33), then passing that tiny section to size(data:) which attempts to pull that same slice from within itself as if it were the entire tag. That is beyond the end and results in empty data. You then pass that empty slice of a slice to version(data:) (instead of the entire tag). That method doesn’t know what to do with what it believes to be an empty tag, and so it throws the invalid tag error.
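A sketch of one way to keep the two jobs apart: slice the size field out of the whole file once, and have the size function accept only that four-byte field. tagBodySize(fromSizeBytes:) is a hypothetical name. Since, per the ID3v2 specifications, the tag-header size is synchsafe in v2.2 through v2.4, this version doesn't call version(data:) at all, which also removes the call that was receiving the empty slice. Rejecting anything that isn't exactly four bytes turns the silent empty slice into a visible nil.

```swift
import Foundation

/// Hypothetical replacement for size(data:): takes the four size bytes that were
/// already extracted from the whole file, and never slices them again.
func tagBodySize(fromSizeBytes sizeBytes: Data) -> Int? {
    // Refuse anything that is not exactly the four-byte size field.
    guard sizeBytes.count == 4 else { return nil }
    // Synchsafe: 7 bits per byte. Iterating Data visits its bytes in order
    // regardless of the slice's startIndex.
    let value = sizeBytes.reduce(UInt32(0)) { ($0 << 7) | UInt32($1 & 0x7F) }
    return Int(value)
}

let fileStart = Data([0x49, 0x44, 0x33, 0x02, 0x00, 0x00, 0x00, 0x00, 0x18, 0x3E])
let sizeField = fileStart.dropFirst(6).prefix(4)         // slice once, from the whole header
let bodySize = tagBodySize(fromSizeBytes: sizeField)     // 3134
let reSliced = tagBodySize(fromSizeBytes: sizeField.dropFirst(6).prefix(4)) // nil
```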

NCrusher74 commented 4 years ago

Ahhh okay. I thought I was going crazy when I realized that error was being thrown in a process that it shouldn't have been involved in.

That was silly of me to re-slice the slice. I think I did that early on when the test wasn't working, thinking that size(data:) wasn't getting the same slice of data for running hasValidTagSize as it was when it was being called elsewhere.

Thank you.

NCrusher74 commented 4 years ago

Just to see how this would all work with a version 2.2 file, I went ahead and tried it out on that one too.

According to the documentation, the header for the frame should be a 3-byte frame declaration (which Yate incorrectly lists as the 4-byte version 2.3-2.4 equivalent in the final column) followed by a 3-byte size declaration (which I believe you said would be UInt24? with some sort of masking used?)

00000010 00000031 00000021 (0000T1) TALB (Album)
// 54 41 4c (TAL) - 0 0 f (15, which tracks for 21 minus 6) - 1 - ff fe - 41 0 6c 0 62 0 75 0 6d 0 0 0 (Album with 0s?)
00000031 00000064 00000033 (0000T1) TPE2 (AlbumArtist)
// 54 50 32 (TP2) - 0 0 1b (27, or 33-6) - 1 - ff fe - 41 0 6c 0 62 0 75 0 6d 0 41 0 72 0 74 0 69 0 73 0 74 0 0 0 (AlbumArtist with 0s?)
00000064 00000087 00000023 (0000T1) TPE1 (Artist)
// 54 50 31 (TP1) - 0 0 11 (17) - 1 - ff fe - 41 0 72 0 74 0 69 0 73 0 74 0 0 0 (Artist with 0s?)
00000087 00000108 00000021 (0000T1) TIT2 (Title)
// 54 54 32 (TT2)- 0 0 f (15) - 1 - ff fe - 54 0 69 0 74 0 6c 0 65 0 0 0 (Title with 0s?)
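For reference, here's a minimal Swift sketch of pulling a v2.2 frame header apart, assuming the layout described above: a 3-character identifier followed by a 3-byte size read as an ordinary big-endian (base-256) number. The struct and function names are hypothetical, not SwiftTaggerID3's API. On the TAL frame it yields a content size of 15, which matches the 21 - 6 arithmetic.

```swift
/// A v2.2 frame header: 3-character identifier + 3-byte size (hypothetical type).
struct V22FrameHeader {
    let identifier: String
    let contentSize: Int   // excludes the 6-byte header itself
}

/// Parses the first six bytes of `bytes` as a v2.2 frame header.
/// Returns nil if there are fewer than six bytes.
func parseV22FrameHeader(_ bytes: [UInt8]) -> V22FrameHeader? {
    guard bytes.count >= 6 else { return nil }
    let identifier = String(decoding: bytes[0 ..< 3], as: UTF8.self)
    // Plain base 256, most significant byte first.
    let size = bytes[3 ..< 6].reduce(0) { (total: Int, byte: UInt8) -> Int in
        (total << 8) | Int(byte)
    }
    return V22FrameHeader(identifier: identifier, contentSize: size)
}

// The TAL frame from the dump above: "TAL", then size 00 00 0F.
let tal: [UInt8] = [0x54, 0x41, 0x4C, 0x00, 0x00, 0x0F]
if let header = parseV22FrameHeader(tal) {
    print(header.identifier, header.contentSize) // TAL 15
}
```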

Now, I don't know what a lot of the data Yate is picking up in those frames is, because I don't think they should be that size. I assume the 7th byte, which is always 1, is the encoding byte? But it looks like there's a 0 after every character of the content. I suspect if I took the 0s out, the last parts would be the content it's supposed to be. Not sure about the ff fe thing, though.

But then, Yate doesn't actually deal with version 2.2. It will read it, but it doesn't write it. This file was written by ID3TagEditor. If the encoding is in fact utf16WithBOM (which is what a 1 for the encoding byte would indicate), then I guess that explains the zeros?

NCrusher74 commented 4 years ago

So, here's my (I'm sure) very error-prone workaround for getting an accurate frame size:

        // parse content size second
        let frameSizeData = data.extractFirst(version.sizeDeclarationLength)

        var frameSize: Int = 0
        let sizeUInt8 = [UInt8](frameSizeData)
        let byteOfInterest = sizeUInt8[1]
        switch version {
            case .v2_2, .v2_3: frameSize = Int(byteOfInterest)
            case .v2_4: frameSize = Int(byteOfInterest.decodingSynchsafe())
        }

        // parse content last
        let contentDataStart = data.startIndex + version.frameHeaderLength
        let contentDataRange = contentDataStart ..< contentDataStart + frameSize
        let contentData = data.subdata(in: contentDataRange)

It results in an accurate contentData (6 bytes). But I'm sure it's open to some pretty sticky errors.

Unfortunately, I'm still starting my parsing of the contentData 30 bytes in, when it should be 20, and I can't figure out where those other 10 bytes are coming from.

SDGGiesbrecht commented 4 years ago

But I don't know how they calculate that. But 0 0 1e 3b should probably be "3909" (or maybe "1558", which is 3909 minus the padding?)

The (v2.3) specification says this:

It is permitted to include padding after all the final frame (at the end of the ID3 tag), making the size of all the frames together smaller than the size given in the head of the tag. A possible purpose of this padding is to allow for adding a few additional frames or enlarge existing frames within the tag without having to rewrite the entire file. The value of the padding bytes must be $00.

So the declared size (tag) is equal to the used space (frames) plus the empty space (padding). So depending on what Yate means by that, the declared size is either equal to Yate’s “size”, or to Yate’s size plus Yate’s “padding”.


So, I think we need to take as given that Yate knows what it's talking about, because so far everything is checking out as being exactly what Yate says it is and exactly where Yate says it should be.

I concur.


I assume the 7th byte, which is always 1, is the encoding byte? But it looks like there's a 0 after every character of the content. I suspect if I took the 0s out, the last parts would be the content it's supposed to be. Not sure about the ff fe thing though.

The FF FE is a byte order mark, and in UTF‐16, every two bytes are a character. The apparent intervening zeroes are because all the characters are in the ASCII range, where one half is simply zero. It’s conceptually similar to writing “9” as a two‐digit number to fit in better with a list of other two‐digit numbers: “26, 84, 09, 56, 13”.
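To make the byte order mark concrete, here's a small sketch that decodes ID3 encoding 1 text (UTF-16 with a BOM) using only the standard library. It assumes the text starts with FF FE (little endian) or FE FF (big endian) and may end with a 00 00 terminator; the function name is made up for illustration.

```swift
/// Decodes UTF-16 text that starts with a byte order mark.
/// Returns nil if there is no recognizable BOM.
func decodeUTF16WithBOM(_ bytes: [UInt8]) -> String? {
    guard bytes.count >= 2 else { return nil }
    let littleEndian: Bool
    switch (bytes[0], bytes[1]) {
    case (0xFF, 0xFE): littleEndian = true
    case (0xFE, 0xFF): littleEndian = false
    default: return nil
    }
    var units: [UInt16] = []
    var index = 2
    // Every two bytes form one UTF-16 code unit; the BOM says which half is high.
    while index + 1 < bytes.count {
        let first = UInt16(bytes[index])
        let second = UInt16(bytes[index + 1])
        units.append(littleEndian ? (second << 8) | first : (first << 8) | second)
        index += 2
    }
    // Drop a trailing null terminator, if present.
    if units.last == 0 { units.removeLast() }
    return String(decoding: units, as: UTF16.self)
}

// The TAL content from the dump above (the bytes after the 01 encoding byte).
let album: [UInt8] = [0xFF, 0xFE, 0x41, 0x00, 0x6C, 0x00, 0x62, 0x00,
                      0x75, 0x00, 0x6D, 0x00, 0x00, 0x00]
print(decodeUTF16WithBOM(album) ?? "invalid") // Album
```

Each ASCII character becomes one code unit whose high byte is zero, which is where the "extra" zeros come from.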


I still don't know how this works with the tag size, however. The size bytes were 0 0 1e 3b. Calculator tells me that 1e is 30, but that 3b is 59. How that translates to either 3909 or 1558 I have no clue. 30*59 is 1770 so it's not a matter of math. The converter I normally use tells me nothing useful.

We still don’t know how it works for any of them, but from what you’ve said, these conversions must hold true:

You remember how in normal notation, the digits are arrayed in a ones’ place, tens’ place, hundreds’ place and so on? $789 = 7 \times 100 + 8 \times 10 + 9 \times 1$. Or better, $789 = 7 \times 10^2 + 8 \times 10^1 + 9 \times 10^0$. And in hexadecimal all those 10s are replaced by sixteens: hexadecimal $789 = 7 \times 16^2 + 8 \times 16^1 + 9 \times 16^0$, or $7 \times 256 + 8 \times 16 + 9 \times 1$, which is 1929 in decimal. Byte sequences (barring synchsafe) also work the same way, just base 256.
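A tiny sketch of that positional idea, with a hypothetical helper name: each step multiplies the running total by the base and adds the next digit. Read as a plain base-256 number, 00 00 1E 3B comes out as 7739 rather than 3909, which hints that the tag size is not stored that way.

```swift
/// Interprets `digits` as a positional number in `base`, most significant first.
func positionalValue(_ digits: [Int], base: Int) -> Int {
    // Shift everything seen so far up one place, then add the new digit.
    digits.reduce(0) { $0 * base + $1 }
}

print(positionalValue([7, 8, 9], base: 10))                  // 789
print(positionalValue([7, 8, 9], base: 16))                  // 1929 (hex 789)
print(positionalValue([0x00, 0x00, 0x1E, 0x3B], base: 256))  // 7739
```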

Plainly for the frames, the second byte is the ones’ place. To easily figure out where the 256s’ place is, simply add a string tag longer than that to Yate, and read the bytes. It’s fishy to me though that the ones’ byte doesn’t appear to be at either end. Are all four bytes really part of the size?

I now realize the tag size may not be declared the same way as the frame sizes. The v2.3 specification says this:

The ID3v2 tag size is encoded with four bytes where the most significant bit (bit 7) is set to zero in every byte, making a total of 28 bits. The zeroed bits are ignored, so a 257 bytes long tag is represented as $00 00 02 01.

That describes something like synchsafe.

Let’s start with getting the frame ones right. (If you have to while you are experimenting, just temporarily lie to the tag and tell it that it extends clear to the end of the file.)

NCrusher74 commented 4 years ago

So the declared size (tag) is equal to the used space (frames) plus the empty space (padding). So depending on what Yate means by that, the declared size is either equal to Yate’s “size”, or to Yate’s size plus Yate’s “padding”.

Well, Yate's stated tag size of 3909 includes the padding. The size without the padding is 1558. We know that because we know the end index of the final frame in the frames list is 1558.

1558 plus Yate's stated padding size (2351) is 3909.

If Yate is compliant with the spec--and we have no reason to believe it isn't, and we both agree that Yate's raw data is an accurate meter-stick for comparison--then the declared size will be 3909.

So we just have to figure out how to get 3909 from 0 0 1e 3b.
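Here's a minimal sketch of synchsafe decoding as the v2.3 quote above describes it (7 usable bits per byte, so each byte is worth 128 times the next), with a hypothetical function name. It reproduces the spec's own example ($00 00 02 01 → 257). Applied to 00 00 1E 3B it gives 3899. The v2.3 and v2.4 specs also say the declared tag size excludes the 10-byte tag header, and 3899 + 10 = 3909, which would line up with Yate's figure. That last connection is my reading of the spec, not something verified against Yate.

```swift
/// Decodes a synchsafe integer: only the low 7 bits of each byte count,
/// most significant byte first.
func decodeSynchsafe(_ bytes: [UInt8]) -> Int {
    bytes.reduce(0) { (total: Int, byte: UInt8) -> Int in
        (total << 7) | Int(byte & 0x7F)
    }
}

print(decodeSynchsafe([0x00, 0x00, 0x02, 0x01]))  // 257, the spec's example

let declared = decodeSynchsafe([0x00, 0x00, 0x1E, 0x3B])
print(declared)        // 3899 (tag size, excluding the tag header)
print(declared + 10)   // 3909 (including the 10-byte header)
```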

The FF FE is a byte order mark, and in UTF‐16, every two bytes are a character. The apparent intervening zeroes are because all the characters are in the ASCII range, where one half is simply zero. It’s conceptually similar to writing “9” as a two‐digit number to fit in better with a list of other two‐digit numbers: “26, 84, 09, 56, 13”.

Okay. Good to know, since I'm sure I'll encounter that again.

We still don’t know how it works for any of them, but from what you’ve said, these conversions must hold true:

* `00 06 00 00` → 6
* `00 07 00 00` → 7
* `00 0C 00 00` → 12
* `00 22 00 00` → 34
* `00 1E 00 00` → 30
* `00 1F 00 00` → 31
* `00 00 1E 3B` → 3909 (If Yate’s size includes padding) or 6260 (if Yate’s size excludes it)

As calculated above, Yate's size includes it.

Plainly for the frames, the second byte is the ones’ place. To easily figure out where the 256s’ place is, simply add a string tag longer than that to Yate, and read the bytes. It’s fishy to me though that the ones’ byte doesn’t appear to be at either end. Are all four bytes really part of the size?

Taking this paragraph in reverse order, according to the spec:

(v2.2 for tag:) ID3/file identifier "ID3" ID3 version $02 00 ID3 flags %xx000000 ID3 size 4 * %0xxxxxxx (snip) The ID3 tag size is encoded with four bytes where the first bit (bit 7) is set to zero in every byte, making a total of 28 bits. The zeroed bits are ignored, so a 257 bytes long tag is represented as $00 00 02 01.

(v2.2 for frame) The three character frame identifier is followed by a three byte size descriptor, making a total header size of six bytes in every frame.

(v2.3 for tag) ID3v2/file identifier "ID3" ID3v2 version $03 00 ID3v2 flags %abc00000 ID3v2 size 4 * %0xxxxxxx (snip) The ID3v2 tag size is encoded with four bytes where the most significant bit (bit 7) is set to zero in every byte, making a total of 28 bits. The zeroed bits are ignored, so a 257 bytes long tag is represented as $00 00 02 01.

(v2.3 for frame) Frame ID $xx xx xx xx (four characters) Size $xx xx xx xx Flags $xx xx (snip) The frame ID is followed by a size descriptor, making a total header size of ten bytes in every frame.

(v2.4 for tag) ID3v2/file identifier "ID3" ID3v2 version $04 00 ID3v2 flags %abcd0000 ID3v2 size 4 * %0xxxxxxx The ID3v2 tag size is stored as a 32 bit synchsafe integer (section 6.2), making a total of 28 effective bits (representing up to 256MB).

(v2.4 for frame) Frame ID $xx xx xx xx (four characters) Size 4 * %0xxxxxxx Flags $xx xx The frame ID is followed by a size descriptor containing the size of the data in the final frame, after encryption, compression and unsynchronisation. The size is excluding the frame header ('total frame size' - 10 bytes) and stored as a 32 bit synchsafe integer.

So, with the exception of v2.2, where the frame size declaration is three bytes, then all size declarations on both the tag and frames level should be four bytes.
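Summarizing those spec excerpts as code: a hypothetical version enum (property names loosely modeled on the version.identifierLength / version.sizeDeclarationLength calls used elsewhere in this thread, not the library's actual definitions) that derives each version's frame header length. It assumes, per the specs, that v2.2 frames have no flag bytes and that only v2.4 frame sizes are synchsafe.

```swift
enum ID3Version {
    case v2_2, v2_3, v2_4

    var identifierLength: Int { self == .v2_2 ? 3 : 4 }
    var sizeDeclarationLength: Int { self == .v2_2 ? 3 : 4 }
    /// v2.2 frames have no flags; v2.3 and v2.4 have two flag bytes.
    var flagsLength: Int { self == .v2_2 ? 0 : 2 }
    var frameHeaderLength: Int {
        identifierLength + sizeDeclarationLength + flagsLength
    }
    /// Only v2.4 stores frame sizes as synchsafe integers.
    var frameSizeIsSynchsafe: Bool { self == .v2_4 }
}

for version in [ID3Version.v2_2, .v2_3, .v2_4] {
    print(version, version.frameHeaderLength)
}
// v2_2 6
// v2_3 10
// v2_4 10
```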

As to checking on how things work with a frame larger than 256 bytes, I went ahead and put the lyrics to "American Pie" in the lyrics frame. Here's what Yate's raw data says about it now:

00001221 00005689 00004468 (0000T0) USLT LyricsTest

    func testPrint() throws {
        let longLyricsFile = Bundle.longLyricsFile
        let longLyricsMp3File = try Mp3File(location: longLyricsFile)
        let longLyricsData = longLyricsMp3File.data
        let frameSizeRange = 1221..<5689
        print(longLyricsData.subdata(in: frameSizeRange).hexadecimal())
    }

55 53 4c 54 - 0 0 - 22 6a 0 0 is the first 10 bytes, or the frame header. The last four of those, 22 6a 0 0, are the size declaration (34, 106, 0, 0)? Since (according to Yate's raw data) the frame is 4468 bytes including the 10-byte header, the size should be 4468 - 10 = 4458.

Does that help?

NCrusher74 commented 4 years ago

As for the tag size declaration...while I'm sure at some point we need to make sure we're calculating it accurately for writing, as far as I can tell, we don't actually use it when reading, at least now that I've removed all the TagValidator functions.

NCrusher74 commented 4 years ago

Do you know of a way to adjust the font size of the debugger panel?

I'm having issues with my eyesight that have been getting progressively worse for the past few months while I've been waiting until April, when my insurance would cover my annual eye exam. I've pretty much had to crank up the font size of everything I look at to at least 133%, and even then things seem to blur out on me pretty easily. But I can't find a setting for the debugger panel, and it's making it difficult for me to pick through what I'm seeing. And now, because of the quarantining stuff, I probably won't be able to get to the optometrist for at least another month.

[Screenshot 2020-04-25 at 10:25:15 PM]

This is what I'm seeing after trying to run the Mp3File.read() function after implementing my frame size reader code. As you can see, the frame identifier is gibberish, and little wonder, since the slice of file it's trying to read is at the very end of the file, where there shouldn't be any tag content.

[Screenshot 2020-04-25 at 10:26:12 PM]

If this is the first frame it's trying to read, it should be getting the frame identifier in range 10..<14, and the contentData should be 12 bytes in range 14..<26. If this is the second frame it's trying to read (which it might be, since it looks like there's already one [FrameKey:Frame] pair stored, even if it's invalid) then the identifier should be in range 26..<30 and the contentData should be 13 bytes in range 30..<43.

I have to assume this is related to my first-stab effort at getting a sensible read on the frame size, but I don't see how and the fact that the text on the screen keeps swimming isn't helping me figure it out.

NCrusher74 commented 4 years ago

Okay, so it looks like what is happening with the thing above is that the first frame identifier is being parsed out okay, but everything after that is gibberish, either because my frame size calculation is garbage or because more bytes are getting extracted than should be.

Because this:

        while !remainder.isEmpty {
            let identifierBytes = remainder.extractFirst(version.identifierLength)
            print(remainder.count)
            let identifier = try String(ascii: identifierBytes)
            print(identifier)
            let frame = try Frame(
                identifier: identifier,
                data: &remainder,
                version: version)

Is getting me this:

[Screenshot 2020-04-26 at 11:48:04 AM]

all the way to the end of the file:

[Screenshot 2020-04-26 at 11:48:19 AM]

We probably need some sort of mechanism to catch invalid identifiers, but I don't know how we'd do that without running the risk of squashing someone's custom frames. Maybe it will all work out once we've got the frame/tag size stuff worked out.

NCrusher74 commented 4 years ago

Okay, actually the problem was I was dropping a little too much at the end of handling the frame. With that fixed, it's working, except that it doesn't know when to stop. Which is where knowing the tag size comes in, I guess?
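One possible way to make the loop stop, sketched under assumptions: the tag body has already been cut to the declared tag size, frames use the v2.3/v2.4 header layout from the spec (4-byte ID, 4-byte size, 2 flag bytes), and hitting padding or an invalid ID simply ends the walk rather than throwing. Since the spec requires padding bytes to be $00 and frame IDs to use only A-Z and 0-9, a 00 where an ID should start is a reliable stop signal, and the character check catches gibberish without rejecting custom frames that follow the spec. The function name is hypothetical.

```swift
/// Lists the frame identifiers in a v2.3/v2.4 tag body (the bytes after the
/// 10-byte tag header, already cut to the declared tag size).
func frameIdentifiers(in body: [UInt8], synchsafeSizes: Bool) -> [String] {
    let headerLength = 10   // 4-byte ID + 4-byte size + 2 flag bytes
    var identifiers: [String] = []
    var offset = 0
    while offset + headerLength <= body.count {
        let idBytes = body[offset ..< offset + 4]
        // Padding is all 00 bytes, and no valid frame ID starts with 00.
        if idBytes.first == 0 { break }
        // Frame IDs use only A-Z and 0-9; anything else means we're lost.
        let isValidID = idBytes.allSatisfy { (0x41...0x5A).contains($0) || (0x30...0x39).contains($0) }
        guard isValidID else { break }
        let size = body[offset + 4 ..< offset + 8].reduce(0) { (total: Int, byte: UInt8) -> Int in
            synchsafeSizes ? (total << 7) | Int(byte & 0x7F) : (total << 8) | Int(byte)
        }
        identifiers.append(String(decoding: idBytes, as: UTF8.self))
        offset += headerLength + size   // skip this frame's header and content
    }
    return identifiers
}

// Two small frames followed by padding.
var body: [UInt8] = []
body += Array("TIT2".utf8) + [0, 0, 0, 6, 0, 0]   // ID, size 6, no flags
body += [0] + Array("Title".utf8)                  // encoding byte + text
body += Array("TALB".utf8) + [0, 0, 0, 6, 0, 0]
body += [0] + Array("Album".utf8)
body += [UInt8](repeating: 0, count: 10)           // padding
print(frameIdentifiers(in: body, synchsafeSizes: true)) // ["TIT2", "TALB"]
```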

SDGGiesbrecht commented 4 years ago

Do you know of a way to adjust the font size of the debugger panel?

No, but Google seems to: https://www.google.com/search?q=xcode+debugger+panel+font+size

55 53 4c 54 - 0 0 - 22 6a 0 0 is the first 10 bytes, or the frame header. The last four of those, 22 6a 0 0, are the size declaration (34, 106, 0, 0)? Since (according to Yate's raw data) the frame is 4468 bytes including the 10-byte header, the size should be 4468 - 10 = 4458.

Does that help?

22 6A 00 00 would equal 4458 if it’s actually mid‐little endian (really?!) and synchsafe. That means:

| Byte index | Place value |
|---|---|
| 1 | $\times 256^1$ |
| 2 | $\times 256^0$ |
| 3 | $\times 256^3$ |
| 4 | $\times 256^2$ |

So I guess move the first half to the end before interpreting them as big endian.

And it also means I’m not sure that other pointer conversion method would work properly with exorbitantly large frames or on other‐endian systems.

NCrusher74 commented 4 years ago

Okay, I feel stupid. I could have sworn I'd googled that at one point and came up empty, but maybe I didn't. Sorry.

Edit: Actually, I had already adjusted the console font. I wanted to adjust the font on the left-hand panel beside the console:

[Screenshot 2020-04-26 at 4:03:46 PM]

That's the one I can't find a setting for.

Your "(really?!)" has me chuckling.

NCrusher74 commented 4 years ago

Okay, I'll see your "really?!" and raise you a "SERIOUSLY?!"

So, this is crude, but I reordered the bytes as you recommended:

        // parse content size second
        let frameSizeDataUnordered = [UInt8](data.extractFirst(version.sizeDeclarationLength))
        let frameSizeOrdered = [
            frameSizeDataUnordered[2],
            frameSizeDataUnordered[3],
            frameSizeDataUnordered[0],
            frameSizeDataUnordered[1]
        ]
        let frameSizeData = Data(frameSizeOrdered)
        var frameSize: Int = 0
        let raw = UInt32(parsing: frameSizeData, .bigEndian)
        switch version {
            case .v2_2, .v2_3: frameSize = Int(raw)
            case .v2_4: frameSize = Int(raw.decodingSynchsafe())
        }

And that worked fine for the first...19 frames or so. When suddenly the byte order went wonky.

[Screenshot 2020-04-26 at 4:29:30 PM]

I checked the bytes for that particular frame and yep, for no rational reason whatsoever, the size bytes are:

0 b 40 0

Why?

SDGGiesbrecht commented 4 years ago

And that worked fine for the first...19 frames or so. When suddenly the byte order went wonky.

What were the associated identifier and flags? My first hunch would be that the previous frame’s parser may have eaten one byte too little. Then the first zero would actually be part of the flags, and 0B 40 00 [00?] would represent 1344 after synchsafe decoding. That would be a much more reasonable number.

NCrusher74 commented 4 years ago

And that worked fine for the first...19 frames or so. When suddenly the byte order went wonky.

What were the associated identifier and flags? My first hunch would be that the previous frame’s parser may have eaten one byte too little. Then the first zero would actually be part of the flags, and 0B 40 00 [00?] would represent 1344 after synchsafe decoding. That would be a much more reasonable number.

That's why I printed out the whole frame to check it over.

        let frameSizeRange = 344..<365
        print(mp3Data.subdata(in: frameSizeRange).hexadecimal())

54 45 4e 43 (identifier: TENC) - 0 0 (flags) - 0 b 40 0 (size bytes) - 0 (encoding byte) - 45 6e 63 6f 64 65 64 20 42 79 (contents: "Encoded By")

It all checks out. But the size bytes are in the wrong place in the sequence, and it's a much larger amount than the contents should account for.

the frame before is:

54 43 4f 50 - 0 0 - 0 f 0 0 - 0 - 32 30 32 30 20 43 6f 70 79 72 69 67 68 74

And it checks out too.

NCrusher74 commented 4 years ago

I did notice that something is a little strange in the last column of the Yate raw data for that frame:

00000319 00000344 00000025 (0000T0) TCOP
00000344 00000365 00000021 (2000T0) TENC
00000365 00000392 00000027 (0000T0) TSSE

I wonder if that (2000T0) has something to do with the offset for the size bytes being different? Maybe this is a known thing that happens?

NCrusher74 commented 4 years ago

Experimenting, it looks like the two frames that have that code in the Yate raw data--the Encoded By and Length frames-- are both frames that are often automatically written to by a tagging app or other writer, rather than usually being user-edited.

In fact, when looking at a file with NO metadata, this is what Yate shows for the raw data:

[Screenshot 2020-04-26 at 6:41:33 PM]

So I think this problem may be a result of messing around with a couple frames whose purpose isn't what I necessarily thought it was.

SDGGiesbrecht commented 4 years ago

That's why I printed out the whole frame to check it over. So 00 0B 40 00 would have to mean 11 bytes?

Okay, well I don’t really know what is going on then. But 0B is 11, so removing the last half instead of shuffling those bytes to the beginning would enable it to work for all the frame examples you’ve shown me so far. And it would presumably work for any frame up to 16384 bytes in length, since numbers up to that point only need two bytes to represent. But neither this nor the mid‐little endianness really jibes with what the specification appears to say. It would be a temporary workaround that would enable you to move on to other things until you can find something that explains how it is supposed to work, but it’s not a satisfying solution.

I wonder if that (2000T0) has something to do with the offset for the size bytes being different? Maybe this is a known thing that happens?

What are the things in parentheses? Are those some sort of notation for the flags? (Hypothesis based on File preserve.) Is there a flag that indicates anything about the integer representation?

NCrusher74 commented 4 years ago

Not that I've seen? The flags for v2.4 are "unsynchronization", "extended header", "experimental indicator", and "footer present". Version 2.3 has three of those four flags, and version 2.2 only has two flags. And I don't really understand what any of those mean.

The numbers in parentheses are some sort of internal Yate thing, I think? I just found it interesting that Yate seems to handle these two frames that are strange to parse differently. I'm thinking of trying to write a file from the ground up using Kid3 instead, and see if I get the same result on those frames.

I'd take them out of my "known frames" completely, but it wouldn't solve anything, because I'd still have to handle them as unknown frames, and that would require being able to parse the frame size.

NCrusher74 commented 4 years ago

So, that seems to be a weirdness with how Yate (and maybe Kid3? I can't tell, since the only way to view the raw data is using Yate) writes those frames. It's not a compliant thing, but maybe they both use the same library as a framework? id3lib or taglib or whatever. But at any rate, I ran a test using a file written by ID3TagEditor and, aside from the fact that each frame is twice as large as it needs to be because of the encoding, it works perfectly.

Which means that I now have a fully functional tag parser. I just need to implement the tag writing.

NCrusher74 commented 4 years ago

Actually it turns out that it was just weirdness relating to Kid3. As long as I was working with a file that hadn't been touched by Kid3 (or by Fission) there's no issue with the TENC or TLEN frames.

So now I've got my Mp3File.read() -> [FrameKey: Frame] function working, except for writing tests and the few frames I haven't implemented yet.

NCrusher74 commented 4 years ago

Okay, I'm having a strange brain-hiccup looking at where I am and trying to figure out where to go next.

I've got four frames not implemented yet, and I suspect I should probably implement them before writing all the tests for the return from the Mp3File.read() -> Tag function, because if implementing those frames requires me to rework other functionality, it may break any existing tests for other frames.

I figure first I'd work on the Date frame.

I've got several different "kinds" of date frames:

        switch layout {
            case .known(.date): self.frameKey = .date
            // (2.2-2.3) DDMM
            case .known(.encodingTime): self.frameKey = .encodingTime
            // (2.4) timestamp
            case .known(.originalReleaseTime): self.frameKey = .originalReleaseTime
            // (2.2-2.3) YYYY, (2.4) timestamp
            case .known(.recordingDate): self.frameKey = .recordingDate
            //  (2.2-2.3) "4th-7th June, 12th June" (2.4) timestamp
            case .known(.releaseTime): self.frameKey = .releaseTime
            // timestamp 2.4
            case .known(.taggingTime): self.frameKey = .taggingTime
            // timestamp 2.4
            case .known(.time): self.frameKey = .time
            // HHMM
            case .known(.year): self.frameKey = .year
            // YYYY
            default: self.frameKey = .userDefinedText(description: "")
        }

(a "timestamp" in the 2.4 spec is defined as follows:

The timestamp fields are based on a subset of ISO 8601. When being as precise as possible the format of a time string is yyyy-MM-ddTHH:mm:ss (year, "-", month, "-", day, "T", hour (out of 24), ":", minutes, ":", seconds), but the precision may be reduced by removing as many time indicators as wanted. Hence valid timestamps are yyyy, yyyy-MM, yyyy-MM-dd, yyyy-MM-ddTHH, yyyy-MM-ddTHH:mm and yyyy-MM-ddTHH:mm:ss. All time stamps are UTC. For durations, use the slash character as described in 8601, and for multiple non-contiguous dates, use multiple strings, if allowed by the frame definition.

)
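A minimal sketch of parsing those reduced-precision timestamps into optional components, in plain Swift with hypothetical type and function names. It checks the shape and digit counts only, so something like month 13 would still be accepted, and it doesn't handle the duration or multiple-date forms.

```swift
/// Components of a v2.4 timestamp; trailing components may be absent.
struct TimestampComponents: Equatable {
    var year: Int
    var month: Int? = nil
    var day: Int? = nil
    var hour: Int? = nil
    var minute: Int? = nil
    var second: Int? = nil
}

/// Parses yyyy, yyyy-MM, yyyy-MM-dd, yyyy-MM-ddTHH, yyyy-MM-ddTHH:mm or
/// yyyy-MM-ddTHH:mm:ss. Returns nil for anything else.
func parseTimestamp(_ string: String) -> TimestampComponents? {
    let dateAndTime = string.split(separator: "T", omittingEmptySubsequences: false)
    guard dateAndTime.count <= 2 else { return nil }
    let dateParts = dateAndTime[0].split(separator: "-", omittingEmptySubsequences: false)
    let timeParts: [Substring] = dateAndTime.count == 2
        ? dateAndTime[1].split(separator: ":", omittingEmptySubsequences: false)
        : []
    // A time of day is only allowed after a complete yyyy-MM-dd date.
    guard dateParts.count <= 3, timeParts.count <= 3,
          timeParts.isEmpty || dateParts.count == 3 else { return nil }

    let widths = [4, 2, 2, 2, 2, 2]   // yyyy, MM, dd, HH, mm, ss
    var values: [Int] = []
    for (index, part) in (dateParts + timeParts).enumerated() {
        guard part.count == widths[index],
              part.allSatisfy({ $0 >= "0" && $0 <= "9" }),
              let value = Int(part) else { return nil }
        values.append(value)
    }
    func component(_ index: Int) -> Int? { index < values.count ? values[index] : nil }
    return TimestampComponents(year: values[0], month: component(1), day: component(2),
                               hour: component(3), minute: component(4), second: component(5))
}

if let stamp = parseTimestamp("2020-04-26T16:29") {
    print(stamp.year, stamp.month ?? 0, stamp.day ?? 0, stamp.hour ?? 0, stamp.minute ?? 0)
    // 2020 4 26 16 29
}
```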

But I'd like to wrap that into one frame "type" and then differentiate based on the specific layout.

Just taking a guess at it, I've gotten as far as:

    var year: Int?
    var month: Int?
    var day: Int?
    var hour: Int?
    var minute: Int?
    var timeStamp: Date

    private init(layout: FrameLayoutIdentifier,
                 timestamp: Date) { ... }

I was thinking that maybe each individual public initializer would have the parameters it needs as integers, and feed them into the timestamp parameter for the private initializer? But I could be wrong on that.
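Following that idea, a sketch of going the other way: a hypothetical helper that turns integer components into a v2.4 timestamp string, stopping at the first missing component so the reduced precision is preserved. One consideration for the design: a Foundation Date is a single point in time and doesn't remember which components were specified, so a year-only value like 2020 couldn't round-trip through it unchanged; keeping the components (or the string) alongside it avoids that.

```swift
/// Builds a v2.4 timestamp string from integer components. Precision stops
/// at the first nil, so later components are ignored once one is missing.
func timestampString(year: Int, month: Int? = nil, day: Int? = nil,
                     hour: Int? = nil, minute: Int? = nil, second: Int? = nil) -> String {
    // Left-pads a number with zeros to the given width.
    func pad(_ value: Int, _ width: Int) -> String {
        let digits = String(value)
        return String(repeating: "0", count: max(0, width - digits.count)) + digits
    }
    var result = pad(year, 4)
    let rest: [(String, Int?)] = [("-", month), ("-", day), ("T", hour), (":", minute), (":", second)]
    for (separator, value) in rest {
        guard let value = value else { break }
        result += separator + pad(value, 2)
    }
    return result
}

print(timestampString(year: 2020))                                            // 2020
print(timestampString(year: 2020, month: 4, day: 26))                         // 2020-04-26
print(timestampString(year: 2020, month: 4, day: 26, hour: 16, minute: 29))   // 2020-04-26T16:29
```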


At the same time, my magpie brain also keeps trying to figure out how to get the frame contents from the Mp3File.read() -> Tag function. I can do:

    func testReadV23() throws {
        let mp3File = try Bundle.mp3V23()
        let tag = try mp3File.read()

        let frames = tag.frames
        print(frames[.album]) // <- FrameKey
    }

So I can get the FrameKey, and from there the Frame, but the Frame is

Optional(SwiftTagger_MacOS.Frame.stringFrame(SwiftTagger_MacOS.StringFrame(flags: 2 bytes, layout: SwiftTagger_MacOS.FrameLayoutIdentifier.known(SwiftTagger_MacOS.KnownFrameLayoutIdentifier.album), frameKey: SwiftTagger_MacOS.FrameKey.album, contentString: "Album")))

...and all I really want is contentString from all of that. Or, you know, whatever the content for a particular frame may be. And I'm drawing a blank on how to get there. I need a variable in FrameKey, right? But how do I differentiate the type of value that variable returns?

SDGGiesbrecht commented 4 years ago

55 53 4c 54 - 0 0 - 22 6a 0 0 is the first 10 bytes, or the frame header.

54 45 4e 43 Identifier (TENC) 0 0 flags 0 b 40 0 size bytes

🤦

I woke up in the middle of the night with the real answer to all of this. For frames the flags come after the size. From the 2.4 specification:

  4. ID3v2 frame overview

    All ID3v2 frames consists of one frame header followed by one or more fields containing the actual information. The header is always 10 bytes and laid out as follows:

    Frame ID $xx xx xx xx (four characters) Size 4 * %0xxxxxxx Flags $xx xx

So the first one is actually 55 53 4c 54 - 0 0 22 6a - 0 0, where the size is simply big endian (synchsafe), and the flags are empty. Notice the 1s place and 256s place make sense now.

And the one that tripped us was actually 54 45 4e 43 - 0 0 0 b - 40 0. They had flags set, and we were trying to interpret the flags as somehow part of the integer. But the actual 65 536s place and 16 777 216s place were the zeros we were treating as the flags.
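With the header laid out that way, here's a minimal sketch (hypothetical names) that reads identifier, then size, then flags. Reading the size as synchsafe, as v2.4 specifies, gives 4458 for the USLT frame, matching Yate's 4468 - 10; read as a plain big-endian number it would be 8810. The TENC header comes out as an 11-byte frame (one encoding byte plus "Encoded By") with flags 0x4000. In v2.4, 0x40 in the first flag byte is the "tag alter preservation" flag; whether that's what Yate's (2000T0) marker reflects is only a guess.

```swift
/// A v2.3/v2.4 frame header: identifier, then size, then flags.
struct FrameHeader {
    let identifier: String
    let contentSize: Int   // excludes the 10-byte header
    let flags: UInt16
}

/// Parses the first 10 bytes of `bytes` as a frame header, or returns nil
/// if there aren't enough bytes. `synchsafe` should be true for v2.4.
func parseFrameHeader(_ bytes: [UInt8], synchsafe: Bool) -> FrameHeader? {
    guard bytes.count >= 10 else { return nil }
    let identifier = String(decoding: bytes[0 ..< 4], as: UTF8.self)
    // Bytes 4..<8: the size, most significant byte first.
    let size = bytes[4 ..< 8].reduce(0) { (total: Int, byte: UInt8) -> Int in
        synchsafe ? (total << 7) | Int(byte & 0x7F) : (total << 8) | Int(byte)
    }
    // Bytes 8..<10: the two flag bytes.
    let flags = (UInt16(bytes[8]) << 8) | UInt16(bytes[9])
    return FrameHeader(identifier: identifier, contentSize: size, flags: flags)
}

let uslt: [UInt8] = [0x55, 0x53, 0x4C, 0x54, 0x00, 0x00, 0x22, 0x6A, 0x00, 0x00]
let tenc: [UInt8] = [0x54, 0x45, 0x4E, 0x43, 0x00, 0x00, 0x00, 0x0B, 0x40, 0x00]
if let lyrics = parseFrameHeader(uslt, synchsafe: true),
   let encodedBy = parseFrameHeader(tenc, synchsafe: true) {
    print(lyrics.identifier, lyrics.contentSize)                                    // USLT 4458
    print(encodedBy.identifier, encodedBy.contentSize, encodedBy.flags == 0x4000)   // TENC 11 true
}
```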

...and all I really want is contentString from all of that. Or, you know, whatever the content for a particular frame may be. And I'm drawing a blank on how to get there.

For an arbitrary frame, a client would do this:

    if let authorFrame = tag.frames[.author],
       case .stringFrame(let stringFrame) = authorFrame {
        print(stringFrame.contentString)
    } else {
        print("No author.")
    }

But for common ones, you may as well add computed properties that do all the work:

    extension Tag {

        var author: String? {
            get {
                // Inside the extension, the dictionary is just `frames`.
                if let authorFrame = frames[.author],
                   case .stringFrame(let stringFrame) = authorFrame {
                    return stringFrame.contentString
                } else {
                    return nil
                }
            }
            set {
                // `add(frame:)` stands in for however Tag stores a frame.
                let frame = StringFrame(identifier: .author, content: newValue)
                add(frame: .stringFrame(frame))
            }
        }
    }

Then clients can just do:

    print(tag.author ?? "No author")

I didn’t double check anything, so I might be using some of the wrong names, but you get the idea, right?

NCrusher74 commented 4 years ago

I...

splutters incoherently

I can't even. I've been looking at those specs day in and day out for weeks and I could have sworn the flags came before the size. Because, of course, I was confusing them with the tag spec in my head. But...

Headdesk x 1,000,000

Okay. Done being stupid now.

Yes, I think I get what you're saying about the Tag extension. Thank you.

NCrusher74 commented 4 years ago

How would I go about implementing the tag extension for frames with special-case identifiers, like description or language?

    var language: String? {
        get {
            if let frame = self.frames[.languages(language: languageString)], // doesn't work
                case .languageFrame(let languageFrame) = frame {
                return languageFrame.languageString
            } else {
                return nil
            }
        }
        set {
            let frame = LanguageFrame(language: newValue ?? "und")
            frames[.languages(language: language ?? "und")] = .languageFrame(frame)
        }
    }

SDGGiesbrecht commented 4 years ago

They couldn’t be properties; they would have to be a pair of methods (tag.comment(for: "eng") & tag.setComment(for: "eng")) or a subscript (tag[comment: "eng"]), so that they could be parameterized. But otherwise they would work more or less the same.
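Here's a minimal, self-contained sketch of the subscript option. FrameKey, Tag and the plain String storage are simplified stand-ins for illustration, not SwiftTaggerID3's real types; the point is only that a labeled subscript can take the language as a parameter, which a computed property cannot.

```swift
// Hypothetical stand-ins for the library's types.
enum FrameKey: Hashable {
    case title
    case comment(language: String)
}

struct Tag {
    var frames: [FrameKey: String] = [:]

    /// The language is part of the key, so it arrives as a subscript parameter.
    subscript(comment language: String) -> String? {
        get { frames[.comment(language: language)] }
        set { frames[.comment(language: language)] = newValue }
    }
}

var tag = Tag()
tag[comment: "eng"] = "Recorded live"
tag[comment: "deu"] = "Live aufgenommen"
print(tag[comment: "eng"] ?? "none")   // Recorded live
tag[comment: "eng"] = nil              // assigning nil removes that frame
```

Because the getter and setter share the same parameter, the setter never needs to know the language from somewhere else, which sidesteps the languageString problem in the property version.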