CoreOffice / XMLCoder

Easy XML parsing using Codable protocols in Swift
https://coreoffice.github.io/XMLCoder/
MIT License

Optionally expose indexes for interleaved elements #228

Closed liambutler-lawrence closed 2 years ago

liambutler-lawrence commented 3 years ago

This PR adds a new public type, XMLPositionIndexed. This type can be used within a decoding tree to retain the indexing information of the private type KeyedStorage.

Why is this necessary?

This library already provides a way to retrieve the text value of an XML element, by specifying a coding key of an empty string.

However, in many XML documents, sub-nodes are nested at meaningful positions within the text value. For example, ARM's machine-readable XML documentation contains the following node:

<pstext>
    constant bits(16) <anchor>ASID_NONE</anchor> = <a>Zeros</a>();
</pstext>

In the current version of this library, this node's elements can be parsed into 3 arrays:

However, the relative positioning of these elements to each other is irretrievably lost.
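To make that loss concrete, here is a minimal sketch in plain Swift. It does not use XMLCoder: `Node` and `groupByKey` are hypothetical stand-ins for "children grouped by coding key", and the trimmed segment strings are an assumption about how the text would be split. Two different interleavings of the same children produce identical grouped arrays, so the original order cannot be recovered from the arrays alone.

```swift
// Hypothetical stand-in for one child of <pstext>:
// `key` is the element name ("" for a text segment), `value` is its text.
struct Node {
    let key: String
    let value: String
}

// Groups children by key, keeping only the per-key order.
// This mirrors decoding into one array per coding key.
func groupByKey(_ nodes: [Node]) -> [String: [String]] {
    var groups: [String: [String]] = [:]
    for node in nodes {
        groups[node.key, default: []].append(node.value)
    }
    return groups
}

// The order in which the children appear in the document.
let documentOrder = [
    Node(key: "", value: "constant bits(16)"),
    Node(key: "anchor", value: "ASID_NONE"),
    Node(key: "", value: "="),
    Node(key: "a", value: "Zeros"),
    Node(key: "", value: "();"),
]

// A different interleaving of the same children.
let otherOrder = [
    Node(key: "", value: "constant bits(16)"),
    Node(key: "", value: "="),
    Node(key: "a", value: "Zeros"),
    Node(key: "anchor", value: "ASID_NONE"),
    Node(key: "", value: "();"),
]

// Both orders collapse to the same three arrays.
let sameGroups = groupByKey(documentOrder) == groupByKey(otherOrder)
```

Here `sameGroups` is `true`, which is exactly the ambiguity the rest of this description sets out to remove.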

With the new XMLPositionIndexed type, this relative positioning information is retained. Our Decodable model for the pstext node above can now look like this:

struct Text: Decodable {

    let valueSegments: [XMLPositionIndexed<String>]
    let links: [XMLPositionIndexed<Link>]
    let anchors: [XMLPositionIndexed<Link>]

    enum CodingKeys: String, CodingKey {
        case valueSegments = ""
        case links = "a"
        case anchors = "anchor"
    }

    struct Link: Decodable {
        let value: String

        enum CodingKeys: String, CodingKey {
            case value = ""
        }
    }
}

Each XMLPositionIndexed object contains the original value as well as an integer index that can be used in post-processing. As an example, we can easily merge all 3 arrays back together:

let mergedSegments = (
    text.valueSegments.map { ($0.index, $0.value) }
        + text.links.map { ($0.index, $0.value.value) }
        + text.anchors.map { ($0.index, $0.value.value) }
).sorted { $0.0 < $1.0 }.map { $0.1 }

// Joined in index order, mergedSegments reads: "constant bits(16) ASID_NONE = Zeros();"
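
The same merge can be tried without the library. The sketch below is self-contained and rests on assumptions: `PositionIndexed` is a hypothetical stand-in for XMLPositionIndexed that models only the `index` and `value` properties described above, and the index values and whitespace in the segments are made up for illustration.

```swift
// Hypothetical stand-in for XMLPositionIndexed: only `index` and `value`
// come from the description; everything else here is assumed.
struct PositionIndexed<Value> {
    let index: Int
    let value: Value
}

// Assumed indexes: the position of each child within <pstext>.
let valueSegments = [
    PositionIndexed(index: 0, value: "constant bits(16) "),
    PositionIndexed(index: 2, value: " = "),
    PositionIndexed(index: 4, value: "();"),
]
let anchors = [PositionIndexed(index: 1, value: "ASID_NONE")]
let links = [PositionIndexed(index: 3, value: "Zeros")]

// Merge the three arrays, restore document order by index,
// then concatenate the values into a single string.
let merged = (valueSegments + anchors + links)
    .sorted { $0.index < $1.index }
    .map { $0.value }
    .joined()
```

With these assumed inputs, `merged` is `"constant bits(16) ASID_NONE = Zeros();"`.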

MaxDesiatov commented 2 years ago

The new files need to be added to the existing .xcodeproj for CI to pass. I know this is a chore, but we're still keeping compatibility with Carthage (for now), which does require this Xcode project cruft to exist in the repository.

smumriak commented 2 years ago

Hey everyone! I'm using this awesome library to parse the Vulkan API Registry XML, and this feature would be very much appreciated :) Consider this example:

<member optional="true">const <type>void</type>*            <name>pNext</name></member>

This line defines a member of a structure and its type. Having indices for interleaved elements would allow my code to correctly parse the underlying type of the member. Right now I'm storing those in an array of strings, which has worked correctly only because I'm not looking into any member other than pNext, but this would still be nice to have for the future.

MaxDesiatov commented 2 years ago

I'd be glad to merge this as soon as it has unit-test coverage. I'm not sure whether the OP has abandoned this PR, and I haven't had time to add coverage myself. Feel free to create a new PR if you're interested in pushing this forward.

MaxDesiatov commented 2 years ago

Hi @liambutler-lawrence, would you be interested in moving this PR forward? I'm cleaning up the repository, and if this PR is abandoned and outdated, I'm inclined to close it. Thanks!

MaxDesiatov commented 2 years ago

I'm closing this PR as abandoned. Please feel free to reopen it if that's not the case.