CoreOffice / XMLCoder

Easy XML parsing using Codable protocols in Swift
https://coreoffice.github.io/XMLCoder/
MIT License
795 stars 107 forks source link

Encode element with empty key, no elements, and attributes #224

Closed wooj2 closed 3 years ago

wooj2 commented 3 years ago

Hi @MaxDesiatov

I may have found an issue with the way we are serializing XML.

Consider a top level structure SimpleScalarPropertiesInput:

public struct SimpleScalarPropertiesInput: Encodable, DynamicNodeEncoding {
    public let nested: NestedWithNamespace?

    public static func nodeEncoding(for key: CodingKey) -> XMLEncoder.NodeEncoding {
        let xmlNamespaceValues = [
            "xmlns",
            "xmlns:xsi",
        ]
        if let key = key as? Key {
            if xmlNamespaceValues.contains(key.stringValue) {
                return .attribute
            }
        }
        return .element
    }

    public func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: Key.self)
        if encoder.codingPath.isEmpty {
            try container.encode("https://example.com", forKey: Key("xmlns"))
        }
        if let nested = nested {
            var nestedContainer = container.nestedContainer(keyedBy: Key.self, forKey: Key("Nested"))
            try nestedContainer.encode(nested, forKey: Key(""))
            try nestedContainer.encode("https://example.com", forKey: Key("xmlns:xsi"))
        }
    }
}

Notice that SimpleScalarPropertiesInput has a single member called nested of type NestedWithNamespace:

public struct NestedWithNamespace: Encodable, DynamicNodeEncoding {
    public let attrField: String?

    public static func nodeEncoding(for key: CodingKey) -> XMLEncoder.NodeEncoding {
        let codingKeys = [
            "xsi:someName"
        ]
        if let key = key as? Key {
            if codingKeys.contains(key.stringValue) {
                return .attribute
            }
        }
        return .element
    }

    public func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: Key.self)
        if encoder.codingPath.isEmpty {
            try container.encode("https://example.com", forKey: Key("xmlns"))
        }
        if let attrField = attrField {
            try container.encode(attrField, forKey: Key("xsi:someName"))
        }
    }
}

Note that we are using Key as a way to use custom CodingKey values (rather than having enum CodingKey: String, CodingKey).

public struct Key: CodingKey {
    public let stringValue: String
    public init(stringValue: String) {
        self.stringValue = stringValue
        self.intValue = nil
    }

    public init(_ stringValue: String) {
        self.stringValue = stringValue
        self.intValue = nil
    }

    public let intValue: Int?
    public init?(intValue: Int) {
        return nil
    }
}

Finally, exercising this code fails the assert statement

let nested = NestedWithNamespace(attrField: "nestedAttrValue")
let input = SimpleScalarPropertiesInput(nested: nested)
let encoder = XMLEncoder()
let encodedData = try encoder.encode(input)
let encodedString = String(data: encodedData, encoding: .utf8)!
print(encodedString)
let expected = """
<SimpleScalarPropertiesInput xmlns="https://example.com"><Nested xmlns:xsi="https://example.com" xsi:someName="nestedAttrValue"></Nested></SimpleScalarPropertiesInput>
"""
assert(encodedString == expected)

Instead, I'm observing that encodedString is producing invalid XML:

<SimpleScalarPropertiesInput xmlns="https://example.com"><Nested xmlns:xsi="https://example.com"> xsi:someName="nestedAttrValue" /></Nested></SimpleScalarPropertiesInput>

(notice that the attribute/value xsi:someName="nestedAttrValue" is not part of the element)

It seems that the expected string should be:

<SimpleScalarPropertiesInput xmlns="https://example.com"><Nested xmlns:xsi="https://example.com" xsi:someName="nestedAttrValue"></Nested></SimpleScalarPropertiesInput>

I've come up with a preliminary approach to fixing this, but would like to get your take on the approach before I start surrounding this with unit tests.

My preliminary approach is posted here: https://github.com/MaxDesiatov/XMLCoder/pull/223

Thanks again for all the support!

wooj2 commented 3 years ago

Hi @MaxDesiatov

In taking a step back, I tried to avoid using empty keys, but unfortunately, I am unable to find a way to achieve this in an elegant/generic manner.

To re-cap, in the initial post, we have two data structures: SimpleScalarPropertiesInput, and NestedWithNamespace.

In SimpleScalarPropertiesInput's encode function, it is responsible for:

  1. encoding the top level namespace
  2. creating a nestedContainer which effectively creates <Nested>
  3. encoding a namespace belonging to <Nested>
  4. calling .encode on member (nested), with an empty key. Using an empty key is a workaround, but seemingly the only way I could make NestedWithNamespace's encode() function be responsible for all of its members.

In NestedWithNamespace's encode function, it is responsible for:

  1. Rendering its members (attrField). Its member happens to be an attribute, so I believe it should be attached inside the <Nested> container tag.

Talking to you offline, I explored two alternatives which avoids using an empty key -- both of them don't work, but for different reasons.

Approach 1: Make NestedWithNamespace responsible for encoding the <Nested>'s namespace and attrField

SimpleScalarPropertiesInput would look like this:

public struct SimpleScalarPropertiesInput: Encodable, DynamicNodeEncoding {
    public let nested: NestedWithNamespace?
    enum CodingKeys: String, CodingKey {
        case xmlns
        case nested = "Nested"
    }
    public static func nodeEncoding(for key: CodingKey) -> XMLEncoder.NodeEncoding {
        let xmlNamespaceValues = [
            "xmlns"
        ]
        if let key = key as? CodingKeys {
            if xmlNamespaceValues.contains(key.stringValue) {
                return .attribute
            }
        }
        return .element
    }

    public func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        if encoder.codingPath.isEmpty {
            try container.encode("https://example.com", forKey: .xmlns)
        }
        if let nested = nested {
            try container.encode(nested, forKey: .nested)
        }
    }
}

Notice how the encode() function changed. It no longer is responsible for encoding the namespace, instead we call try container.encode(nested, forKey: .nested) so that NestedWithNamespace's encode function is responsible for the creation of the container.

This seems fine on the surface, but there is a problem when it comes to NestedWithNamespace:

public struct NestedWithNamespace: Encodable, DynamicNodeEncoding {
    public let attrField: String?
    enum CodingKeys: String, CodingKey {
        case xsiNamespace = "xmlns:xsi"
        case someName = "xsi:someName"
    }
    public static func nodeEncoding(for key: CodingKey) -> XMLEncoder.NodeEncoding {
        let codingKeys = [
            "xmlns:xsi",
            "xsi:someName"
        ]
        if let key = key as? CodingKeys {
            if codingKeys.contains(key.stringValue) {
                return .attribute
            }
        }
        return .element
    }

    public func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)

        // [ The following is problematic]
        //try container.encode("https://example.com", forKey: .xsiNamespace)
        if let attrField = attrField {
            try container.encode(attrField, forKey: .someName)
        }
    }
}

Notice the comment in NestedWithNamespace's encode() that says The following is problematic. We could add the following line try container.encode("https://example.com", forKey: .xsiNamespace) which will result in the correct xml string being encoded, however, this won't work for us because we could have another type like SimpleScalarPropertiesInputV2 which has member of type NestedWithNamespace which has a different namespace applied to that member. Said differently, we have a requirement of the containing type specifying a custom namespace on a member.

For example:

 struct SimpleScalarPropertiesInputV2 {
   public let nested: NestedWithNamespace?
   ...snip...

   public func encode(to encoder: Encoder) throws {
       var container = encoder.container(keyedBy: CodingKeys.self)

        ... snip ...

       // Using NestedWithNamespace's `encode()` function won't work, because the namespace is applied to
       // the `nested` member of this type (which can be different than the namespace defined against
       // the other type: SimpleScalarPropertiesInput)
       if let nested = nested {
           try container.encode(nested, forKey: .nested)
       }
   }
 }

Approach 2: Make SimpleScalarPropertiesInput responsible for encoding the <Nested>'s namespace and attrField

In this approach, SimpleScalarPropertiesInput would look like the following:

public struct SimpleScalarPropertiesInput: Encodable, DynamicNodeEncoding {
    public let nested: NestedWithNamespace?
    enum CodingKeys: String, CodingKey {
        case xmlns
        case nested = "Nested"

        case xmlns_xsi = "xmlns:xsi"
        case xsi_someName = "xsi:someName"
    }
    public static func nodeEncoding(for key: CodingKey) -> XMLEncoder.NodeEncoding {
        let xmlNamespaceValues = [
            "xmlns",
            "xmlns:xsi",
            "xsi:someName"
        ]
        if let key = key as? CodingKeys {
            if xmlNamespaceValues.contains(key.stringValue) {
                return .attribute
            }
        }
        return .element
    }

    public func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        if encoder.codingPath.isEmpty {
            try container.encode("https://example.com", forKey: .xmlns)
        }
        if let nested = nested {
            var nestedContainer = container.nestedContainer(keyedBy: CodingKeys.self, forKey: .nested)
            try nestedContainer.encode("https://example.com", forKey: .xmlns_xsi)
            if let attrField = nested.attrField {
                try nestedContainer.encode(attrField, forKey: .xsi_someName)
            }
        }
    }
}

The code snippet above does render the correct xmlstring, however, this breaks encapsulation, because we are making SimpleScalarPropertiesInput's encode() function responsible for knowing about members in NestedWithNamespace. This is bad because NestedWithNamespace could potentially have another member that is a complex type (which could have a member with a complex type.. and so on), there by making SimpleScalarPropertiesInput responsible for encoding all members, recursively. This sounds really complex and seems to go against the intent of the Codable protocol.

Unfortunately, I don't have any other ideas at the moment, as it doesn't seem possible to pass namespaces/arguments between the encode() functions of different types.

wooj2 commented 3 years ago

Thanks again!

Resolving: https://github.com/MaxDesiatov/XMLCoder/pull/223