[WIP / Proof of concept] Add Decoding support for enums with associated values

jsbean commented 5 years ago

This PR is an initial step towards adding support for Decoding union-type–like enums-with-associated-values (as presented in #25 and #91).

There is a single regression test which is currently failing (NestingTests.testDecodeUnkeyedWithinUnkeyed()), though it doesn't seem out of reach to fix it.

I have added test cases for both #25 and #91, which are passing. For the case of #91, I haven't had success decoding the empty Break type.

The changes here seem both a bit magical as well as grotesque, but it seems like it is at a good place to stop and talk about further directions!

There are three changes made to the source here:

In XMLCoderElement.transformToBoxTree(), a special case is made to prevent enums-with-associated-value–like types getting represented as attributed unkeyed values.
In XMLDecoderImplementation.unkeyedContainer(), keyed boxes get transformed into unkeyed boxes wherein each key-value pair from the original is packaged up into its own KeyedBox carrying a single element representing the key and value. This allows the key to be retrievable downstream in the XMLUnkeyedDecodingContainer.
In XMLUnkeyedDecodingContainer.decode(...), the container packaged up by XMLDecoderImplementation.unkeyedContainer() gets unpackaged in the case it may hold the contents of an enum-with-associated-value.

MaxDesiatov commented 5 years ago

Hi @jsbean, thank you for having a closer look into this! In my previous attempts to implement this, I came to the conclusion that this needs a public API change. It seems that the approach with intrinsic coding keys could work. Similarly to what we have with a value intrinsic, we could have xmlElementName intrinsic. In fact, I think value intrinsic should be renamed to xmlElementValue to avoid possible name collisions in the future.

The reason is that the decoder of the inner value needs to know the name of the element to be able to switch on it and then decode an appropriate enum case. I'm very interested to know if you were able to make this stuff work without any public API changes or new intrinsic coding keys?

MaxDesiatov commented 5 years ago

Having a closer look at this PR, I'm not sure this is an ideal construct:

do {
  self = .int(try container.decode(IntWrapper.self, forKey: .int))
} catch {
  self = .string(try container.decode(StringWrapper.self, forKey: .string))
}

This looks ok just for 2 enum cases, but say you have 5 of them and you get a pyramid of doom of nested do/catch blocks. With the intrinsic it could look like this (scales well to any number of cases):

switch xmlElementName {
case "int":
  self = .int(try container.decode(IntWrapper.self, forKey: .int))
case "string"
  self = .string(try container.decode(StringWrapper.self, forKey: .string))
}

This also potentially could be more performant, since you wouldn't need to create a new decoding container and try the next one if that fails.

MaxDesiatov commented 5 years ago

BTW, I had a test for this in #106, but that was blocked for a while by the changes I had to merge for ordered containers. Now that's done, adding this element name intrinsic shouldn't be that hard.

jsbean commented 5 years ago

Thanks for lookin' over! Yup, this is all done internally with no public API change.

I very much agree that the do/catch pyramid of doom is unappetizing. That said, the pattern used here works out of the box with JSONDecoder, and seems to be the recommended way of solving this in the JSON universe (see: here, here, and here).

While this doesn't scale well from a syntactical perspective, the implementation of Codable conformance for types like these looks like a job for a metaprogramming library such as Sourcery or a home-brewed solution using SwiftSyntax. This is how I plan to solve it for a downstream project which exhibits symptoms of the pathological case of this problem :). Perhaps auto-conformance could even be added to the Standard Library some day.

There are a couple downsides that I see to the xmlElementName intrinsic approach (and please correct me if I am misunderstanding anything!).

First, it requires the user to add the xmlElementName field to their otherwise–format-agnostic model. I feel that there is an argument for keeping the Codable implementation reusable with different encoded formats.

Second, while switching over the xmlElementName keeps indentation from scaling out of control, the String-lyness of it feels a little bit brittle (e.g., imagine you change the name of your enum case, but forget to change the name of the xmlElementName, etc.). The beautiful thing about the CodingKey protocol is its compile-time guarantees for comprehensive coverage and type matching. I would argue that the architecture provided is worth orienting ourselves around.

I hear your concerns about performance, though my initial goal is to mirror the patterns already used in JSON coding from a logical perspective. My intuition is that improvements could be made internally to this approach which would improve performance, and perhaps some optimizations could be made within the init(from:) to prevent creating unnecessary containers.

Would it be worthwhile for me to add some benchmark tests to see where we are from a performance perspective?

I am definitely not tied to this implementation, though I would prefer a solution that does not require XML-specific changes to users' models.

Looking forward to hearing your thoughts!

MaxDesiatov commented 5 years ago

All good points, and I don't think any of your work is conflicting with my xmlElementName proposal. In fact, I'd be happy to merge your implementation first and proceed with any additional solutions when/if we find anything better or see enough demand for improving over the status quo in JSON Codable land 🙂

I don't think any additional benchmarks are needed, although it would be interesting how the existing benchmarks behave on your branch when compared to master.

jsbean commented 5 years ago

A few things are on my the roadmap before I would consider this worthy for merge:

[x] The NestingTests.testDecodeUnkeyedWithinUnkeyed() is only fallaciously passing at this point (the test only verifies that no throwing occurs, but the decoded value is currently not correct). @bwetherfield has beefed up those tests on another branch, and my plan is to iron out the oncoming failure now-ish.
[ ] I would like to add some round-trip encode-decode tests, as I have not yet spent much effort on the encode side of things!

jsbean commented 5 years ago

(I also do want to point out that the test suite has been immensely helpful throughout this process. I am glad that effort was put in by you all to make it feel solid.)

jsbean commented 5 years ago

... it would be interesting how the existing benchmarks behave on your branch when compared to master.

Yes, definitely. Initially, I found ways of breaking the testDecodeArrays benchmark, but then those failures disappeared.

Are the baselines for the benchmark tests that get shipped around with the .xcodeproj file reliable?

MaxDesiatov commented 5 years ago

those do get shipped, but they're tied to a Mac model, i.e. its CPU core count, memory etc, so you probably need to run master performance test on your machine first, save your own baselines and then you'll be able to compare them to the implementation branch. If those don't override existing baselines (you'll see that in git status I guess on a slight chance we have exact same MacBook Pro models 😄), feel free to add them to this PR so that we have more baselines for different Mac models.

jsbean commented 5 years ago

@MaxDesiatov, I am going to close this WIP branch for a bit. In working on the encode side of things, we found some ambiguities.

We think we might have found a nice solution that would address with encode and decode sides in a more elegant fashion. Will come back with a PR shortly.

CoreOffice / XMLCoder

[WIP / Proof of concept] Add Decoding support for enums with associated values #113