SimplyKnownAsG / yamlize

Python YAML serializing library
Apache License 2.0
11 stars 5 forks source link

Deterministic Ordering #12

Open john-science opened 2 years ago

john-science commented 2 years ago

For various reasons, I am getting the request to have more deterministic ordering from all the places we use Yamlize.

I would actually be happy to do the work myself, I have been looking at the code this morning and trying to determine where the changes would need to be made.

SimplyKnownAsG commented 2 years ago

Yay! Make the changes!

Yamlize should maintain the order that is read from a file, so that the input is the same as the output. Many of the links provided by @keckler are actually being used as checks to make sure we don't re-apply default values or something. They should all be used for lookups while going through an ordered loop; the sets themselves are not the source of the order.

There are some edge cases that do result in differences. For example, if you set a default value it will disappear from the output.

Also, the map uses an underlying OrderedDict().

I don't see an MWE in the linked example. What types of objects are they seeing the order be unexpected? I suspect that if you want deterministic output from code that is assigning attribute values self.attribute = <value>, you'll need to store the order as it is assigned. Another option would be to use the slots order, but that isn't guaranteed (as we demonstrated in #10).

john-science commented 2 years ago

I have a unit test that demonstrates this bug:

https://github.com/john-science/armi/commit/b075b5719cc0ef9d555e863fe3c8cbf1ed396a56

Essentially, the test always fails, but it wouldn't if we had deterministic ordering. In particular, the attributes of the given block are printed in a different random order every time I run the test.

After some discussion, I found that Yamlize does normally have a deterministic ordering for the YAMLs it writes. But for these pieces of YAMLs (4 layers deep in the hierarchy) that ordering becomes random. This might be due to Yamlize, but since the problem is so closely constrained, I am no longer convinced. It might be something in our code base.

Just FYI. Thanks!

john-science commented 2 years ago

This has proven tricky to solve (on our project's end, not yours). I don't suppose there's any way to just alphabetize the keys in the output YAMLs? (Probably, that's an insane question.)

SimplyKnownAsG commented 2 years ago

Not an insane question. The linked test is more complex than I've been able to follow. Though, I haven't looked at it in quite a while. Can you make an MWE? Also, you can override the to_yaml method to do the order on specific objects, though I don't recommend it (see below).

Yamlize is supposed to be deterministic with respect to the input. The order of attributes in the input, should be the order of the attributes on the output, deterministically.

I wonder if we can do a bit of xyproblem.info solve this in a different way. It sounds like the current issue is that you get different results in a test based on the order of attributes in a file.