eminence / xmltree-rs

Reads an XML file into a simple tree structure
MIT License
39 stars 30 forks source link

Attribute order is inconsistent #21

Closed amosonn closed 4 years ago

amosonn commented 4 years ago

When writing elements with multiple attributes, the order of the attributes is inconsistent, even within a single document, and depends on the (undefined) iteration order of the HashMap containing them. This makes manually looking at such documents, or comparing them, more difficult.

As this is irrelevant to the XML semantics, this should probably be added as another option in the configuration. However this affects the code within Element::_write which collects the attributes, and not the code of the emitter itself, so perhaps a bridging-config should be provided which passes the EmitterConfig downstream but maintains this option? Or this could be a default, without an option for overriding it.

Another question is what are the correct semantics for this? Sorting alphabetically? Maybe attributes should be a Vec and not a HashMap after all, delegating this to the user?

Wolvereness commented 4 years ago

I tried using this library today, and this issue caught me. I made a PR to fix it in #22. I did what made the most sense to me: just keep the order of how things were inserted.

eminence commented 4 years ago

Hi all,

Thanks for opening this issue. I do understand the desire to be able to control the ordering of attributes, even though the XML specification says that the ordering is not important. I also have wanted some tool that could perform an intelligent diff of two XML documents, taking into account things like attribute order (if such a tool doesn't exist, it could be a neat thing to write 😃)

The solution from @Wolvereness seems good to me, since you can control the order of the attributes by controlling the insertion order (or by explicitly sorting the IndexMap). I'm not personally a fan of using a Vec, since being able to do elem.attributes.get("attr_name") is way more convenient than iterating through a list.

My primary concern is that switching to IndexMap is a API breaking change. I'm not sure I want to do this, especially given that most users (in my estimation) won't care about attribute order. So does it make sense for the IndexMap version to be enabled behind a non-default cargo feature flag? Most users can continue to use a HashMap (which is also way more common than IndexMap), but for those of you that need a well-defined ordering, you can enable the feature.

Would that work for your use cases @amosonn and @Wolvereness ? (If anyone else is watching and has some thoughts, please chime in as well)

amosonn commented 4 years ago

Yeah, this works for me. Since it is a dependency on another crate, putting it behind a feature flag seems to makes sense as well, people can pull this dep only if they care about this functionality. Thanks @Wolvereness for the implementation and @eminence for looking into it!

Wolvereness commented 4 years ago

@amosonn Updated the PR to use the feature flag.

scottlamb commented 4 years ago

Hmm, doesn't this mean that if crates A and B both use xmltree, crate B adding this cargo feature to its use of xmltree breaks the API that crate A is using? In particular, crate A's construction/usage/destructuring of an xmltree::Element which is written for HashMap is likely to not compile if the feature is set and Element::attributes is of an unexpected type.

IMHO, the right thing to do is to just bump the major version, rather than using a cargo feature. Maybe accumulate any other changes on a branch (like preserving attribute namespaces) and bumping together.

Wolvereness commented 4 years ago

Hmm, doesn't this mean that if crates A and B both use xmltree, crate B adding this cargo feature to its use of xmltree breaks the API that crate A is using? In particular, crate A's construction/usage/destructuring of an xmltree::Element which is written for HashMap is likely to not compile if the feature is set and Element::attributes is of an unexpected type.

Only if crate C depends on both A and B, and both reexport xmltree, and C needs to interchangeably use the exported types.

IMHO, the right thing to do is to just bump the major version, rather than using a cargo feature. Maybe accumulate any other changes on a branch (like preserving attribute namespaces) and bumping together.

Doesn't matter one way or the other to me though. I don't see multiple crates needing to interchangeably use xmltree intermixed in the dependency graph. Really, my recommendation would be use a feature without a major bump, but simultaneously deprecate not using the feature. Next major bump removes it.

amosonn commented 4 years ago

I must say, I would suggest leaving this as a feature, even after an upgrade. Dependancy bloat is always an issue in an environment like Rust's which make it easy to pull in dependencies, and if I didn't care about this ordering functionality I would prefer not to pay for it (in both build times and binary size).

A major version bump can make this feature part of the default though, so people get this by default but can still opt-out if they want.

eminence commented 4 years ago

I'm not really worried about this hypothetical crate "C" that links with two other crates, each using different feature sets from xmltree. If such a crate exists, we can work with them on a solution.

So I'm inclined to keep the IndexMap behind a feature. I'm also inclined to not make it default at the moment, because I still am not convinced that IndexMap is the right general purpose interface for this crate.

So I've just merged #22 and will release it shortly on crates.io. Thanks all!