golang / go


encoding/xml: document why round-trip stability is not guaranteed #44405

Open DemiMarie opened 3 years ago

DemiMarie commented 3 years ago

It is extremely non-obvious why encoding/xml does not guarantee round-trip stability. The package documentation should explain why that is the case, when this is a problem, and what users should do instead.

ianlancetaylor commented 3 years ago

Thanks. Documentation changes don't need to go through the proposal process, so I'm changing this into an ordinary issue.

ianlancetaylor commented 3 years ago

Note that, while I don't know the details, this is not necessarily going to be appropriate for the package documentation. It may be a better fit for a blog post.

DemiMarie commented 3 years ago

What about something like this?

Warning: The encoding/xml package does not guarantee round-trip stability. If one uses encoding/xml to tokenize an XML document, serializes the tokens, and then re-parses the resulting document, the resulting token stream may differ from the original. As a result, encoding/xml should not be used in applications where round-trip stability is required, such as XML-DSIG and SAML. Abusing encoding/xml in these applications has led to security vulnerabilities in the past and is not supported. See <insert blog post here> for details.

Applications that require round-trip stability should use a third-party library that provides such guarantees. According to their maintainers, the following libraries have been designed for this purpose: <insert list here>. Note that these libraries are maintained by third parties and are not endorsed by Google, the Go Project, or the Go Security Team.

Other uses of encoding/xml are security supported. For example, it is considered a security vulnerability if malicious XML can cause encoding/xml to panic, corrupt memory, or consume excessive resources.

This actually brings up another question: is encoding/xml guaranteed to be deterministic? One way to detect round-trip mismatches is to serialize a document and reparse it, and then check that the reparsed document is identical to the original. Is this a sufficient mitigation?
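
For concreteness, the check described above could be sketched roughly as follows. This is only an illustration, not an endorsed mitigation: the helper names `tokens`, `reserialize`, and `roundTripStable` are hypothetical, and treating "identical" as `reflect.DeepEqual` on the decoded token slices is an assumption made for the example. Whether a check like this is sufficient is exactly the open question.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"reflect"
	"strings"
)

// tokens reads every token from doc (hypothetical helper).
// xml.CopyToken is used because the decoder may reuse the byte
// slices backing tokens such as CharData between calls.
func tokens(doc string) ([]xml.Token, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var out []xml.Token
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, xml.CopyToken(tok))
	}
}

// reserialize writes the tokens back out with xml.Encoder (hypothetical helper).
func reserialize(toks []xml.Token) (string, error) {
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	for _, t := range toks {
		if err := enc.EncodeToken(t); err != nil {
			return "", err
		}
	}
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// roundTripStable reports whether tokenize -> serialize -> tokenize
// yields the same token stream. "Same" here means reflect.DeepEqual,
// which is an assumption of this sketch.
func roundTripStable(doc string) (bool, error) {
	first, err := tokens(doc)
	if err != nil {
		return false, err
	}
	out, err := reserialize(first)
	if err != nil {
		return false, err
	}
	second, err := tokens(out)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(first, second), nil
}

func main() {
	ok, err := roundTripStable(`<a x="1"><b>hi</b></a>`)
	fmt.Println(ok, err)
}
```

Note that a check like this only compares two token streams produced by the same decoder, so it relies on the decoder and encoder behaving deterministically, and it says nothing about how a different XML parser would read the same bytes.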