golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

encoding/xml: does not parse internal subset of DTD as required by standard #68388

Open DemiMarie opened 4 months ago

DemiMarie commented 4 months ago

Go version

1.22, I believe

Output of go env in your module/workspace:

Whatever is on https://go.dev/play

What did you do?

Parse XML files with ill-formed or misplaced directives, such as:

  1. <!BOGUS><a/>
  2. <!DOCTYPE JUNK A><a/>
  3. <a/><!DOCTYPE a>

https://go.dev/play/p/ZzfA0W3EUMJ has an example.

What did you see happen?

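All three documents are accepted without error, in violation of the XML spec. None of them is well-formed. <!BOGUS> is not a markup declaration that XML allows in a document. In <!DOCTYPE JUNK A>, the name is followed by something that is neither an external ID (SYSTEM or PUBLIC) nor an internal subset. A document type declaration may only appear in the prolog, before the root element.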

What did you expect to see?

Either encoding/xml performs the well-formedness checks on directives that the standard requires even of non-validating parsers, or it rejects all directives (including DTDs) by default. The latter is what .NET’s XML parser does: by default, DTDs are a fatal parse error.

gabyhelp commented 4 months ago

Related Issues and Documentation


DemiMarie commented 3 months ago

There are at least two possible fixes here:

  1. Actually parse the DTD. This is complex, and since it is a slow path, I would prefer a goyacc-generated parser over a handwritten one.
  2. By default, return an error if a DTD is present at all.