golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

encoding/xml: accepts ill-formed XML declarations #68460

Open DemiMarie opened 1 month ago

DemiMarie commented 1 month ago

Go version

Git main

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/user/.cache/go-build'
GOENV='/home/user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/user/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/user/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/user/go/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/user/go/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='devel go1.23-071b8d51c1a70fa6b12f0bed2e93370e193333fd Fri Jul 12 22:42:17 2024 +0000'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/user/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/user/go/go/src/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1803937823=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Tried to parse XML with various ill-formed XML declarations, such as the following:

  1. Wrong order of key/value pairs:

    <?xml standalone="yes" version="1.0"?>
  2. Missing space between key/value pairs:

    <?xml version="1.0"standalone="yes"?>
  3. Junk between key/value pairs:

    <?xml version="1.0" a standalone="yes"?>
  4. Invalid key:

    <?xml version="1.0" dalone="yes"?>
  5. Invalid encoding:

    <?xml version="1.0" encoding="not valid"?>
  6. Invalid standalone value:

    <?xml version="1.0" standalone="not valid"?>
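For reference, the XML 1.0 (Fifth Edition) grammar fixes the order of the pseudo-attributes, requires whitespace between them, and restricts the values of encoding and standalone, so each declaration above violates it. The sketch below is not part of encoding/xml: it transcribes the XMLDecl production into a Go regular expression, and the name isWellFormedXMLDecl is hypothetical. It assumes its argument is exactly one declaration, from "<?xml" through "?>".

```go
package main

import (
	"fmt"
	"regexp"
)

// xmlDeclRE transcribes the XML 1.0 (Fifth Edition) XMLDecl production:
//
//	XMLDecl      ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
//	VersionInfo  ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
//	EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'")
//	SDDecl       ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))
//	Eq           ::= S? '=' S?
//	VersionNum   ::= '1.' [0-9]+
//	EncName      ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
//	S            ::= (#x20 | #x9 | #xD | #xA)+
//
// The pseudo-attributes must appear in this order, each preceded by whitespace.
var xmlDeclRE = regexp.MustCompile(
	`^<\?xml` +
		`[ \t\r\n]+version[ \t\r\n]*=[ \t\r\n]*("1\.[0-9]+"|'1\.[0-9]+')` +
		`([ \t\r\n]+encoding[ \t\r\n]*=[ \t\r\n]*("[A-Za-z][A-Za-z0-9._-]*"|'[A-Za-z][A-Za-z0-9._-]*'))?` +
		`([ \t\r\n]+standalone[ \t\r\n]*=[ \t\r\n]*("(yes|no)"|'(yes|no)'))?` +
		`[ \t\r\n]*\?>$`)

// isWellFormedXMLDecl (hypothetical helper) reports whether decl is exactly
// one XML declaration matching the XMLDecl production.
func isWellFormedXMLDecl(decl string) bool {
	return xmlDeclRE.MatchString(decl)
}

func main() {
	for _, decl := range []string{
		`<?xml version="1.0"?>`,
		`<?xml version="1.0" encoding="UTF-8" standalone="yes"?>`,
		`<?xml standalone="yes" version="1.0"?>`,
		`<?xml version="1.0"standalone="yes"?>`,
		`<?xml version="1.0" a standalone="yes"?>`,
		`<?xml version="1.0" dalone="yes"?>`,
		`<?xml version="1.0" encoding="not valid"?>`,
		`<?xml version="1.0" standalone="not valid"?>`,
	} {
		fmt.Printf("%-50s %v\n", decl, isWellFormedXMLDecl(decl))
	}
}
```

The first two inputs match; the remaining six, which are the cases listed above, do not.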

What did you see happen?

No error is reported for any of them, as long as a CharsetReader that accepts the invalid encoding name is provided.
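The CharsetReader caveat matters only for case 5: a declared encoding other than UTF-8 appears to be handed to Decoder.CharsetReader rather than checked against the EncName grammar. The sketch below contrasts a decoder with no CharsetReader, which fails on case 5 because it has no reader for the declared charset, with one that installs a passthrough CharsetReader, which the report says decodes the same declaration without error. The helper firstRawTokenErr is hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// firstRawTokenErr (hypothetical helper) decodes only the first raw token of
// doc and returns the resulting error. If passthrough is true, it installs a
// CharsetReader that ignores the declared charset and returns the input as-is.
func firstRawTokenErr(doc string, passthrough bool) error {
	d := xml.NewDecoder(strings.NewReader(doc))
	if passthrough {
		d.CharsetReader = func(charset string, input io.Reader) (io.Reader, error) {
			return input, nil
		}
	}
	_, err := d.RawToken()
	return err
}

func main() {
	doc := `<?xml version="1.0" encoding="not valid"?>`
	// Without a CharsetReader, the non-UTF-8 name is rejected because no reader
	// for it is available, not because the name itself is malformed.
	fmt.Println("nil CharsetReader:", firstRawTokenErr(doc, false))
	// With a permissive CharsetReader, the same declaration is accepted.
	fmt.Println("passthrough CharsetReader:", firstRawTokenErr(doc, true))
}
```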

What did you expect to see?

An error for each of these documents, because all of them are ill-formed.

gabyhelp commented 1 month ago

Related Issues and Documentation


cherrymui commented 1 month ago

Could you share a code snippet for how you parse the XML? Thanks.

DemiMarie commented 1 month ago

https://go.dev/play/p/gmZ-M1l8zVp

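package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"os"
	"strings"
)

// checkIllFormedXMLGetsError decodes the first raw token of s and reports
// whether the decoder rejected it, which is the expected outcome for every
// input in main.
func checkIllFormedXMLGetsError(s string) (ok bool) {
	var v error
	d := xml.NewDecoder(strings.NewReader(s))
	// Accept any declared encoding by returning the input unchanged, so that a
	// missing CharsetReader is not what triggers an error.
	d.CharsetReader = func(charset string, reader io.Reader) (io.Reader, error) { return reader, nil }
	tok, err := d.RawToken()
	if tok != nil || err == nil {
		_, v = fmt.Printf("BAD: got a token (%#v) or no error (%#v) when decoding %q\n", tok, err, s)
	} else {
		_, v = fmt.Printf("GOOD: got error (%#v) on ill-formed XML (%q)\n", err, s)
		ok = true
	}
	// Treat a failure to write the report as fatal.
	if v != nil {
		panic(v)
	}
	return
}

func main() {
	// Each entry is one of the ill-formed declarations listed above.
	illFormedDocs := []string{
		`<?xml standalone="yes" version="1.0"?>`,       // wrong order
		`<?xml version="1.0"standalone="yes"?>`,        // missing space
		`<?xml version="1.0" a standalone="yes"?>`,     // junk between pairs
		`<?xml version="1.0" dalone="yes"?>`,           // invalid key
		`<?xml version="1.0" encoding="not valid"?>`,   // invalid encoding name
		`<?xml version="1.0" standalone="not valid"?>`, // invalid standalone value
	}
	bad := false
	for _, illFormed := range illFormedDocs {
		if !checkIllFormedXMLGetsError(illFormed) {
			bad = true
		}
	}
	// Exit non-zero if any ill-formed input was accepted.
	if bad {
		os.Exit(1)
	}
}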
cherrymui commented 1 month ago

Thanks.

cc @rsc @ianlancetaylor