golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

encoding/xml: Encoder duplicates namespace tags #7535

Open gopherbot opened 10 years ago

gopherbot commented 10 years ago

by seanerussell:

== What does 'go version' print?

go version go1.2.1 darwin/amd64

== What steps reproduce the problem?

http://play.golang.org/p/3_oUruPYhq

== What happened?

Encoder.EncodeToken duplicates namespace attributes.

== What should have happened instead?

The encoded document should have had a single namespace attribute.

== Please provide any additional information below.

Attribute names on an element must be unique; this is a well-formedness constraint per
the XML 1.0 specification (http://www.w3.org/TR/xml/#uniqattspec). Per the
specification, both validating and non-validating parsers must report well-formedness
violations (http://www.w3.org/TR/xml/#sec-conformance).

Encoding and decoding XML documents should be idempotent and produce equivalent
documents. This issue means not only that decoding a document and encoding the result
produces a non-equivalent document, but also that the generated document is not
well-formed.

This issue only occurs with namespaces. Normal attributes are handled correctly.

ianlancetaylor commented 10 years ago

Comment 1:

Labels changed: added repo-main, release-none.

rogpeppe commented 9 years ago

Comment 2:

Here's another example: http://play.golang.org/p/GTjuLNxE-d
Both the lack of encode/decode idempotency and the lack of well-formedness make
things awkward when trying to test for expected output.

gopherbot commented 9 years ago

Comment 3:

CL https://golang.org/cl/179540043 mentions this issue.

mikioh commented 9 years ago

See #11841

rsc commented 8 years ago

Blocked on #13400.

pdw-mb commented 8 years ago

I would expect Token to strip xmlns attributes: if you want them, use RawToken.

The change below does that and appears to fix both of the above examples:

https://code.blinkace.com/go/xml/commit/bded824c18c5a2595e750c920ea5e7437607900c

The code base above also exposes the current set of namespace bindings on Decoder, which is generally more useful than having the xmlns attributes themselves (see #12406).

alexellis commented 7 years ago

I wanted to know if there is a workaround for this yet (duplication of namespaces)?

I want to use a decoder/encoder combination with Token() to selectively reconstitute an XML document (.NET csproj format).

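    for {
        token, err := decoder.Token()
        if err != nil {
            // io.EOF marks the end of the input; any other error is a real failure.
            break
        }
        if err := encoder.EncodeToken(token); err != nil {
            break
        }
    }
    // Flush writes any buffered output to the underlying writer.
    encoder.Flush()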

Creating and maintaining all the structs needed to unmarshal the document into an object is not a suitable solution.

Input:

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
    <Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
    <ProductVersion>9.0.30729</ProductVersion>
    <SchemaVersion>2.0</SchemaVersion>
    <ProjectGuid>{153CB7F7-EB7B-44F2-B53E-F157288E3F19}</ProjectGuid>
    <OutputType>Library</OutputType>
    <AppDesignerFolder>Properties</AppDesignerFolder>

Output:

<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="4.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <Configuration xmlns="http://schemas.microsoft.com/developer/msbuild/2003" Condition=" &#39;$(Configuration)&#39; == &#39;&#39; ">Debug</Configuration>
    <Platform xmlns="http://schemas.microsoft.com/developer/msbuild/2003" Condition=" &#39;$(Platform)&#39; == &#39;&#39; ">AnyCPU</Platform>
    <ProductVersion xmlns="http://schemas.microsoft.com/developer/msbuild/2003">9.0.30729</ProductVersion>
    <SchemaVersion xmlns="http://schemas.microsoft.com/developer/msbuild/2003">2.0</SchemaVersion>
    <ProjectGuid xmlns="http://schemas.microsoft.com/developer/msbuild/2003">{153CB7F7-EB7B-44F2-B53E-F157288E3F19}</ProjectGuid>
    <OutputType xmlns="http://schemas.microsoft.com/developer/msbuild/2003">Library</OutputType>
    <AppDesignerFolder xmlns="http://schemas.microsoft.com/developer/msbuild/2003">Properties</AppDesignerFolder>

SamWhited commented 7 years ago

I ran into this today for the first time (surprisingly).

I wouldn't mind working on this once some of my other XML patches are merged if a decision can be made on how to handle it. Would stripping xmlns attributes from Token() (but leaving them for RawToken()) violate the compatibility guarantee? That seems sensible to me, but I suspect it's not possible this late in the game. Alternatively, maybe we could just not write a second XMLNS tag if one already exists.

UPDATE: There appear to be tests that specifically check for this behavior, but I have no idea why as it seems categorically wrong. Maybe my naive understanding of XML is wrong (as it so often is)?

gopherbot commented 7 years ago

CL https://golang.org/cl/47357 mentions this issue.

gopherbot commented 6 years ago

Change https://golang.org/cl/107755 mentions this issue: encoding/xml : fix duplication of namespace tags by encoder

iwdgo commented 6 years ago

A tag prefix identifies the namespace of the tag (https://www.w3.org/TR/xml/#sec-starttags); it is not the default namespace declared by xmlns="...". Writing the prefix is incorrect when it is already bound to a namespace by the standard xmlns:prefix="..." attribute. This fix skips that output, so the duplication is avoided in line with the referenced namespace standard. It fixes this issue, and well-formed XML is always produced. To keep the previous behavior, the prefix is still printed in all other cases.

Some logic was added to handle exceptions. Previously, the produced tag included attribute strings like xmlns="space" xmlns:_xmlns="xmlns" _prefix="...". Without the duplication, these strings no longer appear, so they have been removed from the expected ("want") values of all tests.

Only an explicit namespace combined with a colliding prefix can still produce XML that is not well-formed, because of attributes like xmlns:x="x" that are added by the exception handling described above.

gopherbot commented 6 years ago

Change https://golang.org/cl/109855 mentions this issue: encoding/xml : Fixes to enforce XML namespace standard

ewan-chalmers commented 2 years ago

I want to read an XML file and write it to another file, replacing a node in the tree.

I'm doing that using roughly:

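for {
    token, err := decoder.Token()
    if err != nil {
        break // io.EOF marks the end of the input
    }
    // On receiving the token we want to replace ("Target" is a placeholder name):
    if t, ok := token.(xml.StartElement); ok && t.Name.Local == "Target" {
        encoder.EncodeElement(fragment, t)
        decoder.Skip() // assumed step: discard the original element's content
        continue
    }
    encoder.EncodeToken(token)
}
encoder.Flush()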

In the output document, every element has xmlns="http://schemas.xmlsoap.org/encoding/", which is the namespace of the root node of the original doc.

I can't find a way to suppress that.

I guess I need to find a different way to transform the file.

ewan-chalmers commented 2 years ago

@iwdgo Given input XML like this:

<AnyConnectProfile 
  xmlns="http://schemas.xmlsoap.org/encoding/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://schemas.xmlsoap.org/encoding/ AnyConnectProfile.xsd">

should I be able to prevent the Encoder from adding xmlns="http://schemas.xmlsoap.org/encoding/" to every element in the output by doing something like this?

        switch t := token.(type) {
        case xml.StartElement:
            if t.Name.Local == "AnyConnectProfile" {
                t.Attr = append(t.Attr, xml.Attr{Name: xml.Name{Local: "xmlns:prefix"}, Value: ""})
                token = t
            }
        }

I say "something like" because the above does not work. It results in this on the root element:

<AnyConnectProfile 
  xmlns="http://schemas.xmlsoap.org/encoding/" 
  xmlns="http://schemas.xmlsoap.org/encoding/" 
  xmlns:_xmlns="xmlns" 
  _xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:_XMLSchema-instance="http://www.w3.org/2001/XMLSchema-instance" 
  _XMLSchema-instance:schemaLocation="http://schemas.xmlsoap.org/encoding/ AnyConnectProfile.xsd" 
  xmlns:prefix="">

and xmlns="http://schemas.xmlsoap.org/encoding/" on every other element.
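
The xmlns:_xmlns="xmlns" and _XMLSchema-instance attributes in that output look less mysterious once you see how the decoder reports the root element. This is a small sketch, with the hypothetical helper firstStart, that prints the element and attribute names returned by Token and by RawToken for the input above. As I understand it, the Encoder treats a non-empty Space on an attribute as a namespace URL and invents a prefix for it, which would explain the generated names.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

const doc = `<AnyConnectProfile xmlns="http://schemas.xmlsoap.org/encoding/" ` +
	`xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ` +
	`xsi:schemaLocation="http://schemas.xmlsoap.org/encoding/ AnyConnectProfile.xsd">` +
	`</AnyConnectProfile>`

// firstStart returns the first start element of doc, read either with
// Token (namespaces resolved) or RawToken (prefixes left as written).
// (firstStart is a hypothetical helper for this illustration.)
func firstStart(raw bool) (xml.StartElement, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		var tok xml.Token
		var err error
		if raw {
			tok, err = dec.RawToken()
		} else {
			tok, err = dec.Token()
		}
		if err != nil {
			return xml.StartElement{}, err
		}
		if start, ok := tok.(xml.StartElement); ok {
			return start.Copy(), nil
		}
	}
}

func main() {
	for _, raw := range []bool{false, true} {
		start, err := firstStart(raw)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("raw=%v element space=%q local=%q\n", raw, start.Name.Space, start.Name.Local)
		for _, a := range start.Attr {
			fmt.Printf("  attr space=%q local=%q\n", a.Name.Space, a.Name.Local)
		}
	}
}
```

With Token, the xmlns:xsi declaration comes back as an attribute whose Space is the literal string "xmlns", and xsi:schemaLocation comes back with the full XMLSchema-instance URL as its Space. With RawToken, the prefixes are left as written in the document.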