golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.6k stars 17.61k forks source link

text/template: confusing overlay template behavior when it starts with a BOM #28482

Open bep opened 5 years ago

bep commented 5 years ago

The program below prints Master: overlay once, the second prints nothing. And no error. The program is a variation of this example: https://golang.org/pkg/html/template/#example_Template_block -- which is how we do template "inheritance" in Hugo. And judging by the issue reports in this area, adding a BOM marker seems to be a fairly common practice among text editors. I can work around this on my end, but this behavior is very surprising.

https://play.golang.org/p/MQRb5HUk6eH

package main

import (
    "fmt"
    "log"
    "os"
    "text/template"
)

func main() {
    for _, prefix := range []string{"", "\ufeff"} {
        var (
            master  = `Master: {{block "list" .}}block{{end}}`
            overlay = prefix + `{{define "list"}}overlay{{end}}`
        )

        masterTmpl, err := template.New("master").Parse(master)
        if err != nil {
            log.Fatal(err)
        }
        overlayTmpl, err := template.Must(masterTmpl.Clone()).Parse(overlay)
        if err != nil {
            log.Fatal(err)
        }

        if err := overlayTmpl.Execute(os.Stdout, ""); err != nil {
            log.Fatal(err)
        }

        fmt.Println()
    }
}

https://github.com/gohugoio/hugo/issues/4895

mvdan commented 5 years ago

It seems like any non-empty prefix makes that prefix be printed, instead of using the master template with the overlay. You're just unlucky that the BOM is non-printable, so it seems like a no-op. Is that behavior wrong? I'm not used at all to blocks and overlays.

If the behavior is wrong, what would your suggestion be - skip BOM characters at the beginning of the input when parsing templates?

mvdan commented 5 years ago

As a useful point of reference, reading Go source files does skip the first character if it's a BOM. This happens in both go/scanner and cmd/compile. So it would probably follow that the same should happen here.

gopherbot commented 5 years ago

Change https://golang.org/cl/145837 mentions this issue: text/template/parse: skip leading BOM character

bep commented 5 years ago

If the behavior is wrong, what would your suggestion be - skip BOM characters at the beginning of the input when parsing templates?

See https://play.golang.org/p/237zLtI9WpR

I have adjusted my example to be more clear.

I understand that you have to make a choice between the "define and the rest", so the "foo" case I understand.

But I would expect the \n and BOM case to behave the same.

ianlancetaylor commented 5 years ago

If your input data contains a BOM character, and you don't want it in the output, you should remove it before passing the contents to text/template. text/template should not be in the business of doing anything other than passing along the non-interpreted characters that it sees.

Go source files are a different matter. There is no program layer interposing between the Go source file and the compiler, so there is nothing to remove a BOM. Valid Go source code can not contain a BOM by definition, but there is no similar restriction on text passed to text/template. Skipping the BOM is documented in the Go spec as an implementation restriction (https://golang.org/ref/spec#Source_code_representation) and there is no such documentation for text/template.

mvdan commented 5 years ago

Ian is right - I had not realised that the Go spec does document the behavior of BOMs, so that's not a precedent for text/template.

I'm unclear if there's anything to do here, then. I don't think ParseFiles is a good place to skip BOMs, as more often than not complex programs use Parse directly. It would be confusing if ParseFiles and Parse behaved differently in the presence of BOMs.

mvdan commented 5 years ago

But I would expect the \n and BOM case to behave the same.

That is an interesting point. I presume that the distinction here is whitespace - will dig into the code later.

bep commented 5 years ago

This issue passes my "is this a bug?" bar with flying colors, and you should fix it.

But I have made my point clear here. If my last playground example does not convince you, nothing will.

mvdan commented 5 years ago

Note that Ian is objecting to a change in BOM handling in the lexer. That is not the only option we have - for example, I'd like to dig into the the last playground link you pasted, because I'm not sure what decides whether "the rest of the template text" is ignored or not.

In other words, bear with us while we figure out exactly what a fix would be like, if there is one.

mvdan commented 5 years ago

I've found the source of the whitespace behavior:

Templates can be redefined in successive calls to Parse. A template definition with a body containing only white space and comments is considered empty and will not replace an existing template's body.

You can see the implementation at https://golang.org/pkg/text/template/parse/#IsEmptyTree. So, as far as I can see, the current implementation behaves exactly as documented, even if that's a bit confusing to users.

The simplest change that comes to mind would be for subsequent Parse calls to also include a leading BOM, or even any non-printable characters anywhere, when deciding whether the new body should not replace the existing body.

However, I imagine that would be a backwards incompatible change. That depends on whether text/template is designed to work with non-printable characters. If it is, then I presume we can't make this change as we could break existing templates.

bep commented 5 years ago

That depends on whether text/template is designed to work with non-printable characters.

I wouldn't expect it to, but that is just a hunch ... That said, I would expect text/template to be designed to work out of the box with most common text editors. And some editors insert "U+FEFF BYTE ORDER MARK (BOM), whose appearance as a magic number at the start of a text stream" (Wikipedia)". So if text/template trims any leading (first rune) \ufeff in Parse, I cannot in my wildest dreams see anything but good.

But note that since I now know about it, this isn't something that I really need to have fixed, but I would not be surprised if others also hit this, so fixing it in the source would be a good thing, overall.

mvdan commented 5 years ago

Thanks - I agree that in practice, this would likely be a good thing to do. But since text/template doesn't explicitly say that it is only about templating printable text, I'm not convinced we can pull the trigger on a version of that fix, even if it's just in ParseFiles.

/cc @robpike @rogpeppe for a decision

ianlancetaylor commented 5 years ago

There are all kinds of packages that read text files. We are not going to go through all of them and change each one individually to discard a leading BOM character. That just doesn't make sense.

mvdan commented 5 years ago

A perhaps less invasive change would be for https://golang.org/pkg/text/template/parse/#IsEmptyTree to also return true if a tree has a leading BOM character.

I still don't know what the best approach would be, but I'd hope there's something we can do in text/template, given that this is a common issue for users.