golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.37k stars 17.71k forks source link

cmd/go/internal/test: test cache id does not include umask #69508

Open twpayne opened 2 months ago

twpayne commented 2 months ago

Go version

go version go1.23.1 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE='on'
GOARCH='arm64'
GOBIN='/Users/twp/.local/bin'
GOCACHE='/Users/twp/Library/Caches/go-build'
GOENV='/Users/twp/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/twp/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/twp'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.23.1/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.23.1/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='on'
GOTELEMETRYDIR='/Users/twp/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/Users/twp/src/go.googlesource.com/go/src/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/sc/4d1t02x92z73bqkq4dvn25h40000gn/T/go-build3190885029=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

I have several tests that create files and check the permissions of the created files. On UNIX systems, the permissions of the created files depend on the value of umask, which is a process-global variable managed by the OS, the same way that environment variables are process-global variables managed by the OS.

I expect to be able to change umask, run go test, and have go test re-run my tests with the new umask. However, after changing umask, go test incorrectly re-uses its cached test results:

$ go version
go version go1.23.1 darwin/arm64

$ umask 002                               # <--- set umask to 002

$ go test .
ok      tmp     0.252s                    # <--- tests pass when umask is 002

$ umask 022                               # <--- change umask to 022

$ go test .                               # <--- re-run tests
ok      tmp     (cached)                  # <--- BUG: test results re-used from cache

$ go test . -count=1                      # <--- disabling cache gives correct test result
--- FAIL: TestUmask (0.00s)
    umask_test.go:34: got 644, want 664
FAIL
FAIL    tmp     0.249s
FAIL

The example test is:

package main

import (
    "io/fs"
    "os"
    "path/filepath"
    "testing"
)

func TestUmask(t *testing.T) {
    // This tests passes when the umask is 002 (e.g. on Debian-based systems),
    // but fails when the umask is 022 (e.g. on RedHat-based systems and macOS).
    filename := filepath.Join(t.TempDir(), "file")
    if err := os.WriteFile(filename, nil, 0o664); err != nil {
        t.Fatal(err)
    }
    fileInfo, err := os.Stat(filename)
    if err != nil {
        t.Fatal(err)
    }
    // Ensure that the file's permissions are 0o666 &^ 0o002 == 0o664.
    if got, want := fileInfo.Mode().Perm(), fs.FileMode(0o664); got != want {
        t.Fatalf("got %03o, want %03o", got, want)
    }
}

Note that Go creates a test input ID that includes the values of environment variables and the mtimes and sizes of files accessed by the test. This test input ID should also include the initial umask value. I will create a CL that fixes this.

What did you see happen?

go test re-used a cached test result, when it should not have.

What did you expect to see?

go test should not re-use cached test results when the umask is changed.

gabyhelp commented 2 months ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

twpayne commented 2 months ago

CL: https://go-review.googlesource.com/c/go/+/614035

seankhliao commented 2 months ago

see thread https://groups.google.com/g/golang-nuts/c/x7-8Pk9ZmYU/m/ouJ3DLM0AwAJ

ianlancetaylor commented 2 months ago

Copying in my comment from the mailing list thread:

We do not want to record all aspects of the external environment that could possibly affect the test. That would be too much. So we need some way to know what should be recorded and what should not. What should that be? Our current approach is simple: we record environment variables that the test reads and we record files opened within the module but not files opened outside of the module. Note that if a test opens some file outside of the module, we cache the test results even if that file changes.

What approach should we use for test caching that will tell us that umask should be cached?

twpayne commented 2 months ago

Firstly, thank you Ian for your patience and open mind in discussing this!

We do not want to record all aspects of the external environment that could possibly affect the test. That would be too much.

Agreed.

So we need some way to know what should be recorded and what should not. What should that be?

Given that, as there is so much external state, it is impossible in practice for the go tooling to know what should be recorded.

As a crazy idea, how about enabling the tests themselves to say what should be recorded?

Our current approach is simple: we record environment variables that the test reads and we record files opened within the module but not files opened outside of the module.

This is fair but also a slight simplification. Currently, the go test cache only checks these files sizes and modification times.

For sure, we all agree that reading the full state that the cache depends on is prohibitively expensive, and that cache invalidation is a hard problem.

What approach should we use for test caching that will tell us that umask should be cached?

So, radical idea. What if tests could explicitly declare what they depend on?

Say I could write, for my tests that depend on umask, something like:

func TestThatDependsOnUmask(t *testing.T) {
    t.DependsOn("umask " + getUmask().String())
    // ...
}

where testing.T.DependsOn(s string) adds s to the test input IDs (i.e. modifies the test cache key).

This totally frees the Go authors from having to decide what to include or exclude from the test input ID. Instead, the decision is deferred to users, who are the people who know their test cache requirements best.

Thoughts?

ianlancetaylor commented 2 months ago

It's an interesting idea but I think it would require cmd/go/internal/test/test.go to recognize the "umask" command. So DependsOn seems too flexible: it would be too easy to add a dependency that would not be recognized.

As a separate concern, more knobs is harder to use. It's already straightforward to test differing umask values within a test. And if someone wants to set umask outside of the test binary, it's easy to disable caching by passing -test.count=1. So I don't yet see the compelling argument for this approach.