BurntSushi / toml

TOML parser for Golang with reflection.
MIT License
4.58k stars 528 forks source link

v1.3.1 fails Go race detector when used concurrently #395

Closed aaron42net closed 1 year ago

aaron42net commented 1 year ago

The BURNTSUSHI_TOML_110 check in v1.3.1 writes to a package global at each invocation of parse() without any locking:

func parse(data string) (p *parser, err error) {
    _, ok := os.LookupEnv("BURNTSUSHI_TOML_110")
    tomlNext = ok
...

This could maybe use sync.Once() instead?

It currently fails Go's race detector when used concurrently:

==================
WARNING: DATA RACE
Write at 0x000000edce0a by goroutine 37:
  github.com/BurntSushi/toml.parse()
      github.com/BurntSushi/toml@v1.3.1/parse.go:36 +0x7b
  github.com/BurntSushi/toml.(*Decoder).Decode()
      github.com/BurntSushi/toml@v1.3.1/decode.go:156 +0x64c
  github.com/BurntSushi/toml.Unmarshal()
      github.com/BurntSushi/toml@v1.3.1/decode.go:28 +0x127
[...]

Previous write at 0x000000edce0a by goroutine 33:
  github.com/BurntSushi/toml.parse()
      github.com/BurntSushi/toml@v1.3.1/parse.go:36 +0x7b
  github.com/BurntSushi/toml.(*Decoder).Decode()
      github.com/BurntSushi/toml@v1.3.1/decode.go:156 +0x64c
  github.com/BurntSushi/toml.Unmarshal()
      github.com/BurntSushi/toml@v1.3.1/decode.go:28 +0x127

I'd be happy to provide code to demonstrate this if that's useful.

arp242 commented 1 year ago

Ah yeah, do'h; how silly >_< Using sync.Once won't work though, as I want people to be able to control it for every parser invocation. I also want to avoid adding a new API for this (as we will never be able to remove it for compatibility) hence the environment variable.

I wanted to avoid getting the environment variable on every tomlNext check as that's comparatively expensive:

BenchmarkEnv-2          28618281                41.64 ns/op            0 B/op          0 allocs/op
BenchmarkEnvWrap-2      28688908                41.62 ns/op            0 B/op          0 allocs/op
BenchmarkGlobal-2       1000000000               0.3607 ns/op          0 B/op          0 allocs/op
Benchmark code
package main

import (
    "os"
    "testing"
)

func BenchmarkEnv(b *testing.B) {
    for n := 0; n < b.N; n++ {
        if _, ok := os.LookupEnv("BURNTSUSHI_TOML_110"); ok {
            _ = ok
        }
    }
}

func check() bool {
    _, ok := os.LookupEnv("BURNTSUSHI_TOML_110")
    return ok
}

func BenchmarkEnvWrap(b *testing.B) {
    for n := 0; n < b.N; n++ {
        if check() {
        }
    }
}

var glob = false

func BenchmarkGlobal(b *testing.B) {
    for n := 0; n < b.N; n++ {
        if !glob {
            _ = glob
        }
    }
}

As it's checked thousands of times even for relatively small documents (once for every character in a key for starters) so it adds up. Then again, maybe that's not really a big deal. There's were a bunch of "micro benchmarks" and I just added a benchmark for a large document – if that doesn't regress too much then it's okay to just convert tomlNext to a function call.

Otherwise the solution is to check the environment variable once on startup and add a struct field to the parser and lexer and check that.

I don't have time for this today, but if you want to create a PR then I can merge it this evening. Would also be nice to test the parallel case (and run the tests with -race in the CI). Otherwise I'll look tomorrow, that's fine too.

arp242 commented 1 year ago

Actually maybe a struct field is better in any case, otherwise setting the environment variable while the parser is running will change the operation of the parser half-way through.

arp242 commented 1 year ago

And 1.3.0 doesn't have this bug; so you can use that in the meanwhile – I changed it for #394

aaron42net commented 1 year ago

If you want the setting to be able to changed per-invocation, you can't use environment anywhere this might be used concurrently. For a given goroutine, between setting the environment and calling Unmarshal, the value may be reset by a different goroutine.

This is why I suggested sync.Once(), to at least document a behavior which will be consistent and safe.

arp242 commented 1 year ago

For a given goroutine, between setting the environment and calling Unmarshal, the value may be reset by a different goroutine.

That's an obscure enough edge case I can live with: it will only affect people who 1) run this concurrently, 2) want to use TOML 1.1 draft, and 3) also want to use TOML 1.0 for other documents (concurrently). The number of people that will affect is very likely to be 0.

arp242 commented 1 year ago

Tagged v1.3.2