dgryski / semgrep-go

Go rules for semgrep and go-ruleguard
MIT License

yaml: errorprone with []byte #42

Open ainar-g opened 2 years ago

ainar-g commented 2 years ago

Consider the code:

package main

import (
    "encoding/json"
    "fmt"
    "strings"

    "gopkg.in/yaml.v2"
)

type T struct {
    B []byte `json:"b" yaml:"b"`
}

const jsonData = `{"b":"aGVsbG8K"}`

const yamlData = `b: aGVsbG8K`

func main() {
    var err error
    var t T

    err = json.NewDecoder(strings.NewReader(jsonData)).Decode(&t)
    fmt.Printf("%v %v\n", err, t)

    t = T{}
    err = yaml.NewDecoder(strings.NewReader(yamlData)).Decode(&t)
    fmt.Printf("%v %v\n", err, t)
}

Here, the programmer assumed that []byte fields in gopkg.in/yaml.v2 behave the same way as in encoding/json. But they don't:

<nil> {[104 101 108 108 111 10]}
yaml: unmarshal errors:
  line 1: cannot unmarshal !!str `aGVsbG8K` into []uint8 {[]}

It seems you can still use []byte with that module, but only if the value is written as a YAML sequence of integers, which is probably not what most people want:

b:
- 104
- 101
- 108
- 108
- 111
- 10

disconnect3d commented 2 years ago

Huh, nice case :)

fwiw https://grep.app/search?q=%5C%5B%5C%5Dbyte%20%60.%2Ayaml&regexp=true

EDIT: relevant:

ainar-g commented 2 years ago

@disconnect3d, I think it might work in some libraries if they parse YAML 1.1 as opposed to YAML 1.2. In fact, the link to the !!binary type is for YAML 1.1, and YAML 1.2 has explicitly dropped it.

dgryski commented 2 years ago

I'm not sure semgrep has enough type information to figure this out. Ruleguard might. If you can figure out a way to detect this with one of the tools, go ahead and open a PR.