proposal: Go 2: permit converting a string constant to a byte array type

jfcg commented 4 years ago

Byte slices and strings are convertible to each other:

var bs = []byte("şevkı")
fmt.Println(len(bs), string(bs))

I propose extending this to allow the following conversions:

String constants to byte arrays:
```
var ba = [...]byte("şevkı")
```

Byte arrays to strings:

var ba = [...]byte{20, 40, 60}
var s = string(ba)

Also:


// should not compile
var ba = [6]byte("şevkı")

// ok
var ba = [7]byte("şevkı")

// pad with zeros
var ba = [8]byte("şevkı")
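
The sizes in these three cases follow from the UTF-8 byte length of the constant rather than its character count. A minimal sketch of that arithmetic in current Go (the helper name `byteAndRuneLen` is made up for illustration and is not part of the proposal):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneLen reports the UTF-8 byte length and the rune count of s.
// The helper name is hypothetical; it only illustrates the arithmetic.
func byteAndRuneLen(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := byteAndRuneLen("şevkı")
	// ş and ı take two bytes each in UTF-8; e, v and k take one each.
	fmt.Println(b, r) // 7 5
}
```

So a 6-element array is one byte too small, 7 fits exactly, and 8 leaves one trailing zero byte.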



I believe this is backward-compatible with Go 1. What do you think?

randall77 commented 4 years ago

The latter you can do with string(ba[:]). The former would only make sense with constant strings. Can you elaborate on situations where that comes up?

jfcg commented 4 years ago

Most systems nowadays are 64-bit. A string takes 16 bytes and a slice takes 24 bytes. It is especially useful if you want to:

stay compact with a small buffer
provide human readable initial values for your byte buffer.

Btw, I think this is Go 1 compatible; it is not only applicable to Go 2.

randall77 commented 4 years ago

Most systems nowadays are 64-bit. A string takes 16 bytes and a slice takes 24 bytes. It is especially useful if you want to:

stay compact with a small buffer

I don't see how this matters. Your first proposal would happen at compile time, and the second one just needs an extra 8 bytes of stack space (which could be optimized away if we cared).

provide human readable initial values for your byte buffer.

That seems like a reasonable thing to want. You could do

var ba [5]byte
copy(ba[:], "hello")

The only thing that [...]byte{"hello"} really gets you is automatic sizing.

ianlancetaylor commented 4 years ago

How often does code want to initialize a byte array (not slice) with a constant string?

bcmills commented 4 years ago

I have, on occasion, wanted a byte-array from a constant string in order to have something addressable to pass to a C function (https://play.golang.org/p/vTuzoWnwRhA).

I typically end up working around it by using a variable and copy instead, but that is more verbose (and, as @randall77 notes, requires more care about sizing).

I do not know how representative that use-case is.

josharian commented 4 years ago

@bcmills another verbose fix in that kind of case could come from #395: you'd convert twice, once to []byte, and then to *[N]byte.
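
For reference, the double conversion described here might look like the sketch below. It assumes a toolchain in which slice-to-array-pointer conversion is available (Go 1.17 or later, as far as I know); the function name `helloArray` is hypothetical.

```go
package main

import "fmt"

// helloArray converts a constant string to a slice and then to an array
// pointer. The second conversion panics at run time if the slice is
// shorter than the array length written in the type.
func helloArray() *[5]byte {
	return (*[5]byte)([]byte("hello"))
}

func main() {
	p := helloArray()
	fmt.Println(len(p), string(p[:])) // 5 hello
}
```

Note that the array length still has to be written out by hand, and a slice that is too short is only detected at run time.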

jfcg commented 4 years ago

stay compact with a small buffer

I don't see how this matters. Your first proposal would happen at compile time, and the second one just needs an extra 8 bytes of stack space (which could be optimized away if we cared).

This will use an extra 24 bytes on the stack compared to a byte array:

func f() {
    var buf = []byte("file000")
    ...
}

If buf is a fixed-size buffer and f() is recursive, then the redundant extra space grows rapidly.

Initializing a byte array from a string can be really convenient: it can be used for enumerations of strings with a specific format like above, among other useful cases.

provide human readable initial values for your byte buffer.

That seems like a reasonable thing to want. You could do

var ba [5]byte
copy(ba[:], "hello")

The only thing that [...]byte{"hello"} really gets you is automatic sizing.

Ignoring having to type an extra line, yes automatic sizing is also a convenient gain ;)

jimmyfrasche commented 4 years ago

Minor nit but it seems like it should be [...]byte("string").

networkimprov commented 4 years ago

@jfcg no "v2.0" is planned. The "Go2" label means "defer until language changes resume" (they resumed in 1.13).

randall77 commented 4 years ago

This will use an extra 24 bytes on the stack compared to a byte array:

func f() {
    var buf = []byte("file000")
    ...
}

Comparing, say,

func f() {
    ba := [3]byte{65, 65, 65}
    s := string(ba[:])
}

and

func g() {
    ba := [3]byte{65, 65, 65}
    s := string(ba) // proposed new feature
}

f has to call runtime.slicebytetostring which takes 24 bytes for the slice. g would have to call a hypothetical runtime.arraybytetostring which needs to take a pointer and a size, 16 bytes. In either case, this space is shared between all calls out of f (or g). So if you have a recursive call with 3 word-sized arguments (a string and an int, say), the conversion mentioned here costs no space at all. At worst it costs 8 bytes.

We could easily optimize the expression string(ba[:]) to only pass a ptr+len instead of a ptr+len+cap, and then it wouldn't cost any space, ever.

jfcg commented 4 years ago

Having to call a function like runtime.slicebytetostring for s := string(ba[:]) is weird, let alone passing cap as a parameter to that function redundantly. I thought the Go compiler allocates space for the string on the stack and just copies ptr and len.

Ok, this is different from having a fixed-size buffer, and it seems the proposal has a slight 8-byte advantage, plus three fewer characters ([:]) to type ;)

randall77 commented 4 years ago

let alone passing cap as a parameter to that function redundantly.

That is true, slicebytetostring never uses the capacity. I guess just fixing that is equivalent to my runtime.arraybytetostring proposal.

I thought the Go compiler allocates space for the string on the stack and just copies ptr and len.

It always has to do at least a copy of the data, because the compiler currently has no analysis to prove that the byte array is not modified subsequently (#31506, #2205).

If the result doesn't escape, it can allocate the backing array on the stack. But I don't think the details of that are much different in the two cases. Read runtime/string.go:slicebytetostring for the gory details.
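
The reason the copy is needed can be seen in a small sketch: the array remains mutable after the conversion, while the string must not change. The function name `snapshot` is hypothetical.

```go
package main

import "fmt"

// snapshot converts the array to a string through a slice expression,
// which is the conversion that works in current Go.
func snapshot(ba *[3]byte) string {
	return string(ba[:])
}

func main() {
	ba := [3]byte{65, 65, 65}
	s := snapshot(&ba)
	ba[0] = 66 // mutate the array after the conversion
	// The string keeps the old contents because the bytes were copied.
	fmt.Println(s, string(ba[:])) // AAA BAA
}
```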

josharian commented 4 years ago

I've started on removing unnecessary cap arguments to a few runtime calls. It's mostly done. I'll plan to finish it up and mail during 1.15. (Writing here to try to avoid duplicate work.)

ianlancetaylor commented 4 years ago

As noted above, we can already convert byte arrays to strings by adding [:], so this is really about converting strings to byte arrays. One of the criteria for Go 2 language changes from https://blog.golang.org/go2-here-we-come is that a language change should "address an important issue for many people".

How often does this really come up in practice? For example, can you find examples of existing code that would be simplified if we added this conversion to the language?

jfcg commented 4 years ago

In the top 15 trending Go repos this month

grep -Pr "\[[^]]+\]byte{('[^']+', )+" .

yields 74 results:

./go-master/doc/gccgo_install.html:var name = [4]byte{'f', 'o', 'o', 0};
./go-master/src/encoding/xml/marshal_test.go:   {Value: &Plain{[3]byte{'<', '/', '>'}}, ExpectXML: `<Plain><V>&lt;/&gt;</V></Plain>`},
./go-master/src/fmt/fmt_test.go:    {"%s", [3]byte{'a', 'b', 'c'}, "abc"},
./go-master/src/fmt/fmt_test.go:    {"%s", &[3]byte{'a', 'b', 'c'}, "&abc"},
./go-master/src/reflect/all_test.go:        s := [...]byte{'_', '_', '_', '_', '_', '_', '_', '_'}
./go-master/src/runtime/race.go:var qq = [...]byte{'?', '?', 0}
./go-master/src/runtime/race.go:var dash = [...]byte{'-', 0}
./go-master/src/syscall/exec_linux.go:  none  = [...]byte{'n', 'o', 'n', 'e', 0}
./go-master/src/syscall/exec_linux.go:  slash = [...]byte{'/', 0}
./go-master/test/fixedbugs/bug102.go:   var b1 = [5]byte{'h', 'e', 'l', 'l', 'o'}
./go-master/test/fixedbugs/issue15528.go:   deep interface{} = [1]struct{ a *[2]byte }{{a: &[2]byte{'z', 'w'}}}
./go-master/test/fixedbugs/issue15528.go:   if !reflect.DeepEqual(*(deep.([1]struct{ a *[2]byte })[0].a), [2]byte{'z', 'w'}) {
./go-master/test/rotate.go: cop = [2]byte{'|', '^'}
./kops-master/vendor/github.com/hashicorp/go-msgpack/codec/time.go: timeDigits = [...]byte{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}
./kops-master/vendor/golang.org/x/crypto/salsa20/salsa/hsalsa20.go:var Sigma = [16]byte{'e', 'x', 'p', 'a', 'n', 'd', ' ', '3', '2', '-', 'b', 'y', 't', 'e', ' ', 'k'}
./libpod-master/vendor/github.com/uber/jaeger-client-go/thrift/simple_json_protocol.go:     fill := [...]byte{'=', '=', '='}
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'a', 'r', 't', '-', 'l', 'o', 'j', 'b', 'a', 'n'}: _jbo, // art-lojban
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 'a', 'm', 'i'}:                          _ami, // i-ami
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 'b', 'n', 'n'}:                          _bnn, // i-bnn
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 'h', 'a', 'k'}:                          _hak, // i-hak
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 'k', 'l', 'i', 'n', 'g', 'o', 'n'}:      _tlh, // i-klingon
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 'l', 'u', 'x'}:                          _lb,  // i-lux
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 'n', 'a', 'v', 'a', 'j', 'o'}:           _nv,  // i-navajo
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 'p', 'w', 'n'}:                          _pwn, // i-pwn
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 't', 'a', 'o'}:                          _tao, // i-tao
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 't', 'a', 'y'}:                          _tay, // i-tay
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 't', 's', 'u'}:                          _tsu, // i-tsu
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'n', 'o', '-', 'b', 'o', 'k'}:                     _nb,  // no-bok
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'n', 'o', '-', 'n', 'y', 'n'}:                     _nn,  // no-nyn
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'s', 'g', 'n', '-', 'b', 'e', '-', 'f', 'r'}:      _sfb, // sgn-BE-FR
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'s', 'g', 'n', '-', 'b', 'e', '-', 'n', 'l'}:      _vgt, // sgn-BE-NL
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'s', 'g', 'n', '-', 'c', 'h', '-', 'd', 'e'}:      _sgg, // sgn-CH-DE
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'z', 'h', '-', 'g', 'u', 'o', 'y', 'u'}:           _cmn, // zh-guoyu
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'z', 'h', '-', 'h', 'a', 'k', 'k', 'a'}:           _hak, // zh-hakka
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'z', 'h', '-', 'm', 'i', 'n', '-', 'n', 'a', 'n'}: _nan, // zh-min-nan
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'z', 'h', '-', 'x', 'i', 'a', 'n', 'g'}:           _hsn, // zh-xiang
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'c', 'e', 'l', '-', 'g', 'a', 'u', 'l', 'i', 's', 'h'}: -1, // cel-gaulish
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'e', 'n', '-', 'g', 'b', '-', 'o', 'e', 'd'}:           -2, // en-GB-oed
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 'd', 'e', 'f', 'a', 'u', 'l', 't'}:           -3, // i-default
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 'e', 'n', 'o', 'c', 'h', 'i', 'a', 'n'}:      -4, // i-enochian
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'i', '-', 'm', 'i', 'n', 'g', 'o'}:                     -5, // i-mingo
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'z', 'h', '-', 'm', 'i', 'n'}:                          -6, // zh-min
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'r', 'o', 'o', 't'}:                                    0,  // root
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:       [maxLen]byte{'e', 'n', '-', 'u', 's', '-', 'p', 'o', 's', 'i', 'x'}: -7, // en_US_POSIX"
./terraform-master/vendor/github.com/Azure/go-ntlmssp/messageheader.go:var signature = [8]byte{'N', 'T', 'L', 'M', 'S', 'S', 'P', 0}
./terraform-master/vendor/github.com/ugorji/go/codec/binc.go:// var timeDigits = [...]byte{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'a', 'r', 't', '-', 'l', 'o', 'j', 'b', 'a', 'n'}: _jbo, // art-lojban
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 'a', 'm', 'i'}:                          _ami, // i-ami
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 'b', 'n', 'n'}:                          _bnn, // i-bnn
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 'h', 'a', 'k'}:                          _hak, // i-hak
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 'k', 'l', 'i', 'n', 'g', 'o', 'n'}:      _tlh, // i-klingon
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 'l', 'u', 'x'}:                          _lb,  // i-lux
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 'n', 'a', 'v', 'a', 'j', 'o'}:           _nv,  // i-navajo
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 'p', 'w', 'n'}:                          _pwn, // i-pwn
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 't', 'a', 'o'}:                          _tao, // i-tao
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 't', 'a', 'y'}:                          _tay, // i-tay
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 't', 's', 'u'}:                          _tsu, // i-tsu
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'n', 'o', '-', 'b', 'o', 'k'}:                     _nb,  // no-bok
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'n', 'o', '-', 'n', 'y', 'n'}:                     _nn,  // no-nyn
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'s', 'g', 'n', '-', 'b', 'e', '-', 'f', 'r'}:      _sfb, // sgn-BE-FR
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'s', 'g', 'n', '-', 'b', 'e', '-', 'n', 'l'}:      _vgt, // sgn-BE-NL
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'s', 'g', 'n', '-', 'c', 'h', '-', 'd', 'e'}:      _sgg, // sgn-CH-DE
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'z', 'h', '-', 'g', 'u', 'o', 'y', 'u'}:           _cmn, // zh-guoyu
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'z', 'h', '-', 'h', 'a', 'k', 'k', 'a'}:           _hak, // zh-hakka
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'z', 'h', '-', 'm', 'i', 'n', '-', 'n', 'a', 'n'}: _nan, // zh-min-nan
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'z', 'h', '-', 'x', 'i', 'a', 'n', 'g'}:           _hsn, // zh-xiang
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'c', 'e', 'l', '-', 'g', 'a', 'u', 'l', 'i', 's', 'h'}: -1, // cel-gaulish
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'e', 'n', '-', 'g', 'b', '-', 'o', 'e', 'd'}:           -2, // en-GB-oed
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 'd', 'e', 'f', 'a', 'u', 'l', 't'}:           -3, // i-default
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 'e', 'n', 'o', 'c', 'h', 'i', 'a', 'n'}:      -4, // i-enochian
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'i', '-', 'm', 'i', 'n', 'g', 'o'}:                     -5, // i-mingo
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'z', 'h', '-', 'm', 'i', 'n'}:                          -6, // zh-min
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'r', 'o', 'o', 't'}:                                    0,  // root
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:        [maxLen]byte{'e', 'n', '-', 'u', 's', '-', 'p', 'o', 's', 'i', 'x'}: -7, // en_US_POSIX"

This is an understated statistic. It is pretty clear this feature is direly needed...

earthboundkid commented 4 years ago

This compiles on the playground:

    const staticname = "foo\x00"
    var name [len(staticname)]byte
    copy(name[:], staticname)

Do we really need it to be more convenient than this?

jfcg commented 4 years ago

Are you seriously comparing that to:

var name = [...]byte("foo\x00")

C and C++ have had this for years:

char name[] = "foo";

josharian commented 4 years ago

@jfcg I think he was serious, yes. Please be polite. Although you may disagree, it is not an unreasonable position; the bar for language changes is very high.

earthboundkid commented 4 years ago

The C/C++ version is not directly relevant to the discussion:

Null terminated strings are the biggest security disaster in the history of computing
Hiding a null at the end of a string literal leads to a lot of beginner confusion and even mistakes for advanced programmers
C/C++ don't distinguish the types of different array lengths

The question is not "is [...]byte("") shorter/better?" Of course it is. But everything is a tradeoff. Are byte arrays useful enough, often enough, to be worth the cost of updating the language and adding a special case for this? One point against special-casing it is that you can use [len(constant)]byte as a type.
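
One practical detail of the [len(constant)]byte approach: copy is a statement, so at package scope the array has to be filled from an init function or a function literal. A minimal sketch, with `staticName` and `name` as illustrative identifiers:

```go
package main

import "fmt"

const staticName = "foo\x00"

// name is sized from the constant and filled by an immediately invoked
// function literal, so it can be declared at package scope.
var name = func() (a [len(staticName)]byte) {
	copy(a[:], staticName)
	return a
}()

func main() {
	fmt.Println(len(name), name) // 4 [102 111 111 0]
}
```

The array length tracks the constant automatically, which covers the "automatic sizing" point at the cost of a few extra lines.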

jfcg commented 4 years ago

However much you dislike the C way of doing things, there is a vast world out there that Go code needs to interact with, and that world IS in C/C++. See the dozens of examples above. This will make tons of array initializations human-readable. Don't trade reason for conservatism. This is a clearly needed, backward-compatible feature.

jfcg commented 4 years ago

In the top 23 active Go repos

grep -Pr '\[\]byte\("[^"]+"\)' .

yields 4076 results of creating a byte slice from a simple constant string. People do create a lot of buffers from constant strings. Slices are rightly more popular because I/O functions require them.

In cases where slices are not mandatory, people probably still choose slices over arrays because []byte("input") is easier to read and write, while array initialization is more cumbersome.

So, when applicable, this feature also allows more use of arrays with human-readable initial values, with less space lost to the 24 bytes occupied by slices.

mrkanister commented 4 years ago

In the top 15 trending Go repos this month
grep -Pr "\[[^]]+\]byte{('[^']+', )+" .
yields 74 results: [...]

Keep in mind that the occurrences in github.com/golang/text have been strings before and it looks like they were only changed because "checking for grandfathered tags is in the critical path". Also, half of the other matches are in unit tests, HTML files or comments, leaving only the following 8 remaining:

github.com/Azure/go-ntlmssp/messageheader.go:var signature = [8]byte{'N', 'T', 'L', 'M', 'S', 'S', 'P', 0}
github.com/golang/go/src/runtime/race.go:var dash = [...]byte{'-', 0}
github.com/golang/go/src/runtime/race.go:var qq = [...]byte{'?', '?', 0}
github.com/golang/go/src/syscall/exec_linux.go: none  = [...]byte{'n', 'o', 'n', 'e', 0}
github.com/golang/go/src/syscall/exec_linux.go: slash = [...]byte{'/', 0}

github.com/hashicorp/go-msgpack/codec/time.go:  timeDigits = [...]byte{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}
github.com/uber/jaeger-client-go/thrift/simple_json_protocol.go:        fill := [...]byte{'=', '=', '='}
golang.org/x/crypto/salsa20/salsa/hsalsa20.go:var Sigma = [16]byte{'e', 'x', 'p', 'a', 'n', 'd', ' ', '3', '2', '-', 'b', 'y', 't', 'e', ' ', 'k'}

The first 5 are interesting cases, because the last byte is 0. Is this something that your proposal could/would cover? The one where I see the biggest benefit is the last match:

// before
var Sigma = [16]byte{'e', 'x', 'p', 'a', 'n', 'd', ' ', '3', '2', '-', 'b', 'y', 't', 'e', ' ', 'k'}

// after
var Sigma = [16]byte("expand 32-byte k")

I am neither against your proposal, nor am I in favor of it. However, I think that you will need to provide some more evidence to make your case and claim that

This is an understated statistic. It is pretty clear this feature is direly needed...

Just my 2 cents.

bcmills commented 4 years ago

The first 5 are interesting cases, because the last byte is 0. Is this something that your proposal could/would cover?

Note that it is trivial to make the last byte of a string constant 0 using the hex escape \x00: https://play.golang.org/p/apwVP7FbwF3
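
As a small illustration of the escape (a sketch, not the contents of the linked playground), the constant below ends in a single NUL byte that counts toward its length. The names `none` and `lastByte` are chosen here for the example.

```go
package main

import "fmt"

// \x00 is a single NUL byte inside the string constant.
const none = "none\x00"

// lastByte returns the final byte of a non-empty string.
func lastByte(s string) byte {
	return s[len(s)-1]
}

func main() {
	fmt.Println(len(none), lastByte(none)) // 5 0
}
```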

jfcg commented 4 years ago

@mrkanister, three crucial points of this proposal are:

Readability & ease of typing

var signature = [...]byte("NTLMSSP\x00")
var qq = [...]byte("??\x00")
var none = [...]byte("none\x00")
var timeDigits = [...]byte("0123456789")
var fill = [...]byte("===")
var Sigma = [16]byte("expand 32-byte k")

Saving 24 bytes of space where you just need an array with readable initial value
```
func f() {
    var buf = [...]byte("file000")
    ...
}
```
Go 1 compatible

Also, if 1% of 4076 cases can be handled with arrays, then you have 40+ more examples. Please feel free to dig. I believe I have provided quite a lot of examples.

Cheers

OneOfOne commented 4 years ago

Honestly this would be fixed with #22876 or #28591.

jfcg commented 4 years ago

Honestly this would be fixed with #22876 or #28591.

I don't see how. Can you give an example? How do I get pretty array initializations?

OneOfOne commented 4 years ago

The whole point is having a const slice as far as I can tell.

jfcg commented 4 years ago

The whole point is having a const slice as far as I can tell.

Not even close. It is about byte arrays, man.

mrkanister commented 4 years ago

@mrkanister, three crucial points of this proposal are:

Readability & ease of typing

Saving 24 bytes of space where you just need an array with readable initial value

Go 1 compatible

[...]

Also, if 1% of 4076 cases can be handled with arrays, then you have 40+ more examples.

@jfcg, I will try to restate your idea with my own words. Please correct me if I got something wrong:

The basis for your proposal is that using byte arrays instead of byte slices is preferred because it saves a significant amount of space. Therefore, using byte arrays should be at least as convenient as using byte slices, which it would be using your suggested language change.

Other people in this issue have pointed out that space overhead could be addressed by making some compiler and/or runtime optimizations. Can you think of any other use cases where using byte arrays instead of byte slices is preferred/required that would justify the language change?

jfcg commented 4 years ago

@mrkanister, @josharian talked about removing the redundant cap parameter to string(slice).

I don't recall anything (or I missed it) related to the unnecessary 24 bytes in

func f() {
   var buf = []byte("file000")
...

when you just need the local buffer, and not the slice. Are we on the same page about the cases where a redundant 24 bytes can be wasted?

earthboundkid commented 4 years ago

I don't agree that the 24 bytes are wasted. Again, null terminated strings are a security disaster, not to mention the problems with concatenation and accidental O(N) strlen. I think the strong consensus of everyone is that slices are a better general case solution.

In addition, arrays have their length as part of their type, and that information must be stored in the binary, so not all of the bytes are recoverable even in principle.

But, byte arrays do have some uses. As @jfcg pointed out, C/C++ isn't going away any time soon, so there is a need to be able to make an array to pass back and forth to C. But that's what C.CString is for.

I think that jfcg is arguing that in the popular codebases today, byte arrays are being underused because of the difficulty of typing out [...]byte{ 'c', 'h', 'a', 'r' }. I disagree that they are being underused. I think that byte arrays are a special optimization and should only be used when evidence shows that they bring advantages.

I'm neutral on whether [...]byte("") is a good language change. Based on the evidence we've seen so far, it appears to me that the optimization is only rarely needed (8 cases out of 4,076). OTOH, the language change seems small and easy to understand.

bcmills commented 4 years ago

The basis for your proposal is that using byte arrays instead of byte slices is preferred because it saves a significant amount of space.

It's not just the space savings: if you have a protocol that requires a fixed-length byte array, obtaining such an array from a variable-length slice is verbose at best, and turns what ought to be a compile-time length check into a run-time one (or, worse, an undetected under- or over-run, which may be particularly dangerous when populating fields in a struct passed in from C).
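
To make the run-time check concrete, here is a minimal sketch in current Go. The `header` type and `setSignature` function are hypothetical stand-ins for a protocol struct and its setter, not code from any of the repositories mentioned in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// header stands in for a protocol struct with a fixed-length field.
type header struct {
	Signature [8]byte
}

// setSignature copies s into the fixed-size field. copy would silently
// truncate a long string or leave trailing zeros for a short one, so the
// length has to be checked explicitly at run time.
func setSignature(h *header, s string) error {
	if len(s) != len(h.Signature) {
		return errors.New("signature must be exactly 8 bytes")
	}
	copy(h.Signature[:], s)
	return nil
}

func main() {
	var h header
	fmt.Println(setSignature(&h, "NTLMSSP\x00")) // <nil>
	fmt.Println(setSignature(&h, "NTLMSSP"))     // signature must be exactly 8 bytes
}
```

With a constant-to-array conversion as proposed, the second call's mistake could in principle be rejected by the compiler instead.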

jfcg commented 4 years ago

@carlmjohnson, let's take a look at this:

func f(x int) {
    if x > 99 {
        return
    }
    var buf = []byte("file000")

    buf[6] += byte(x%10) // just do something with x & buf
    fmt.Println(buf)

    go f(x+1)
    f(x+2)
}

It is an artificial example, but do you think there are no 24 bytes of waste? What does that mean for a recursive f()?

Btw, this feature has nothing to do with C strings or their lack of security. What is it that I can do with this feature that I can't do with a byte{ } initialization? What you are saying is totally irrelevant.

ianlancetaylor commented 4 years ago

The discussion about "24 bytes of waste" is assuming specific compiler behavior. It's misleading because compilers can change. In the example above it's true that buf is likely to be allocated as 24 bytes on the stack in order to get a value to pass to fmt.Println. But it's also true that using a byte array is likely to cause two copies of the array to exist, in order to get a separate copy of the array to pass to fmt.Println. Exactly what happens is going to depend deeply on what the compiler is able to do. If the compiler never needs to store the capacity of a slice anywhere, then perhaps it won't. If the compiler never needs to copy an array, then perhaps it won't.

In general, we should avoid making language decisions based on compiler behavior.

(Which is not to say that I am opposed to this change, but if there is a reason to do it it has to do with making code simpler and more readable, and is not because of saving space on the stack.)

earthboundkid commented 4 years ago

Your example is very artificial and hard to draw conclusions from. In this case, fmt.Println accepts a slice of interface{}, so you're already "wasting" many more bytes on that. But the program probably isn't memory constrained, so there's no reason to optimize it either.

jfcg commented 4 years ago

@carlmjohnson, any fixed-size local buffer that does not escape the function is better as a byte array.

@ianlancetaylor, byte slices are officially separate types that reference memory buffers, am I wrong? How can a compiler choose not to allocate space for that slice? Can you explain?

ianlancetaylor commented 4 years ago

Both byte slices and byte arrays have an array of bytes. We can disregard that when comparing them.

A byte slice is three different values: a pointer to the array of bytes, a length, and a capacity. A byte array will internally be represented using a pointer to the array, and a constant length. When the compiler sees []byte("abc") it knows that the length and capacity are 3, so they are also constant, just as with a byte array. So internally when the compiler is working with a byte slice initialized in that way, it has a pointer to an array, and a constant length and capacity. In other words, it's just the same as an array.

The 24 bytes only arises when the compiler has to construct the slice in memory. In practice that only happens if the slice is converted to an interface type. When the slice is indexed, the compiler can compare the index with the length, in this case a constant. When the slice is passed to a function, the compiler will pass three separate values, where in this case two of them are constants. The compiler does not need to construct a 24 byte value in order to do that (the arguments to the function will take 24 bytes, but exactly the same would happen when passing a byte array to a function that expects a byte slice, by slicing the array).

In other words, the compiler doesn't think of a byte slice as a single value of size 24 bytes. It thinks of it as three different values, one pointer and two ints. Only when the slice must be stored into memory, as when converting to an interface type or setting a global variable, does the compiler need to assemble the 24 byte value.
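
The sizes being discussed are those of the assembled in-memory values, which can be inspected with unsafe.Sizeof. This sketch only reports those sizes; it says nothing about whether the compiler actually materializes a slice header in a given function. The function name `headerSizes` is hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerSizes returns the in-memory sizes of a byte slice value, a string
// value, and a 7-byte array on the current platform.
func headerSizes() (slice, str, array uintptr) {
	var b []byte
	var s string
	var a [7]byte
	return unsafe.Sizeof(b), unsafe.Sizeof(s), unsafe.Sizeof(a)
}

func main() {
	slice, str, array := headerSizes()
	// On a 64-bit platform this is expected to print 24 16 7:
	// three words for the slice, two for the string, and the raw bytes.
	fmt.Println(slice, str, array)
}
```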

jfcg commented 4 years ago

Ok, I see, so the compiler is smart in many cases. Do we have the 24-byte problem for the example above with the gc toolchain as of today?

mrkanister commented 4 years ago

It's not just the space savings: if you have a protocol that requires a fixed-length byte array, obtaining such an array from a variable-length slice is verbose at best, and turns what ought to be a compile-time length check into a run-time one (or, worse, an undetected under- or over-run, which may be particularly dangerous when populating fields in a struct passed in from C).

@bcmills, you are right, that is another valid use case for byte arrays. I wasn't trying to ignore your comment from a few weeks back, but rather wanted to emphasize that focussing on the (supposedly) saved space might not be the way to gain enough support for the proposal. Sorry, if I didn't make that clear enough.

ianlancetaylor commented 4 years ago

As far as I know the gc compiler doesn't have the "24 byte problem." And, if it does, we should treat it as something to fix in the compiler, rather than something to fix in the language.

(As I mentioned above I'm not opposed to this change, I just don't think that optimization is a reason for it.)

jfcg commented 4 years ago

Hi. To better understand slice/array initialization differences, I disassembled slice.go:

package mypkg

//import "fmt"

func slc(x int) {
    if x > 99 {
        return
    }
    var buf = []byte("file000")

    buf[6] += byte(x % 10) // just do something with x & buf
    //  fmt.Println(buf)

    slc(x + 1)
    slc(x + 2)
}

array.go:

package mypkg

//import "fmt"

func arr(x int) {
    if x > 99 {
        return
    }
    var buf = [...]byte{'f', 'i', 'l', 'e', '0', '0', '0'}

    buf[6] += byte(x % 10) // just do something with x & buf
    //  fmt.Println(buf[:])

    arr(x + 1)
    arr(x + 2)
}

with go tool compile -S array.go > array.S etc. The disassemblies are almost the same:

array version has two more Go assembly instructions
array version has "file000" in a readonly data segment
slice version has "file000" in a Go string
slice assembly has no definition of autotmp_2 (string to slice library function?)

I think both of these simplified versions use the same amount of stack, as @ianlancetaylor said.

This is the vimdiff shot: slice-arr

Could someone please explain these little differences? Thanks..

josharian commented 4 years ago

I’m on my phone, but I suspect that after https://go-review.googlesource.com/c/go/+/220499 goes in, those differences may disappear. That change description may help you understand the difference as well. If you want to know more, look for array and slice initialization in src/cmd/compile/internal/gc/sinit.go.

ianlancetaylor commented 4 years ago

Thanks for finding the examples where this could be used. I note that many of the cases are one map in x/text/internal/language, which appears twice in the examples. Many of the other cases are only in tests. This doesn't seem to be used often enough to justify changing the language, per the criteria for language changes at https://blog.golang.org/go2-here-we-come.

It's a bit odd to restrict this conversion only to constant strings. But it's also odd to permit non-constant strings, as it's not obvious what should happen if the string is too long.

Given that it only applies to constant strings, it's syntactic sugar for listing out the bytes.
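For readers following along, a small sketch of the two spellings available today, either of which the proposed `[...]byte("file000")` would abbreviate. The names `listed` and `copied` are made up for this example.

```go
package main

import "fmt"

// listed spells out the bytes one by one; the compiler infers length 7.
var listed = [...]byte{'f', 'i', 'l', 'e', '0', '0', '0'}

// copied builds the same array with copy; here the length has to be
// written by hand and kept in sync with the string.
func copied() [7]byte {
	var a [7]byte
	copy(a[:], "file000")
	return a
}

func main() {
	// Arrays of the same type are comparable with ==.
	fmt.Println(listed == copied())
}
```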

As discussed above, the effects on the compiler should be fixed in the compiler, not in the language.

For these reasons, this is a likely decline. Leaving open for four weeks for final comments.

jfcg commented 4 years ago

This change makes sense not just for constant strings but constant-size strings:
```
func fn(s string) {
    if len(s) != 4 {
        return
    }
    buf := [...]byte(s) // proposed syntax; does not compile today
    _ = buf
}
```
On top of the dozens of examples above, there are 4076+ examples of creating slices from simple strings. People are likely always choosing slices in these cases because only slices have a convenient initialization from strings. If only 1% of them could be written with arrays, you have 40+ more examples for this proposal.
This proposal will close an asymmetry in slice and array initialization. It is like being able to initialize float32 from an int constant, but not float64.
It is Go 1 compatible.

If this proposal does not satisfy the condition of inclusion, I don't see what does. This is very wrong. Go is a programming language, not a religion. I don't get it.

ianlancetaylor commented 4 years ago

@jfcg The language can't rely only on tests like if len(s) != 4 that might be omitted or incorrect. We have to define exactly what should happen when converting a non-constant string to a byte array type.

The blog post lists three criteria for language changes. One of those is that a language change must "address an important issue for many people." The cases in x/text/internal/language/lookup.go we can effectively treat as a single example, as they are all the same. Ignoring cases in tests, I count 12 examples in https://github.com/golang/go/issues/36890#issuecomment-584833859. That isn't many, especially considering that you looked at the whole standard library. And each of those 12 examples is easy to write today; this language change would make it slightly more convenient, but it wouldn't add a feature that is not already available.

I don't see any particular reason to assume that any cases of converting a string to a slice could instead convert a string to a byte array. Perhaps some could, perhaps not. Slices and arrays are different and serve different purposes.

I don't agree that adding this feature would close an asymmetry in the language. You can't convert an array of bytes to a string.

jfcg commented 4 years ago

@jfcg The language can't rely only on tests like if len(s) != 4 that might be omitted or incorrect. We have to define exactly what should happen when converting a non-constant string to a byte array type.

Any string whose size can be determined at compile-time can be converted to a byte array. How is constant-size string not well defined?

The blog post lists three criteria for language changes. One of those is that a language change must "address an important issue for many people." The cases in x/text/internal/language/lookup.go we can effectively treat as a single example, as they are all the same. Ignoring cases in tests, I count 12 examples in #36890 (comment). That isn't many, especially considering that you looked at the whole standard library. And each of those 12 examples is easy to write today; this language change would make it slightly more convenient, but it wouldn't add a feature that is not already available.

Ian, how many slice(string) conversions were there in the global Go code bases before Go team added that ability to Go? How many are there now?

I don't see any particular reason to assume that any cases of converting a string to a slice could instead convert a string to a byte array. Perhaps some could, perhaps not. Slices and arrays are different and serve different purposes.

I don't agree that adding this feature would close an asymmetry in the language. You can't convert an array of bytes to a string.

we can write []byte("input") but not [...]byte("input"). Do you agree? I guess you don't count string(array[:]) as a conversion.

ianlancetaylor commented 4 years ago

Any string whose size can be determined at compile-time can be converted to a byte array. How is constant-size string not well defined?

The Go language is defined by a language spec, not by an implementation. There are multiple Go compilers. If we implement this feature, all compilers must agree on exactly when it is permitted to convert a string to a byte array, and when it is not. Otherwise a program might compile with one compiler but not with another. Therefore, we can't just say "size can be determined at compile-time." We must write down the precise rules by which the size can be determined at compile time.

Ian, how many slice(string) conversions were there in the global Go code bases before Go team added that ability to Go? How many are there now?

I understand what you are asking, but I don't think it's comparable. Before we added the ability to convert a string to a []byte, there was no way to do that conversion. We can already today convert a constant string to a byte array, by writing down the bytes one by one, as in the examples you listed above. I don't see an obvious reason to believe that people would write more conversions of constant strings to byte arrays if they could do the conversion directly, rather than by doing what they can already do today.

we can write []byte("input") but not [...]byte("input"). Do you agree?

Yes.

I guess you don't count string(array[:]) as a conversion.

That is one slice expression and one conversion.
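Spelled out, the two steps look like this; the function name `arrayToString` is only illustrative.

```go
package main

import "fmt"

// arrayToString (illustrative) turns a byte array into a string in two
// steps: the slice expression a[:] yields a []byte over the array, and
// string(...) converts that slice to a string.
func arrayToString(a [5]byte) string {
	return string(a[:])
}

func main() {
	ba := [...]byte{'h', 'e', 'l', 'l', 'o'}
	fmt.Println(arrayToString(ba))
}
```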

mrkanister commented 4 years ago

On top of the dozens of examples above, there are 4076+ examples of creating slices from simple strings. People are likely always choosing slices in these cases because only slices have a convenient initialization from strings. If only 1% of them could be written with arrays, you have 40+ more examples for this proposal.

Just curious, why would you want to rewrite them to use arrays when they currently work fine with slices?

gopherbot commented 4 years ago

Change https://golang.org/cl/227163 mentions this issue: cmd/compile,runtime: pass only ptr and len to some runtime calls

tv42 commented 4 years ago

@bcmills https://github.com/golang/go/issues/36890#issuecomment-580423877

I have, on occasion, wanted a byte-array from a constant string in order to have something addressable to pass to a C function (https://play.golang.org/p/vTuzoWnwRhA).

&slice[0] works just fine. https://play.golang.org/p/3mtOk5XJWJB
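A cgo-free sketch of the same idea, assuming the linked playground does something along these lines (its exact contents are not reproduced here): the address of a slice's first element is an ordinary *byte into the backing array. The helper `firstByte` is hypothetical, and real cgo calls are additionally subject to the cgo pointer-passing rules.

```go
package main

import "fmt"

// firstByte (hypothetical helper) returns a pointer to the first element
// of b's backing array. It panics if b is empty, since b[0] is then out
// of range.
func firstByte(b []byte) *byte {
	return &b[0]
}

func main() {
	buf := []byte("hello\x00") // NUL-terminated, as a C function might expect
	p := firstByte(buf)
	*p = 'j' // writes through the pointer are visible via the slice
	fmt.Println(string(buf[:5]))
}
```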
