golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: bytes: separate the package bytes into the Unicode-related part and the others #40782

Closed by hajimehoshi 4 years ago

hajimehoshi commented 4 years ago

The problem with the unicode package is that it tends to bloat the binary size [1]. Currently the bytes package imports the unicode package, so even if you only want to use bytes.Buffer, you end up importing unicode indirectly, which is unfortunate. Would it be possible to split the bytes package into the APIs that are related to Unicode and the rest, probably in Go 2?

[1] For example: https://github.com/hajimehoshi/ebiten/issues/1157#issuecomment-673613666: unicode.init takes 22359 bytes IIUC
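
To make the complaint concrete, here is a minimal sketch (not taken from the thread) of a program whose only use of the bytes package is bytes.Buffer. The helper name joinWords is hypothetical. The program calls nothing Unicode-aware itself, yet importing bytes still brings in unicode as a dependency of the package as a whole. It writes through os rather than fmt only to keep its own import list short; that is not a claim about what os pulls in.

```go
package main

import (
	"bytes"
	"os"
)

// joinWords (hypothetical helper) concatenates words with single spaces
// using only bytes.Buffer. It never calls a Unicode-aware function such
// as bytes.ToUpper or bytes.Title.
func joinWords(words []string) string {
	var buf bytes.Buffer
	for i, w := range words {
		if i > 0 {
			buf.WriteByte(' ')
		}
		buf.WriteString(w)
	}
	return buf.String()
}

func main() {
	// Plain byte-oriented output; no formatting package is needed here.
	os.Stdout.WriteString(joinWords([]string{"hello", "world"}) + "\n")
}
```

After building this with `go build -o app .`, running `go tool nm -size -sort size app` lists the binary's symbols by size. That is one way to check how much the unicode symbols contribute on a given Go version and platform.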

davecheney commented 4 years ago

@hajimehoshi could you bisect to figure out when the dependency between bytes and unicode was added? Thank you.

hajimehoshi commented 4 years ago

https://github.com/golang/go/commit/2f5e75859b8bcb1ad3b8a8d3c4db078ecc5a6158 (so, before Go 1.0?)

davecheney commented 4 years ago

Thanks for confirming.

martisch commented 4 years ago

If the goal is to reduce binary size, then I don't think just splitting bytes is worth the churn in either Go 1 or Go 2.

Many other packages (strings, fmt, reflect) will still require unicode to be imported, so the number of Go programs that do not import unicode after the split would still be very small.

The ergonomic cost of having to work out which package holds a given bytes function, when they were previously all together in one package, is I think not worth the 22k bytes saved.

To tackle this problem for all packages, I think the way forward (apart from avoiding adding unicode dependencies where possible) is to work on compiler and linker optimisations that make the imported parts of unicode smaller, e.g. https://github.com/golang/go/issues/38784. This also has the advantage of not requiring Go 2 incompatibilities.
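
The point about other packages can be checked locally. The following is a rough sketch, assuming the go tool is on PATH; the helper name hasDep is hypothetical. It asks `go list -deps` for the transitive dependencies of a few standard packages and reports whether unicode is among them. Results depend on the installed Go version, so no expected output is given.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// hasDep (hypothetical helper) reports whether target appears as a whole
// line in the output of `go list -deps`. Matching whole lines means that
// "unicode/utf8" is not mistaken for "unicode".
func hasDep(listOutput, target string) bool {
	for _, line := range strings.Split(listOutput, "\n") {
		if strings.TrimSpace(line) == target {
			return true
		}
	}
	return false
}

func main() {
	for _, pkg := range []string{"bytes", "strings", "fmt", "reflect"} {
		// `go list -deps` prints the package and all of its transitive
		// dependencies, one import path per line.
		out, err := exec.Command("go", "list", "-deps", pkg).Output()
		if err != nil {
			fmt.Printf("%s: go list failed: %v\n", pkg, err)
			continue
		}
		fmt.Printf("%s depends on unicode: %v\n", pkg, hasDep(string(out), "unicode"))
	}
}
```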

hajimehoshi commented 4 years ago

> Many other packages (strings, fmt, reflect) will still require unicode to be imported, so the number of Go programs that do not import unicode after the split would still be very small.

Fair enough. Focusing on reducing the binary size of unicode itself makes much more sense.