golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
121.17k stars 17.37k forks source link

regexp: Invalid character class on Windows ARM #68268

Open anacrolix opened 5 days ago

anacrolix commented 5 days ago

Go version

master and 1.22 (and probably other versions)

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/anacrolix/Library/Caches/go-build'
GOENV='/Users/anacrolix/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/anacrolix/go/pkg/mod'
GOOS='darwin'
GOPATH='/Users/anacrolix/go'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.3/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.3/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.3'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/m4/f67w9zfx1pl386_2yjs7xtkm0000gn/T/go-build3366901238=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

I'm using github.com/google/go-cmp on Windows ARM. It was compiled for amd64. This is a playground for it, but obviously it's not cross compiled: https://go.dev/play/p/8a4Kt1DD-al?v=gotip.

What did you see happen?

A panic during init. The init code looks fine. The binary also appears to be loading fine, it just doesn't like this character class.

panic: regexp: Compile(`[_\p{L}][_\p{L}\p{N}]*$`): error parsing regexp: invalid character class range: `\p{L}`

goroutine 1 [running]:
regexp.MustCompile({0x7ff66897b52c, 0x17})
    /opt/homebrew/Cellar/go/1.22.3/libexec/src/regexp/regexp.go:317 +0xb4
github.com/google/go-cmp/cmp/internal/function.init()
    /Users/anacrolix/go/pkg/mod/github.com/google/go-cmp@v0.6.0/cmp/internal/function/func.go:68 +0x7c

What did you expect to see?

go-cmp's init to work normally and main to start.

anacrolix commented 5 days ago

I reported this downstream https://github.com/google/go-cmp/issues/359, but I believe the issue is in go.

gabyhelp commented 5 days ago

Related Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

anacrolix commented 5 days ago

None of the related issues match this problem.

thanm commented 4 days ago

Hi @anacrolix I am having trouble reproducing this problem -- when I build + run your test case on our windows/arm64 builders, it seems to work just fine. I tried both with tip of master and with Go 1.22.4.

Your go env output shows a Darwin machine, just confirming that this is indeed a windows issue and not a Darwin issue? Might help if you could tell me a bit more about the setup/configuration of the windows machine where you're seeing the problem. Thanks.

anacrolix commented 4 days ago

Thanks for looking. Specifically I cross compiled from darwin to windows/amd64, then ran that binary on Windows 11 on windows/arm64.

thanm commented 4 days ago

Still can't reproduce. From my darwin arm64 machine I built the program you listed via

GOOS=windows GOARCH=amd64 go build prog.go

downloaded the resulting executable to a windows/arm64 builder, still works fine. What am I missing?

mauri870 commented 4 days ago

I couldn't reproduce as well. This feels like the error you would get by copying the regexp from the internet and some character in the regexp string would have a different encoding. But given this happens with a library upstream I'd say it is probably something to do with the OS.

anacrolix commented 4 days ago

Could it be CGO related? I am linking in a bunch of other stuff, and using a cross compiler (x86_64-unknown-linux-gnu-gcc) for some of those cgo parts.

At a stretch, could it be triggered by some kind of memory or emultation corruption? Are your windows/arm64 instances emulated (mine is).

The (downstream) library you mention is heavily used, AFAICT the character classes it's using are well defined too.

Let me know what else I can provide? Thanks

thanm commented 3 days ago

CGO: sure, I can certainly imagine problems there, but it depends on the code of course.

Are your windows/arm64 instances emulated (mine is).

The Go team windows/arm64 builders are Azure VMs, and I am pretty sure they are backed by actual arm hardware.

anacrolix commented 3 days ago

I'll try to construct a minimal reproduction. Thanks for looking into it

anacrolix commented 3 days ago

I'm only able to reproduce with upx involved. I've mentioned an issue in upx that looks related. Obviously this could be out of the scope of Go, however I'm able to reproduce it from a part of my code that additionally seems to incorrectly depend on go-cmp.

I've created a repo to reproduce at https://github.com/anacrolix/go-issue-68268. It contains some code guarded by conditions, and additionally that can be commented out to avoid compiling in the dependencies. The behaviour varies significantly if you mess around with it. The justfile contains some build steps, and copies to my Windows ARM VM.

If I use my package github.com/anacrolix/torrent/bencode, a dependency is generated to github.com/google/go-cmp/cmp via github.com/anacrolix/torrent/bencode.test (Comment out all the other lines to see this behaviour). When the resulting binary is run through upx, something must be stripped out. It's not clear why this code is included in the first place however.

> go mod why -m github.com/google/go-cmp
# github.com/google/go-cmp
github.com/anacrolix/go-issue-68268
github.com/anacrolix/torrent/bencode
github.com/anacrolix/torrent/bencode.test
github.com/google/go-cmp/cmp

CGO is not related at all. I don't have real ARM hardware running Windows, but I also think that's not the problem either.

jreiser commented 2 days ago

@anacrolix The best way to help make progress on the upx side is to "Paste, drop, or click to add files" (or give the URL) of an actual (never compressed yet) executable file which runs, but which fails when compressed by upx. (ref: https://github.com/upx/upx/issues/443) So far I cannot find such a file here.

anacrolix commented 2 days ago

@jreiser I thought it would be preferable to build it yourself from source (per the repo). Here's the exe that runs fine before upx. fine-before-upx.zip

jreiser commented 1 day ago

@anacrolix Thank you for the file. An actual executable file is a cut point in the graph of debugging possibilities, so it is the first place to look. Repo, integrated development environment (options, menu settings, environment variables, compiler/linker, Python site packages integrations, etc.), and system or local static libraries often introduce dozens of variables. Variations in the OS (Windows 11, latest 10, early 10, 7) sometimes present their own problems, but having a specific actual executable file still helps.