daulet / tokenizers

Go bindings for HuggingFace Tokenizer
MIT License

tokenizer.go:190:10: type [1073741824]*_Ctype_char larger than address space #25

Closed: leonardyp closed this issue 4 weeks ago

leonardyp commented 3 months ago
Leonard@PA202205M2 MINGW64 /d/work/src/demo
$ go mod tidy
go: downloading github.com/daulet/tokenizers v0.8.0

Leonard@PA202205M2 MINGW64 /d/work/src/demo
$ go build -v .
github.com/daulet/tokenizers
# github.com/daulet/tokenizers
D:/software/gopath/pkg/mod/github.com/daulet/tokenizers@v0.8.0/tokenizer.go:190:10: type [1073741824]*_Ctype_char larger than address space
D:/software/gopath/pkg/mod/github.com/daulet/tokenizers@v0.8.0/tokenizer.go:190:10: type [1073741824]*_Ctype_char too large
D:/software/gopath/pkg/mod/github.com/daulet/tokenizers@v0.8.0/tokenizer.go:123:10: type [1073741824]*_Ctype_char larger than address space
D:/software/gopath/pkg/mod/github.com/daulet/tokenizers@v0.8.0/tokenizer.go:123:10: type [1073741824]*_Ctype_char too large

Leonard@PA202205M2 MINGW64 /d/work/src/demo
$ go env
set GO111MODULE=on
set GOARCH=386
set GOBIN=
set GOCACHE=C:\Users\pact\AppData\Local\go-build
set GOENV=C:\Users\pact\AppData\Roaming\go\env
set GOEXE=.exe
set GOEXPERIMENT=
set GOFLAGS=
set GOHOSTARCH=386
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=D:\software\gopath\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=D:\software\gopath;D:\work
set GOPRIVATE=
set GOPROXY=https://goproxy.cn,direct
set GOROOT=D:\go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLCHAIN=auto
set GOTOOLDIR=D:\go\pkg\tool\windows_386
set GOVCS=
set GOVERSION=go1.22.3
set GCCGO=gccgo
set GO386=sse2
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=D:\work\src\demo\go.mod
set GOWORK=
set CGO_CFLAGS=-O2 -g
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-O2 -g
set CGO_FFLAGS=-O2 -g
set CGO_LDFLAGS=-O2 -g
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m32 -mthreads -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=C:\Users\pact\AppData\Local\Temp\go-build3041588043=/tmp/go-build -gno-record-gcc-switches
daulet commented 3 months ago

Some kind of overflow. Can you share your demo, or tell me how big the input is?

leonardyp commented 3 months ago

Some kind of overflow. Can you share your demo, or tell me how big the input is?

I'm just using the sample code:

https://github.com/daulet/tokenizers/blob/main/example/main.go

daulet commented 3 months ago

That example uses v0.9.0, but your output shows v0.8.0 being installed. Can you try the latest release? You can also always build the Rust piece from source (make build) so that there is no dependency discrepancy.
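
For readers hitting the same message: the error names the type [1073741824]*_Ctype_char, and 1073741824 is 1 << 30. With GOARCH=386 (as in the go env output above) a pointer is 4 bytes, so an array of 1 << 30 pointers would need 4 GiB, which is the entire 32-bit address space. The sketch below only illustrates that arithmetic and one common way to avoid naming such a large array type (unsafe.Slice, available since Go 1.17). It assumes the failing lines use the familiar cgo "cast to a huge array, then slice" idiom; that is an inference from the error text, not something confirmed in this thread, and arrayBytes, exceeds32Bit and viewPointers are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
	"unsafe"
)

// arrayBytes returns the size in bytes of an array holding n pointers,
// given the pointer width of the target architecture.
func arrayBytes(n, ptrSize uint64) uint64 {
	return n * ptrSize
}

// exceeds32Bit reports whether a size is too large to be addressed
// with a 32-bit pointer.
func exceeds32Bit(size uint64) bool {
	return size > math.MaxUint32
}

// viewPointers builds a slice over n pointers starting at first.
// Unlike the assumed idiom
//
//	(*[1 << 30]*C.char)(unsafe.Pointer(p))[:n:n]
//
// it never mentions a fixed-size array type, so the size of the slice
// is decided at run time and no 4 GiB type has to exist at compile time.
func viewPointers(first **byte, n int) []*byte {
	return unsafe.Slice(first, n)
}

func main() {
	const elems = 1 << 30

	size386 := arrayBytes(elems, 4) // 4-byte pointers on GOARCH=386
	size64 := arrayBytes(elems, 8)  // 8-byte pointers on 64-bit targets

	fmt.Println(size386, exceeds32Bit(size386)) // 4294967296 true
	fmt.Println(size64)                         // 8589934592

	// Plain Go memory stands in for a C array of char pointers here.
	a, b, c := byte('a'), byte('b'), byte('c')
	backing := [3]*byte{&a, &b, &c}
	view := viewPointers(&backing[0], len(backing))
	fmt.Println(len(view), string(*view[2])) // 3 c
}
```

On a 64-bit target the same array type is 8 GiB, which is far below the address-space limit, so the package compiles there. Whether a newer release of the bindings builds for 32-bit Windows is not established in this thread.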