apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Go][Parquet] panic: fatal error: found bad pointer in Go heap on ByteArrayColumnChunkReader.Skip #33108

Open asfimport opened 1 year ago

asfimport commented 1 year ago

runtime: pointer 0xc001470001 to unused region of span span.base()=0xc000e78000 span.limit=0xc000e7a000 span.state=1
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)

runtime stack:
runtime.throw({0x15ac9e9?, 0x3?})
    /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc00010fee0 sp=0xc00010feb0 pc=0x438351
runtime.badPointer(0x7f16917f4d70, 0xc00010ff80?, 0x0, 0xc0006ef1e0?)
    /usr/local/go/src/runtime/mbitmap.go:368 +0x150 fp=0xc00010ff30 sp=0xc00010fee0 pc=0x414490
runtime.findObject(0xc000193a00?, 0xc0006ef1e0?, 0xc00010ffb0?)
    /usr/local/go/src/runtime/mbitmap.go:410 +0xa6 fp=0xc00010ff68 sp=0xc00010ff30 pc=0x414626
runtime.wbBufFlush1(0xc000066000)
    /usr/local/go/src/runtime/mwbbuf.go:260 +0xe5 fp=0xc00010ffb0 sp=0xc00010ff68 pc=0x433525
runtime.wbBufFlush.func1()
    /usr/local/go/src/runtime/mwbbuf.go:201 +0x25 fp=0xc00010ffc8 sp=0xc00010ffb0 pc=0x433365
runtime.systemstack()
    /usr/local/go/src/runtime/asm_amd64.s:469 +0x49 fp=0xc00010ffd0 sp=0xc00010ffc8 pc=0x468ea9

goroutine 27 [running]:
runtime.systemstack_switch()
    /usr/local/go/src/runtime/asm_amd64.s:436 fp=0xc0007790c0 sp=0xc0007790b8 pc=0x468e40
runtime.wbBufFlush(0x0?, 0x0?)
    /usr/local/go/src/runtime/mwbbuf.go:200 +0x6c fp=0xc0007790e0 sp=0xc0007790c0 pc=0x4333ec
runtime.wbBufFlush(0xc00091e190, 0xc0014870eb)
    <autogenerated>:1 +0x2a fp=0xc000779100 sp=0xc0007790e0 pc=0x46d3aa
runtime.gcWriteBarrier()
    /usr/local/go/src/runtime/asm_amd64.s:1669 +0xa3 fp=0xc000779180 sp=0xc000779100 pc=0x46b003
runtime.gcWriteBarrierDX()
    /usr/local/go/src/runtime/asm_amd64.s:1697 +0x7 fp=0xc000779188 sp=0xc000779180 pc=0x46b087
github.com/apache/arrow/go/parquet/internal/encoding.(*ByteArrayDictConverter).Copy(0xc00091ca80?, {0x1252500?, 0xc0006c6000?}, {0xc0005e7000?, 0x400?, 0x0?})
    /go/pkg/mod/github.com/apache/arrow/go/parquet@v0.0.0-20211112161151-bc219186db40/internal/encoding/typed_encoder.gen.go:1344 +0xa7 fp=0xc0007791b0 sp=0xc000779188 pc=0xbde387
github.com/apache/arrow/go/parquet/internal/utils.(*RleDecoder).GetBatchWithDictByteArray(0xc000276500, {0x1766bf8, 0xc000505140}, {0xc00091ca80, 0x229, 0x400})
    /go/pkg/mod/github.com/apache/arrow/go/parquet@v0.0.0-20211112161151-bc219186db40/internal/utils/typed_rle_dict.gen.go:1168 +0x1d0 fp=0xc000779238 sp=0xc0007791b0 pc=0xbac590
github.com/apache/arrow/go/parquet/internal/utils.(*RleDecoder).GetBatchWithDict(0x40a4cd?, {0x1766bf8?, 0xc000505140?}, {0x1252500?, 0xc0007792e0?})
    /go/pkg/mod/github.com/apache/arrow/go/parquet@v0.0.0-20211112161151-bc219186db40/internal/utils/rle.go:422 +0x1de fp=0xc000779290 sp=0xc000779238 pc=0xba4bbe
github.com/apache/arrow/go/parquet/internal/encoding.(*dictDecoder).decode(...)
    /go/pkg/mod/github.com/apache/arrow/go/parquet@v0.0.0-20211112161151-bc219186db40/internal/encoding/decoder.go:140
github.com/apache/arrow/go/parquet/internal/encoding.(*DictByteArrayDecoder).Decode(0xc00114b620?, {0xc00091ca80?, 0xc000276500?, 0x41fc10?})
    /go/pkg/mod/github.com/apache/arrow/go/parquet@v0.0.0-20211112161151-bc219186db40/internal/encoding/typed_encoder.gen.go:1250 +0x6d fp=0xc000779308 sp=0xc000779290 pc=0xbdd9cd
github.com/apache/arrow/go/parquet/file.(*ByteArrayColumnChunkReader).ReadBatch.func1(0x0, 0x229)
    /go/pkg/mod/github.com/apache/arrow/go/parquet@v0.0.0-20211112161151-bc219186db40/file/column_reader_types.gen.go:263 +0xa6 fp=0xc000779350 sp=0xc000779308 pc=0xbf0e86
github.com/apache/arrow/go/parquet/file.(*columnChunkReader).readBatch(0xc000e4aa00, 0x229, {0xc00091ca80, 0x3000, 0x3000}, {0xc00091ca80, 0x3000, 0x3000}, 0xc000779440)
    /go/pkg/mod/github.com/apache/arrow/go/parquet@v0.0.0-20211112161151-bc219186db40/file/column_reader.go:485 +0xcd fp=0xc0007793f8 sp=0xc000779350 pc=0xbeec4d
github.com/apache/arrow/go/parquet/file.(*ByteArrayColumnChunkReader).ReadBatch(0x0?, 0x7f16b85f9108?, {0xc00091ca80?, 0xc000580000?, 0x203000?}, {0xc00091ca80?, 0xc000779538?, 0x46b107?}, {0xc00091ca80, 0x3000, ...})
    /go/pkg/mod/github.com/apache/arrow/go/parquet@v0.0.0-20211112161151-bc219186db40/file/column_reader_types.gen.go:262 +0x87 fp=0xc000779478 sp=0xc0007793f8 pc=0xbf0d67
github.com/apache/arrow/go/parquet/file.(*ByteArrayColumnChunkReader).Skip.func1(0xc001054b40?, {0xc00091ca80?, 0x0?, 0x0?})
    /go/pkg/mod/github.com/apache/arrow/go/parquet@v0.0.0-20211112161151-bc219186db40/file/column_reader_types.gen.go:244 +0x1f8 fp=0xc000779570 sp=0xc000779478 pc=0xbf0c78
github.com/apache/arrow/go/parquet/file.(*columnChunkReader).skipValues(0xc000e4aa00, 0x629, 0xc000779608)
    /go/pkg/mod/github.com/apache/arrow/go/parquet@v0.0.0-20211112161151-bc219186db40/file/column_reader.go:442 +0x2a5 fp=0xc0007795f0 sp=0xc000779570 pc=0xbeeac5
github.com/apache/arrow/go/parquet/file.(*ByteArrayColumnChunkReader).Skip(0x10?, 0x176e738?)
    /go/pkg/mod/github.com/apache/arrow/go/parquet@v0.0.0-20211112161151-bc219186db40/file/column_reader_types.gen.go:242 +0x37 fp=0xc000779628 sp=0xc0007795f0 pc=0xbf0a57 

I intermittently, but fairly often, hit the unrecoverable fatal error above when parsing Parquet files.

It does not look like this has anything to do with the Parquet file itself, since repeatedly parsing the same file works fine.

Using github.com/apache/arrow/go/parquet v0.0.0-20211112161151-bc219186db40

Reporter: Ben

Note: This issue was originally created as ARROW-17896. Please see the migration documentation for further details.

asfimport commented 1 year ago

Matthew Topol / @zeroshade: [~brupp] Can you try upgrading to a newer version of the library, such as github.com/apache/arrow/go/v9/parquet, instead of the old version you're using, and see whether you can still reproduce the issue?

I'm fairly certain you're hitting a bug that was already fixed in one of the earlier releases.

jo-me commented 7 months ago

Hi, I ran into this issue with the latest v15 on Ubuntu 22.04 and Go 1.21.7. It happens when parsing a Parquet file row group by row group using ColumnChunkReaders and skipping many entries of a ByteArrayColumnChunkReader.

runtime: pointer 0xc039980001 to unallocated span span.base()=0xc020824000 span.limit=0xc020826000 span.state=0

Are there any workarounds or things I can try?

Jochen Mehlhorn jochen.mehlhorn@mercedes-benz.com, Mercedes-Benz Tech Innovation GmbH

jo-me commented 7 months ago

Sometimes it happens sooner and sometimes later, and not always in the Skip function. Sometimes parsing works for 2 or 3 Parquet files and then crashes on the next one, even when the files are parsed sequentially.

I can also reproduce this on v16 from main (github.com/apache/arrow/go/v16 v16.0.0-20240219172129-977e217adf4e)