duckdb / duckdb_spatial

MIT License
492 stars 41 forks source link

BUG: Seg Fault when `st_aswkb()` to arrow #236

Closed ncclementi closed 7 months ago

ncclementi commented 10 months ago

Minimal reproducible example duckdb version: 0.9.2 OS: MacOS Sonoma 14.2.1 - M2 arm64

I ran this script on lldb, and I had to attach it to a pyarrow 12 to (same reason of than https://github.com/duckdb/duckdb/issues/9371#issuecomment-1766409004)

import duckdb
import gc

for i in range(20):
        sql_str = "select st_aswkb(geom) from foo"
        s = "load spatial; load httpfs"
        view_str = "create or replace temporary view foo as select * from st_read('https://raw.githubusercontent.com/ibis-project/testing-data/master/geojson/zones.geojson')"

        con = duckdb.connect()
        con.execute(s)
        con.execute(view_str)
        v = con.sql(sql_str)

        we = v.arrow()
        gc.collect()
        print(i)

run on lldb

>>> lldb python repro-geo-segfault.py
(lldb) run
0
1
2
3
4
5
Process 6324 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x310a15d12340)
    frame #0: 0x000000010aa25438 spatial.duckdb_extension`std::__1::__tree<long long, std::__1::less<long long>, std::__1::allocator<long long>>::destroy(std::__1::__tree_node<long long, void*>*) + 24
spatial.duckdb_extension`std::__1::__tree<long long, std::__1::less<long long>, std::__1::allocator<long long>>::destroy:
->  0x10aa25438 <+24>: ldr    x1, [x1]
    0x10aa2543c <+28>: bl     0x10aa25420               ; <+0>
    0x10aa25440 <+32>: ldr    x1, [x19, #0x8]
    0x10aa25444 <+36>: mov    x0, x20
Target 0: (python) stopped.
Maxxen commented 10 months ago

Hi! I am unable to reproduce this in the CLI and at first glance I don't think this is related to the spatial code. DuckDB has no knowledge of what the blob contains when it converts it into pyarrow so I'd assume that the crash is due to a bug in the general duckdb blob->pyarrow binary conversion code, but I'll have to investigate more to say for sure.

ncclementi commented 9 months ago

@Maxxen any chance you had some time to look at this?

I noticed that when I'm on duckdb 0.9.1 I don't hit this problem. It also seems to be only happening on mac M2 arm64 (also in Sonoma 14.3), none of the folks with linux machines have been able to reproduce. But I can consistently reproduce it.

I tried also duckdb = 0.9.3.dev3920 and hit the same.

Maxxen commented 9 months ago

Hi! Me and @Tishj sat down and looked through this and we think we've figured out what the issue is. Thankfully it seems to originate in spatial (related to state kept around in the st_read table function binding) and not DuckDB core so even if I wont have time to fix this until after next DuckDB release we should be able to squash it In the following stable spatial version.

ncclementi commented 9 months ago

Thank you @Maxxen and @Tishj

Maxxen commented 9 months ago

I think this may have been fixed in #271, could you try with v0.10.0 and the latest nightly build (I just merged so its gonna be an hour or two until it gets uploaded) and see if it still occurs? If its still occurring I'll sit down with Thijs again and do a proper debugging session.

ncclementi commented 9 months ago

Max, I'll give this a try tomorrow and I'll get back to you.

ncclementi commented 9 months ago

@Maxxen I run the reproducer with duckdb v0.10.0 and installing the spatial extension from the nightly and did not seg fault 🎉

What's the release cycle for the spatial extension, so we can update couple of things in our end? Currently this issue breaks some of our geospatial blogs rendering.

cpcloud commented 8 months ago

@Maxxen Any idea when the next release of this extension will ship?

I'm hitting this segfault pretty regularly now.

Maxxen commented 8 months ago

@cpcloud Yes! Next DuckDB release is scheduled for next week, and the latest spatial extension will be distributed alongside it.

Maxxen commented 8 months ago

Re: Release cycle. DuckDB extensions don't really support versioning yet so the release cycle of our extensions is more or less tied to DuckDB's own release cycle. Previously when DuckDB itself had longer releases cycles (like the almost 6 months between 0.8 and 0.9) we basically "overwrote" extensions for the latest stable version manually every now and then with a later build to push bugfixes and/or new features, but I think now that we are trying to do shorter bug-fix release cycles for DuckDB (1-2 months) we won't be doing that anymore.

ncclementi commented 7 months ago

Closing this one as it is resolved in the recent release. Thanks for your work @Maxxen