duckdb / duckdb_spatial

MIT License
484 stars 39 forks source link

Specify format to ST_READ when feeding data via stdin? #117

Open marklit opened 1 year ago

marklit commented 1 year ago

Can ST_READ be given a file format or can that only be inferred from a filename? I was hoping something like the following would work but I'm getting ERROR 4: `/dev/stdin' not recognized as a supported file format.

$ unzip -p test.zip land.gpkg \
    | ~/duckdb_spatial/build/release/duckdb \
        -unsigned \
        -json \
        -c "SELECT * EXCLUDE(geom),
                   ST_AsHEXWKB(
                       ST_GeomFromWkb(geom))::TEXT AS geom,
                   printf('%x', h3_latlng_to_cell(
                                 ST_Y(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 ST_X(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 7)::bigint) as h3_7,
                   printf('%x', h3_latlng_to_cell(
                                 ST_Y(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 ST_X(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 8)::bigint) as h3_8,
                   printf('%x', h3_latlng_to_cell(
                                 ST_Y(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 ST_X(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 9)::bigint) as h3_9
            FROM st_read('/dev/stdin')" \
    | jq -S .
Maxxen commented 1 year ago

There is a "allowed_drivers=[varchar]" option, but I think youre looking to use the stdin virtual filesystem "/vsistdin/"

marklit commented 1 year ago

Any idea what the syntax might look like?

$ unzip -p test.zip land.gpkg \
    | ~/duckdb_spatial/build/release/duckdb \
        -unsigned \
        -json \
        -c "SELECT * EXCLUDE(geom),
                   ST_AsHEXWKB(
                       ST_GeomFromWkb(geom))::TEXT AS geom,
                   printf('%x', h3_latlng_to_cell(
                                 ST_Y(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 ST_X(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 7)::bigint) as h3_7,
                   printf('%x', h3_latlng_to_cell(
                                 ST_Y(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 ST_X(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 8)::bigint) as h3_8,
                   printf('%x', h3_latlng_to_cell(
                                 ST_Y(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 ST_X(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 9)::bigint) as h3_9
            FROM st_read('/dev/stdin', allowed_drivers='[gpkg]')" \
    | jq -S .
Warning 1: File /dev/stdin has GPKG application_id, but non conformant file extension
ERROR 4: sqlite3_open(/dev/stdin) failed: unable to open database file
Error: IO Error: Could not open file: /dev/stdin (sqlite3_open(/dev/stdin) failed: unable to open database file)
$ unzip -p test.zip land.gpkg \
    | ~/duckdb_spatial/build/release/duckdb \
        -unsigned \
        -json \
        -c "SELECT * EXCLUDE(geom),
                   ST_AsHEXWKB(
                       ST_GeomFromWkb(geom))::TEXT AS geom,
                   printf('%x', h3_latlng_to_cell(
                                 ST_Y(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 ST_X(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 7)::bigint) as h3_7,
                   printf('%x', h3_latlng_to_cell(
                                 ST_Y(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 ST_X(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 8)::bigint) as h3_8,
                   printf('%x', h3_latlng_to_cell(
                                 ST_Y(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 ST_X(ST_CENTROID(ST_GeomFromWkb(geom))),
                                 9)::bigint) as h3_9
            FROM st_read('/vsizip/{/vsistdin?buffer_limit=-1}/test.gpkg')" \
    | jq -S .
ERROR 4: `/vsizip/{/vsistdin?buffer_limit=-1}/test.gpkg' does not exist in the file system, and is not recognized as a supported dataset name.
Error: IO Error: Could not open file: /vsizip/{/vsistdin?buffer_limit=-1}/test.gpkg (`/vsizip/{/vsistdin?buffer_limit=-1}/test.gpkg' does not exist in the file system, and is not recognized as a supported dataset name.)
mtravis commented 1 year ago

@marklit just taking a look at this one and seen that your output from the zip is land.gpkg but the st_read uses test.gpkg

marklit commented 9 months ago

I built a new test to make sure the filenames lined up but I'm still seeing the same issue.

$ unzip -l test.zip 
Archive:  test.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   200704  02-10-2024 10:04   test.gpkg
---------                     -------
   200704                     1 file
$ unzip -p test.zip test.gpkg | hexdump -C | head
00000000  53 51 4c 69 74 65 20 66  6f 72 6d 61 74 20 33 00  |SQLite format 3.|
00000010  10 00 01 01 00 40 20 20  00 00 00 0d 00 00 00 31  |.....@  .......1|
00000020  00 00 00 00 00 00 00 00  00 00 00 1e 00 00 00 04  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 01 00 00 27 d8  |..............'.|
00000040  00 00 00 00 47 50 4b 47  00 00 00 00 00 00 00 00  |....GPKG........|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 0d  |................|
00000060  00 2e 4f 7a 05 00 00 00  02 0f f6 00 00 00 00 31  |..Oz...........1|
00000070  0f fb 0f f6 0c 01 0c 9a  09 68 0b b6 0b 6b 07 7e  |.........h...k.~|
00000080  09 1f 05 77 07 3d 04 38  02 eb 01 a4 00 00 00 00  |...w.=.8........|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
$ unzip -p test.zip test.gpkg \
    | duckdb \
        -json \
        -c "SELECT ST_AsHEXWKB(
                       ST_GeomFromWkb(geom))::TEXT AS geom
            FROM st_read('/vsizip/{/vsistdin?buffer_limit=-1}/test.gpkg')"
Error: IO Error: GDAL Error (4): `/vsizip/{/vsistdin?buffer_limit=-1}/test.gpkg' does not exist in the file system, and is not recognized as a supported dataset name.