duckdb / duckdb_spatial

MIT License
492 stars 41 forks source link

COPY with format gdal hangs if outputting to stdout #284

Open bdon opened 8 months ago

bdon commented 8 months ago

Adapting the example at Reading from stdin and Writing to stdout

my input is test.csv:

woot
42
43

I run a non-spatial example that outputs to stdout:

duckdb :memory: "COPY (SELECT * FROM read_csv('test.csv')) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)"
woot
42
43

I run a spatial example that outputs to a plain file:

duckdb :memory: "load spatial; COPY (SELECT * FROM read_csv('test.csv')) TO 'test.geojsonseq' WITH (FORMAT 'gdal', DRIVER 'geojsonseq')"
cat test.geojsonseq
{ "type": "Feature", "properties": { "woot": 42 }, "geometry": null }
{ "type": "Feature", "properties": { "woot": 43 }, "geometry": null }

However if I change the spatial output to stdout it hangs forever:

duckdb :memory: "load spatial; COPY (SELECT * FROM read_csv('test.csv')) TO '/dev/stdout' WITH (FORMAT 'gdal', DRIVER 'geojsonseq')"

Is there a configuration setting to the GDAL extension I'm missing? Thanks!

Maxxen commented 8 months ago

Hi! Thanks for reporting this issue! Ill try to have a look soon, but for now I think you can work around this using the csv output in combination with ST_AsGeoJSON and some elbow grease to format the rest of the feature, perhaps the json extension could help with that as well

bdon commented 8 months ago

I worked around it using the JSON output, with CSV output I was running into problems with escaping:

(adapted from https://github.com/bdon/overture-tiles):

LOAD spatial;

COPY ( SELECT
    'Feature' as type,
    json(st_asgeojson(st_geomfromwkb(geometry))) as geometry,
    json_object(
        'confidence',round(confidence,2),
        'category_main',json_extract_string(categories, '$.main'),
        'name',json_extract_string(names, '$.common[0].value')
    ) as properties from read_parquet('overture/theme=places/type=place/*')) TO STDOUT (FORMAT json);

the json(st_asgeojson( part is a little awkward, that might be unavoidable.

Maxxen commented 8 months ago

Ok I've partly figured out the issue - the issue is that GDAL doesn't understand that this is chardev file when going through DuckDBs filesystem and asks to create a new file which DuckDB then errors on (only when I attach the debugger though... otherwise it hangs). The problem is that we can't figure out the file type in DuckDB without opening it first, which we can't do with the access flags we get from GDAL. We need some way to strip the CREATE_NEW access flag if requested on a chardev file before we actually open it. I need to think a bit about how to best solve this.

But as of v0.10.1 I've reenabled GDALs /vsi prefix support, so another workaround for now is to run:

COPY (SELECT * FROM read_csv('test.csv')) TO '/vsistdout/' WITH (FORMAT 'gdal', DRIVER 'geojsonseq');
bdon commented 8 months ago

On v0.10.2-dev39 e245e2927f (macOS)

duckdb :memory: "load spatial; COPY (SELECT * FROM read_csv('test.csv')) TO '/vsistdout/' WITH (FORMAT 'gdal', DRIVER 'geojsonseq')"
Not implemented Error: GDAL Error (6): Read or update mode not supported on /vsistdout