duckdb / duckdb_spatial

KML parsing fails as of DuckDB v1.1.2 #431

Open marcoslot opened 1 week ago

marcoslot commented 1 week ago

Not sure whether this is a GDAL issue or a duckdb_spatial issue, but KML parsing sometimes fails in DuckDB v1.1.2. It worked in v1.1.1.

v1.1.2:

copy (SELECT
            'hello-'||generate_series AS name,
            'world-'||generate_series AS desc,
            format('POINT(52.{} 4.{})', generate_series, generate_series)::geometry AS geom
          FROM generate_series(1,100)) to 'test.kml' with (format 'GDAL', driver 'KML');

select count(*) from st_read('test.kml');
IO Error: GDAL Error (1): XML parsing of KML file failed : unclosed token at line 278, column 2

select count(*) from st_read('test.kml');
IO Error: GDAL Error (1): GDALOpen() called on test.kml recursively

v1.1.1:

select count(*) from st_read('test.kml');
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│          100 │
└──────────────┘

Maxxen commented 1 week ago

Hi! Thanks for opening this issue! I'm unable to reproduce the error using the code you provided, either when building DuckDB from source or when using the build provided by brew. What platform are you on?

maxxen@Maxs-MacBook-Pro-2 duckdb_spatial % duckdb
v1.1.2 f680b7d08f
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D load spatial;
D copy (SELECT
              'hello-'||generate_series AS name,
              'world-'||generate_series AS desc,
              format('POINT(52.{} 4.{})', generate_series, generate_series)::geometry AS geom
            FROM generate_series(1,100)) to 'test.kml' with (format 'GDAL', driver 'KML');
D
D select count(*) from st_read('test.kml');
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│          100 │
└──────────────┘

marcoslot commented 1 week ago

I'm on Ubuntu 22.04 (x86_64), using the DuckDB CLI v1.1.2.

$ rm ~/.duckdb/extensions/v1.1.2/linux_amd64/*
$ duckdb
v1.1.2 f680b7d08f
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D install spatial; load spatial;
D copy (SELECT
              'hello-'||generate_series AS name,
              'world-'||generate_series AS desc,
              format('POINT(52.{} 4.{})', generate_series, generate_series)::geometry AS geom
            FROM generate_series(1,100)) to 'test.kml' with (format 'GDAL', driver 'KML');
D select count(*) from st_read('test.kml');
IO Error: GDAL Error (1): XML parsing of KML file failed : no element found at line 279, column 0
D 

The behaviour appears to be size related:

D copy (SELECT
              'hello-'||generate_series AS name,
              'world-'||generate_series AS desc,
              format('POINT(52.{} 4.{})', generate_series, generate_series)::geometry AS geom
            FROM generate_series(1,53)) to 'test.kml' with (format 'GDAL', driver 'KML');
D select count(*) from st_read('test.kml');
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│           53 │
└──────────────┘
D copy (SELECT
              'hello-'||generate_series AS name,
              'world-'||generate_series AS desc,
              format('POINT(52.{} 4.{})', generate_series, generate_series)::geometry AS geom
            FROM generate_series(1,54)) to 'test.kml' with (format 'GDAL', driver 'KML');
D select count(*) from st_read('test.kml');
IO Error: GDAL Error (1): XML parsing of KML file failed : unclosed token at line 279, column 0

With 53 rows the file is 8058 bytes, and with 54 rows it is 8206 bytes, so the boundary is presumably 8192 bytes (8 KiB).

I tried both building DuckDB from source and downloading the release binary from https://github.com/duckdb/duckdb/releases/download/v1.1.2/duckdb_cli-linux-amd64.zip; both give the same error.
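
One way to narrow this down is to check whether the file written by COPY is itself complete, independently of GDAL. The sketch below is not part of duckdb_spatial; `inspect_kml` is a hypothetical helper that uses only the Python standard library to report the file size and whether the file parses as well-formed XML. If the 8206-byte file parses cleanly here, the read path would be the more likely suspect; if it does not, the write path may have produced a truncated or corrupted file. Treat this as a diagnostic aid under those assumptions, not a confirmed explanation.

```python
import os
import xml.etree.ElementTree as ET


def inspect_kml(path):
    """Return (size_in_bytes, is_well_formed, detail) for the file at `path`.

    Hypothetical diagnostic helper: it checks XML well-formedness only,
    not KML schema validity.
    """
    size = os.path.getsize(path)
    try:
        # Standard-library XML parser, run independently of GDAL.
        ET.parse(path)
    except ET.ParseError as exc:
        # The message includes the line and column where parsing stopped.
        return size, False, str(exc)
    return size, True, "ok"
```

Calling `inspect_kml('test.kml')` after each COPY (once with 53 rows, once with 54) and comparing the results would show whether the file on disk changes from well-formed to malformed at the same point where `st_read` starts failing.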