duckdb / duckdb_spatial

MIT License
492 stars 41 forks source link

Export to excel fails with `GDAL Error (1): File extension should be XLSX` when file exists. #231

Closed michael-simons closed 9 months ago

michael-simons commented 10 months ago
INSTALL spatial;
LOAD spatial;

create table tbl (x varchar(200));
insert into tbl values('hallo');
-- straight from the docs
COPY tbl TO 'output.xlsx' WITH (FORMAT GDAL, DRIVER 'xlsx');

Error: IO Error: GDAL Error (1): File extension should be XLSX

duckdb -version
v0.9.2 3c695d7ba9

macOS, 14.1.1 (23B81), m1
michael-simons commented 10 months ago

Hmm, that seems to be flaky… Sometimes it works on a fresh instance, sometimes it does not.

michael-simons commented 10 months ago

Ok, this is the error message when the file already exists… It is stated in the docs that the file must not exists, but you would never assume that to be the cause reading the message.

Changed the title of this issue.

Maxxen commented 10 months ago

Hi! Thanks for filing this issue. Seems like this is an underlying issue with the GDAL driver, or perhaps with how it interacts with our filesystem abstraction. It may very well be possible that we can improve this error message somehow.

Maxxen commented 10 months ago

So I have spent quite some time looking into this. The problem is that when DuckDB detects that an output file already exists it by default writer the output to a separate file suffixed with .tmp before doing a swap at the end. This is to avoid wrecking the output file if the write would fail halfway.

The problem is that some GDAL drivers (like xlsx) checks that the file they want to write ends with a their extension before they start writing anything as an extra failsafe. But this doesn't work when writing the tempfile as that changes the extension from e.g. .xlsx to .xlsx.tmp, hence the error.

I think the best solution here is to change it in DuckDB core so that we write temp-files with the original extension, but this can get a little bit hairy as we need to do quite a bit of path manipulation and detection. But I'll look into it and reference this issue once I got a PR up.

michael-simons commented 10 months ago

Thanks for the detailed and understandable write-up. I can relate and agree.