duckdb / duckdb_spatial

MIT License
493 stars 41 forks source link

Unable to force headers in empty excel #258

Closed rstabler11 closed 9 months ago

rstabler11 commented 9 months ago

What happens?

I am unable to force headers when importing from excel if the spreadsheet only contains one row (the header row). In this case when in duckdb the headers are "Field 1, Field 2 etc" and my actual headers appear as the first row of data.

This has been reported previously eg https://github.com/duckdb/duckdb/issues/8531 but it still does not work for me. I have tried on duckdb 0.9.1, 0.9.2, 0.10.0 and no update so far has resolved it.

To Reproduce

This is my code in python 3.11:

import duckdb
conn = duckdb.connect(':memory:')
conn.execute("LOAD Spatial; INSTALL Spatial;")
conn.execute(f"""CREATE TABLE raw AS SELECT * FROM st_read("Test_file.xlsx", layer = "Test_sheet", open_options=['HEADERS=FORCE'])""")
conn.execute("SELECT * FROM raw")
print(conn.fetch_df())

Returns:

          Field1         Field2
0  TEST_HEADER_1  TEST_HEADER_2

However, if I add some data to the excel it does work

  TEST_HEADER_1 TEST_HEADER_2
0  Test_value_1  Test_value_2
1  Next_value_1  Next_value_2

Adding in os does not change either output

import duckdb
import os
conn = duckdb.connect(':memory:')
os.environ["OGR_XLSX_HEADERS"] = "FORCE"
conn.execute("LOAD Spatial; INSTALL Spatial;")
conn.execute(f"""CREATE TABLE raw AS SELECT * FROM st_read("Test_file.xlsx", layer = "Test_sheet", open_options=['HEADERS=FORCE'])""")
conn.execute("SELECT * FROM raw")
print(conn.fetch_df())

No change to either output

OS:

iOS Sonoma 14.2.1

DuckDB Version:

0.10.1

DuckDB Client:

Python 3.11

Full Name:

Rob Stabler

Affiliation:

Plymouth University Hospitals NHS Trust

Have you tried this on the latest nightly build?

I have tested with a release build (and could not test with a nightly build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

szarnyasg commented 9 months ago

Transferred to the duckdb_spatial repository.

Maxxen commented 9 months ago

Hi! Thanks for reporting this issue. This is a duplicate of https://github.com/duckdb/duckdb_spatial/issues/217. Unfortunately this case is not something Im planning to prioritize fixing anytime soon, but I am working on a separate dedicated extension that can read excel files that should hopefully not have the same problem.