Problem
When the sniff_csv function is used, the Columns that it returns cannot be parsed when any of the headers in the CSV contain a single quote character in them.
If the column header contains double single quote, DuckDB interprets it as an SQL escape and outputs only a single quote when output.
Proposed solution
The Columns output of the sniff_csv should escape the single quotes in the column names so that the value can be used safely from subsequent SQL commands.
To Reproduce
Run this script
import tempfile
import duckdb
def test(col_name: str):
with tempfile.NamedTemporaryFile(suffix=".csv", mode="w") as file:
file.write(f"Name,{col_name}\n")
file.write("Alice,123\n")
file.write("Bob,456\n")
file.flush()
with duckdb.connect(":memory:", config={"threads": 1}) as conn:
(cols,) = conn.execute(
"SELECT Columns FROM sniff_csv(?)",
[file.name],
).fetchone()
col_query = conn.execute(f"SELECT {cols}")
column_names = col_query.fetchone()[0].keys()
print(column_names)
test("Age") # ['Name', 'Age']
test("Escaped''Quote") # ['Name', "Escaped'Quote"]
test("Unescaped'Quote") # duckdb.duckdb.ParserException: Parser Error: syntax error at or near "Quote"
If the column name has no single quotes it works fine.
If the single quotes are doubled (i.e. SQL-escaped) it works but the result shows only one qute instead of the original two.
Otherwise it raises an exception.
OS:
macOS aarch64
DuckDB Version:
0.10.2
DuckDB Client:
Python
Full Name:
Dan Homola
Affiliation:
GoodData
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
[X] Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
What happens?
Problem When the
sniff_csv
function is used, theColumns
that it returns cannot be parsed when any of the headers in the CSV contain a single quote character in them. If the column header contains double single quote, DuckDB interprets it as an SQL escape and outputs only a single quote when output.Proposed solution The
Columns
output of thesniff_csv
should escape the single quotes in the column names so that the value can be used safely from subsequent SQL commands.To Reproduce
Run this script
If the column name has no single quotes it works fine. If the single quotes are doubled (i.e. SQL-escaped) it works but the result shows only one qute instead of the original two. Otherwise it raises an exception.
OS:
macOS aarch64
DuckDB Version:
0.10.2
DuckDB Client:
Python
Full Name:
Dan Homola
Affiliation:
GoodData
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?