Open rgieseke opened 1 year ago
Hey, glad you were able to get it working.
I think you're touching on two things.
1) How do we warn people that suggest_facets
should probably be disabled?
2) Why is that error happening?
Re (1) - The Datasette object exposes a way to query the setting values. Maybe datasette-parquet
can look at this in its startup hook and print a warning?
Re (2) - the rationale for this suggestion was performance, not correctness. When running SQLite queries, Datasette can interrupt them if they take longer than, say, a second. So with SQLite, facet suggestions can be slow, but not prevent the system from working. With DuckDB, queries aren't interruptible, so facet suggestions can make the system appear to hang.
But in this case, it looks like DuckDB thinks the query is invalid, and that's throwing an exception.
This might be due to SQLite's loosey-goosey rules on GROUP BY. That is, SQLite permits the query, but DuckDB rejects it.
There is a very real chance that, although that query "works" in SQLite, it does not do what the author intended. Do you have a full stack trace? If we could figure out which facet is generating that SQL, we might want to open a bug against it.
Thanks for the quick feedback!
Here is the full stacktrace:
Traceback (most recent call last):
File "/home/robert/test/env/lib/python3.10/site-packages/datasette_parquet/winging_it.py", line 94, in execute
rv = self.cursor.execute(sql, parameters)
duckdb.duckdb.BinderException: Binder Error: column "provenance" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(provenance)" if the exact value of "provenance" is not important.
LINE 1: SELECT provenance AS value, COUNT(*) AS n FROM...
^
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/app.py", line 1354, in route_path
response = await view(request, send)
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/views/base.py", line 134, in view
return await self.dispatch_request(request)
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/views/base.py", line 91, in dispatch_request
return await handler(request)
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/views/base.py", line 361, in get
response_or_template_contexts = await self.data(request, **data_kwargs)
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/views/table.py", line 158, in data
return await self._data_traced(request, default_labels, _next, _size)
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/views/table.py", line 562, in _data_traced
await gather(execute_facets(), execute_suggested_facets())
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/views/table.py", line 180, in gather
results.append(await fn)
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/views/table.py", line 559, in execute_suggested_facets
for suggest_result in await gather(*facet_suggest_awaitables):
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/views/table.py", line 180, in gather
results.append(await fn)
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/facets.py", line 179, in suggest
distinct_values = await self.ds.execute(
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/app.py", line 782, in execute
return await self.databases[db_name].execute(
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/database.py", line 267, in execute
results = await self.execute_fn(sql_operation_in_thread)
File "/home/robert/test/env/lib/python3.10/site-packages/datasette_parquet/ducky.py", line 88, in execute_fn
return await asyncio.get_event_loop().run_in_executor(
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/robert/test/env/lib/python3.10/site-packages/datasette_parquet/ducky.py", line 86, in in_thread
return fn(self.conn)
File "/home/robert/test/env/lib/python3.10/site-packages/datasette/database.py", line 237, in sql_operation_in_thread
cursor.execute(sql, params if params is not None else {})
File "/home/robert/test/env/lib/python3.10/site-packages/datasette_parquet/winging_it.py", line 98, in execute
raise exceptions.DoubleQuoteForLiteraValue(matches)
datasette_parquet.exceptions.DoubleQuoteForLiteraValue: ['"provenance" must appear in the GROUP BY clause or must be part of an aggregate function.\nEither add it to the GROUP BY list, or use "ANY_VALUE(provenance)" if the exact value of "provenance"']
INFO: 127.0.0.1:38642 - "GET /testdb/test HTTP/1.1" 500 Internal Server Error
/usr/lib/python3.10/asyncio/base_events.py:1910: RuntimeWarning: coroutine 'ArrayFacet.suggest' was never awaited
handle = None # Needed to break cycles when an exception occurs.
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.10/asyncio/base_events.py:1910: RuntimeWarning: coroutine 'DateFacet.suggest' was never awaited
handle = None # Needed to break cycles when an exception occurs.
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
GitHub doesn't allow me to upload a parquet file but I used a small test file like this:
{
"plugins": {
"datasette-parquet": {
"testdb": {
"directory": "./testdata"
}
}
}
}
Error happens with
datasette --metadata metadata.json
No error with
datasette --metadata metadata.json --setting suggest_facets off
Seems to have to do with the value
column. If I remove that there is no problem.
So something as value
is probably wrong as the query can't differentiate between the two later I assume.
The query seems to be (from commenting out the print
statement in winging_it.py
)
SELECT provenance AS value, COUNT(*) AS n FROM (SELECT value, provenance, code, category, year FROM test) WHERE NOT value IS NULL GROUP BY value LIMIT 31
Here is the query in Datasette: https://github.com/simonw/datasette/blob/452a587e236ef642cbc6ae345b58767ea8420cb5/datasette/facets.py#L168
When I run the test data in a Sqlite db with Datasette this works, so this seems to be something working differently with DuckDB.
It's a problem with Datasette's query (when there is a 'value' column), I've created an issue there: https://github.com/simonw/datasette/issues/2208
Ah, great, thanks for chasing that down!
I'll leave this open as a reminder that it'd be good to add a warning about facet suggestions on startup
Absolutely fantastic project, thanks a lot!
As hinted to in the Readme, disabling facet suggestions might be needed.
Without it disabling I was getting an error when clicking on a view of a parquet file:
When started with
there is no problem.
Not sure if there is a way to check or prevent this error without it, happy to look into it if you have any pointers! ("unit" is a column of string values.)