Closed cldellow closed 1 year ago
The facet_results for DateFacet are a good reference on how to make a facet.
I think you could suppress facet computation outside of a specific AJAX call that we use for lazy loading, so that's promising.
I suspect we could just have suggest
always be a no-op, and have facet_results
always return values?
A wrinkle: SQLite will spill to disk when doing counts. A naive Python approach wouldn't, which limits the size of database this technique could work with.
Additionally, some rough benchmarking shows that the Python VM is really slow for computing statistics, even if you avoid function calls and things that look-safe-but-secretly-are-implemented-via-exceptions, like dict.get(key, default-value)
. So, even on a dataset that fit in memory, the runtime perf will be pretty underwhelming.
That makes me think the right approach here is to pivot a bit.
rowid > :p0
clauses)rowid
column to learn that it still doesn't contain a JSON array of tagsI think I like the idea of storing the cached values in a temporary DB that goes away when Datasette is restarted. That might keep me more honest -- this is just an optimization, nothing of value should be stored in it, and nothing should assume its existence is guaranteed.
https://www.sqlite.org/inmemorydb.html says that opening an empty string will get you a file-backed temp db that goes away when the connection closes. I like the automatic deletion...but I think its tied to a connection, so the db would be bound to a single thread.
I'd rather have it in WAL mode and be able to hit it from an arbitrary thread/process (eg: maybe we warm things up on startup).
This will mean that I'm excluding the possibility of running in "immutable" environments. That might be OK? I think many such environments offer a limited tmpfs.
Worst case scenario, we can open a shared in-memory db and use threads (vs processes) for the background worker.
A brutal idea: track max(rowid) for rowid tables, when it changes, invalidate facets.
Where I'm at: I can move the facets to the sidebar (good), but they're still being rendered by the server (bad).
I can do the rendering client-side if I know which facets to fetch.
Facets can come from two places:
I can pluck out the ones hardcoded in metadata.json and pass them via https://docs.datasette.io/en/stable/plugin_hooks.html#extra-body-script-template-database-table-columns-view-name-request-datasette
The ones suggested by a facet are harder, because we've lost the context. We could stash information on the request
object?
Maybe for now we just hardcode some stuff so we can scaffold the rest of the system.
I think faceting everything by default is too much of a change.
Instead, we'll do what DS does today and drive facets from metadata and qs params.
I still think we should just kill facet suggestion--the marginal utility of being more discoverable is not worth the perf and screen real estate hit on every request.
Instead, we'll add explicit facet options in the column dropdowns.
To start, just add an option for each facet indiscriminately. Later we can make it smart enough to only show valid choices.
I think there isn't an official way to hook column actions yet: https://github.com/simonw/datasette/issues/983#issuecomment-752729035
We might be able to party in https://github.com/simonw/datasette/blob/main/datasette/static/table.js#L1
...although discovering which column we're connected too will be a pain. I think we'd have to parse the absolute positioning then reverse engineer which column it is
This also means mobile users won't be able to control facets, I'm ok with that
Let's scope this down to just getting desktop feature parity.
Caching and changing to ORs can come as their own issues.
Proposal
We'll monkey-patch the existing ColumnFacet, DateFacet, ArrayFacet.
A pragmatic way to get started:
[x] ui: add some JS that fetches facets via the JSON API and updates the sidebar
ideally, we only compute facets? I think we can pass
nocount
,nosuggest
to disable a bunch fos tuff, maybe we can also pass pagesize=0? see code_size=0&_nocount=1
is a start, although the size gets bumped up to 1 (to discover pagination? special case of 0!)ideally we make 1 call per facet, so we can begin rendering as soon as any data is available
it'd be nice if we could reuse _facet_results.html -- but not necessary for an MVP
toggle_url
- strip.json
,_nocount
,_size
__dux_facets
driven by metadata and qs paramsBackground
Facets are really cool. They're complex and present many tradeoffs. There's no one right set of choices that can satisfy everyone. I think Datasette's current approach with them is fairly conservative, which is a sensible choice for Datasette, the platform. For my own tastes, I'd like them to behave a bit differently. And since this is a plug-in, YOLO, let's take a wildly different approach.
What I like:
What I'd like to adjust:
LIMIT
clause applies after theWHERE
clause, see DateFacetThat's a big list! They don't all depend on each other. This ticket is primarily to explore the performance side of things.