cldellow / datasette-ui-extras

Add editing UI and other power-user features to Datasette.
Apache License 2.0
12 stars 1 forks source link

re-jig facets #21

Closed cldellow closed 1 year ago

cldellow commented 1 year ago

Proposal

We'll monkey-patch the existing ColumnFacet, DateFacet, ArrayFacet.

A pragmatic way to get started:

Background

Facets are really cool. They're complex and present many tradeoffs. There's no one right set of choices that can satisfy everyone. I think Datasette's current approach with them is fairly conservative, which is a sensible choice for Datasette, the platform. For my own tastes, I'd like them to behave a bit differently. And since this is a plug-in, YOLO, let's take a wildly different approach.

What I like:

What I'd like to adjust:

That's a big list! They don't all depend on each other. This ticket is primarily to explore the performance side of things.

cldellow commented 1 year ago

The facet_results for DateFacet are a good reference on how to make a facet.

I think you could suppress facet computation outside of a specific AJAX call that we use for lazy loading, so that's promising.

I suspect we could just have suggest always be a no-op, and have facet_results always return values?

cldellow commented 1 year ago

A wrinkle: SQLite will spill to disk when doing counts. A naive Python approach wouldn't, which limits the size of database this technique could work with.

Additionally, some rough benchmarking shows that the Python VM is really slow for computing statistics, even if you avoid function calls and things that look-safe-but-secretly-are-implemented-via-exceptions, like dict.get(key, default-value). So, even on a dataset that fit in memory, the runtime perf will be pretty underwhelming.

That makes me think the right approach here is to pivot a bit.

cldellow commented 1 year ago

I think I like the idea of storing the cached values in a temporary DB that goes away when Datasette is restarted. That might keep me more honest -- this is just an optimization, nothing of value should be stored in it, and nothing should assume its existence is guaranteed.

https://www.sqlite.org/inmemorydb.html says that opening an empty string will get you a file-backed temp db that goes away when the connection closes. I like the automatic deletion...but I think its tied to a connection, so the db would be bound to a single thread.

I'd rather have it in WAL mode and be able to hit it from an arbitrary thread/process (eg: maybe we warm things up on startup).

This will mean that I'm excluding the possibility of running in "immutable" environments. That might be OK? I think many such environments offer a limited tmpfs.

Worst case scenario, we can open a shared in-memory db and use threads (vs processes) for the background worker.

cldellow commented 1 year ago

Demo of facets: html, json

I wonder... could we compute facet_suggestions once at the top level and re-use that set forever? In some cases, you may have sufficiently filtered that it's no longer an interesting facet. But that's probably OK / preferable vs wasting a lot of CPU?

cldellow commented 1 year ago

A brutal idea: track max(rowid) for rowid tables, when it changes, invalidate facets.

cldellow commented 1 year ago

Where I'm at: I can move the facets to the sidebar (good), but they're still being rendered by the server (bad).

I can do the rendering client-side if I know which facets to fetch.

Facets can come from two places:

I can pluck out the ones hardcoded in metadata.json and pass them via https://docs.datasette.io/en/stable/plugin_hooks.html#extra-body-script-template-database-table-columns-view-name-request-datasette

The ones suggested by a facet are harder, because we've lost the context. We could stash information on the request object?

Maybe for now we just hardcode some stuff so we can scaffold the rest of the system.

cldellow commented 1 year ago

I think faceting everything by default is too much of a change.

Instead, we'll do what DS does today and drive facets from metadata and qs params.

I still think we should just kill facet suggestion--the marginal utility of being more discoverable is not worth the perf and screen real estate hit on every request.

Instead, we'll add explicit facet options in the column dropdowns.

To start, just add an option for each facet indiscriminately. Later we can make it smart enough to only show valid choices.

cldellow commented 1 year ago

I think there isn't an official way to hook column actions yet: https://github.com/simonw/datasette/issues/983#issuecomment-752729035

We might be able to party in https://github.com/simonw/datasette/blob/main/datasette/static/table.js#L1

...although discovering which column we're connected too will be a pain. I think we'd have to parse the absolute positioning then reverse engineer which column it is

This also means mobile users won't be able to control facets, I'm ok with that

cldellow commented 1 year ago

Let's scope this down to just getting desktop feature parity.

Caching and changing to ORs can come as their own issues.