cldellow / datasette-ui-extras

Add editing UI and other power-user features to Datasette.
Apache License 2.0
12 stars 1 forks source link

add statistical summary facet #43

Closed cldellow closed 1 year ago

cldellow commented 1 year ago

Show min, p50, p90, p95, p99, max

When you add it, we won't be able to correctly tell that's what you did, unless we do something special. e.g. adding response_time > 500 will change the population on which we're calculating the statistics. We could add a __dux_stats=XXX parameter that we use to freeze the facets.

But then that won't update if you refine the other filters. Maybe that's OK?

Also unclear what the default action should be, eg filter to things higher than it? Lower than it? Wouldn't you just... sort?

Maybe the toggle URL should just be # and we never return selected = True?

cldellow commented 1 year ago

Sample query:

with xs as (select views from posts where post_type = 'question'), with_ids as (select row_number() over () as id, views as value from xs), ntiles as (select id, value, ntile(100) over (order by value) percentile from with_ids) select  (select max(value) from ntiles where percentile = 99) as p99, (select max(value) from ntiles where percentile = 95) as p95, (select max(value) from ntiles where percentile = 90) as p90, (select max(value) from ntiles where percentile = 50) as median ;

It's pretty slow -- 800ms on my computer to do 90K questions. It could be faster with https://www.sqlite.org/lang_with.html#materialization_hints, but that's only for SQLite >= 3.35.0

Still, probably good enough?

cldellow commented 1 year ago

oh, actually this works well:

with xs as (select views from posts where post_type = 'question'), with_ids as (select row_number() over () as id, views as value from xs), ntiles as (select id, value, ntile(100) over (order by value) percentile from with_ids) select percentile, min(value), max(value) from ntiles group by 1 having percentile in (1, 50, 90, 95, 99, 100);