Closed IljaKroonen closed 5 years ago
Yes that'd be very nice. Here's a better way to write this sql with 2 scans instead of 3:
SELECT COALESCE(b.distributor, 'Others'), sum(admissions) as admissions
FROM TicketSales ts
LEFT JOIN (
SELECT TOP 10 distributor FROM TicketSales
GROUP BY distributor
ORDER BY sum(admissions) DESC;
) as subq ON ts.distributor = subq.distributor
GROUP BY COALESCE(b.distributor, 'Others')
ORDER BY sum(admissions) DESC;
For the "Series Limit" feature we're already doing this essentially, except with an INNER JOIN. Showing "Others" there should be easy-ish. You have to change the join to a left join and do this COALESCE thing.
True, your query resolved to a better execution place at least in my database :)
We needed this feature quickly and because I don't really understand how caravel generates queries (I am not familiar with the libraries involved, especially around druid), we decided to implement it directly with pandas in viz.py (post query). If this interests you I can put the pull request online once we're done, but IMHO query level would be better to do this. FYI, it looks like this in all visualizations that have a LIMIT option.
@IljaKroonen Hi, may i ask a question about the 'others category' feature that why cannot find the relational code in superset object? Is it not supported yet :-(
I would be interested in some functionality to group results that cannot be show in some "others" category (could be enabled with a checkbox on the slice).
This could be done at query level. Instead of doing:
Caravel could do:
Resulting in the 9 distributors with the most tickets sold and an "others" row with the total of all tickets sold by other distributors.
Would this be achievable with the libraries currently used by caravel? I have been looking at the query method in models.py and this seems like a complex change.