Open EricSoroos opened 5 months ago
I went for that heuristic as I was optimizing for accelerated aggregations - (e.g. Borough column only has 5 values - Manhattan,Bronx,Brooklyn,Queens,Staten Island...), and for filtering with the DataTables view Searchbuilder, where you have an interactive filter builder and response time is important.
Maybe, I can tweak the heuristic so that if the value is negative and less than -1, then the threshold is interpreted as greater than the absolute value?
Alternatively, I can add another set of AutoIndex settings that can be specified in the dotenv template to eval()
an expression that DP+ maintainers can set to specify their own Autoindex heuristics?
It's possible what you want is 5 and up -- that's getting to the point that the planner might choose to use the index, and that's something that an explain analyze would tell you. Right now I'm seeing it create indexes when there's one value text value, which is truly useless. (and in my case, it's a 13mb index + a 38mb text index)
As a side note -- I'm not sure it's worth specifically creating unique indexes when doing the loads from datapusher, unless you're specifically trying to constrain additional entries that are added.
It's possible what you want is 5 and up -- that's getting to the point that the planner might choose to use the index, and that's something that an explain analyze would tell you. Right now I'm seeing it create indexes when there's one value text value, which is truly useless. (and in my case, it's a 13mb index + a 38mb text index)
Yes. For that example, I would set the setting to 5 in the dotenv template. I'll change the default from 3 to 10 as you suggested as that setting is too low. I'll also add a minimum with a default of 3.
As a side note -- I'm not sure it's worth specifically creating unique indexes when doing the loads from datapusher, unless you're specifically trying to constrain additional entries that are added.
Yes. That was what I was going for - to prevent erroneous upserts.
Describe the bug
Auto index generation on cardinality is backwards.
https://github.com/dathere/datapusher-plus/blob/master/datapusher/jobs.py#L1728
Low cardinality indexes on postgresql aren't useful, because the planner will almost never choose them.
The planner will choose a table scan instead of an index scan if it is likely that many/most pages will be read. When you have a low cardinality index, eg a bool, the planner is going to assume that 1/2 of rows will be hit, and therefore essentially all pages. Even if you've got a distribution that's really skewed, the planner will generally not choose that index. (if you do have a known skewed distribution, you can do a partial/multicolumn index where the condition is the rare case, but that's not really applicable for a general purpose tool. )
This should be reversed, and set the default threshold to ~10 at least.