datahuborg / datahub

An experimental hosted platform (GitHub-like) for organizing, managing, sharing, collaborating, and making sense of data.
https://datahub.csail.mit.edu
MIT License
210 stars, 60 forks

postgres row estimates are way off #125

Open RogerTangos opened 8 years ago

RogerTangos commented 8 years ago

The postgres docker container we have is configured slightly differently from anant's. One of the differences is that vacuum doesn't run frequently, which leads to row estimates in the thousands on tables that actually have only single-digit row counts.

RogerTangos commented 8 years ago

@justinanderson mentions that it may not be vacuum. Row estimates used to be more accurate before we switched to the vagrant setup, though. One fix would be to run vacuum right before fetching a row estimate (but not when normal select/insert queries are made). Will close this issue later.
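A minimal sketch of that idea, not DataHub's actual code. It swaps in `ANALYZE` for vacuum, since `ANALYZE` also refreshes `pg_class.reltuples` (the planner's row estimate) and, unlike `VACUUM`, can run inside a transaction block. The function names are hypothetical, and the cursor is assumed to be a DB-API cursor that uses `%s` placeholders, as psycopg2 does.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def refreshed_row_estimate(cursor, schema, table):
    """Refresh statistics for one table, then return its row estimate.

    `cursor` is assumed to be a DB-API cursor with %s placeholders
    (for example psycopg2). Returns None if the table is not found.
    """
    qualified = quote_ident(schema) + "." + quote_ident(table)
    # Only called on the row-estimate path, never around normal queries.
    cursor.execute("ANALYZE " + qualified)
    cursor.execute(
        "SELECT c.reltuples::bigint "
        "FROM pg_class c "
        "JOIN pg_namespace n ON n.oid = c.relnamespace "
        "WHERE n.nspname = %s AND c.relname = %s",
        (schema, table),
    )
    row = cursor.fetchone()
    return None if row is None else row[0]
```

The value is still an estimate: on large tables `ANALYZE` samples rows, so an exact count would need `SELECT count(*)`.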

justinanderson commented 8 years ago

Vacuum/analyze is what's leading to the inaccurate row counts. The vacuum/analyze settings could be made more aggressive, but the performance tradeoff probably isn't worth it.
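To put rough numbers on "more aggressive", here is a sketch of the rule the autovacuum daemon uses to decide when to analyze a table: the rows changed since the last analyze must exceed `autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples`. The stock PostgreSQL defaults are 50 and 0.1. The function names and the table name in the usage line are hypothetical, and this isn't a diagnosis of our container, whose settings aren't listed in this thread.

```python
def autoanalyze_trigger(reltuples, base_threshold=50, scale_factor=0.1):
    """Changed-row count that must be exceeded before autovacuum analyzes.

    Defaults mirror PostgreSQL's stock autovacuum_analyze_threshold (50)
    and autovacuum_analyze_scale_factor (0.1).
    """
    return base_threshold + scale_factor * reltuples


def per_table_analyze_settings_sql(qualified_table, base_threshold, scale_factor):
    """Build an ALTER TABLE statement that overrides both settings per table.

    `qualified_table` must already be a safely quoted identifier.
    """
    return (
        f"ALTER TABLE {qualified_table} SET ("
        f"autovacuum_analyze_threshold = {base_threshold}, "
        f"autovacuum_analyze_scale_factor = {scale_factor})"
    )


# With stock defaults, a tiny table needs more than 50 changed rows
# before an automatic analyze is triggered.
print(autoanalyze_trigger(0))
# Hypothetical table name, for illustration only.
print(per_table_analyze_settings_sql('"public"."example"', 5, 0.0))
```

Lowering the threshold makes the daemon analyze small tables sooner, at the cost of extra background work on every table it applies to, which is the tradeoff in question.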

Also, once we start supporting external databases, we'll have to deal with inaccurate row counts from those sources as well.