SEL-Columbia / formhub

Mobile Data Collection made easy.
http://formhub.org
BSD 2-Clause "Simplified" License
259 stars 163 forks source link

Get rid of expensive count #1317

Open prabhasp opened 10 years ago

prabhasp commented 10 years ago

Here: https://github.com/SEL-Columbia/formhub/blob/master/odk_viewer/templates/instance.html#L182 Its resulting in some of the most expensive queries, and could be easily replaced by a client-side number store.

jmwohl commented 10 years ago

I was able to partially fix this. The way it's setup, I don't see an easy way to avoid the initial count query call which determines the current record's position in the total — the change I made makes the call on page load, but not on prev/next clicks.

I think it would be possible to avoid this initial count query as well by passing the record index from the previous page when the user selects the form to view, but that would require changes to other files (and the code is a mess).

https://github.com/SEL-Columbia/formhub/blob/1317_count/odk_viewer/templates/instance.html

dpapathanasiou commented 10 years ago

If we can restructure this query to use $gt instead of or in combination with $lt the query becomes much more efficient:

Original:

> db.instances.find({ $and: [ { _userform_id: "datamatrix_datamatrix", _id: { $lt: 3088978 } }, { _deleted_at: null } ] }).explain()
{
    "cursor" : "BtreeCursor _userform_id_1",
    "isMultiKey" : false,
    "n" : 202875,
    "nscannedObjects" : 210745,
    "nscanned" : 210745,
    "nscannedObjectsAllPlans" : 211150,
    "nscannedAllPlans" : 211150,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 63,
    "nChunkSkips" : 0,
    "millis" : 36508,

Replaced with $gt:

> db.instances.find({ $and: [ { _userform_id: "datamatrix_datamatrix", _id: { $gt: 3088978 } }, { _deleted_at: null } ] }).explain()
{
    "cursor" : "BtreeCursor _id_",
    "isMultiKey" : false,
    "n" : 7849,
    "nscannedObjects" : 30757,
    "nscanned" : 30757,
    "nscannedObjectsAllPlans" : 31717,
    "nscannedAllPlans" : 31717,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 7,
    "nChunkSkips" : 0,
    "millis" : 2857,

Even better when using $gt and $lt together:

> db.instances.find({ $and: [ { _userform_id: "datamatrix_datamatrix", _id: { $gt: 3088900, $lt: 3088978 } }, { _deleted_at: null } ] }).explain()
{
    "cursor" : "BtreeCursor _id_",
    "isMultiKey" : false,
    "n" : 35,
    "nscannedObjects" : 77,
    "nscanned" : 77,
    "nscannedObjectsAllPlans" : 385,
    "nscannedAllPlans" : 385,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 3,
prabhasp commented 10 years ago

Further investigation on Denis's comment that performance on $lt vs. $gt depends on the number (eg. 3088900) we are querying with. Generally, because people tend to access high _id valued (= more recent) survey instances, its preferable to use $gt rather than $lt, but this will not be the end-all-be-all solution.