cjb / codex-blackboard

Meteor app for coordinating solving for our MIT Mystery Hunt team
GNU Affero General Public License v3.0
25 stars 17 forks source link

Blackboard doesn't handle restarts well #119

Open cscott opened 10 years ago

cscott commented 10 years ago

Perhaps clients are somehow doing a really expensive subscription temporarily on restart? Or is this just a bug.

Filed as https://github.com/meteor/meteor/issues/1770

cscott commented 9 years ago

This is really our biggest biggest problem with using meteor at the moment. :(

Torgen commented 5 years ago

I assume one (if not the only) problem at startup in particular is the observeChanges call used for the Hubot integration has no filter, so the server has to load everything that hasn't been archived to OldMessages just to ignore it. If you added an index on just message timestamp and made that observe filter on timestamp > now, the initial load would be instant because there would be no messages matching the filter, but new messages would match and get processed. That might even make it so you don't need OldMessages anymore.

Torgen commented 5 years ago

I think I found your expensive subscription: messages-in-range-to-me. I did a query explain on messages.find({room_name: 'general/0', timestamp: { $gt: 45}, $or: [ {nick: 'torgen'}, {to: 'torgen'} ]} and the winning plan was:

"stage" : "FETCH",
"filter" : {
  "$and" : [
    {
      "room_name" : {
        "$eq" : "general/0"
      }
    },
    {
      "timestamp" : {
        "$gt" : 45
      }
    }
  ]
},
"inputStage" : {
  "stage" : "OR",
  "inputStages" : [
    {
      "stage" : "IXSCAN",
      "keyPattern" : {
        "to" : 1,
        "room_name" : 1,
        "timestamp" : -1
      },
      "indexName" : "to_1_room_name_1_timestamp_-1",
      "isMultiKey" : false,
      "multiKeyPaths" : {
        "to" : [ ],
        "room_name" : [ ],
        "timestamp" : [ ]
      },
      "isUnique" : false,
      "isSparse" : false,
      "isPartial" : false,
      "indexVersion" : 2,
      "direction" : "forward",
      "indexBounds" : {
        "to" : [
          "[\"torgen\", \"torgen\"]"
        ],
        "room_name" : [
          "[MinKey, MaxKey]"
        ],
        "timestamp" : [
          "[MaxKey, MinKey]"
        ]
      }
    },
    {
      "stage" : "IXSCAN",
      "keyPattern" : {
        "nick" : 1,
        "room_name" : 1,
        "timestamp" : -1
      },
      "indexName" : "nick_1_room_name_1_timestamp_-1",
      "isMultiKey" : false,
      "multiKeyPaths" : {
        "nick" : [ ],
        "room_name" : [ ],
        "timestamp" : [ ]
      },
      "isUnique" : false,
      "isSparse" : false,
      "isPartial" : false,
      "indexVersion" : 2,
      "direction" : "forward",
      "indexBounds" : {
        "nick" : [
                "[\"torgen\", \"torgen\"]"
        ],
        "room_name" : [
                "[MinKey, MaxKey]"
        ],
        "timestamp" : [
                "[MaxKey, MinKey]"
        ]
      }
    }
  ]
}

Notice the index bounds for both room_name and timestamp. Here's the winning query plan of messages.find({room_name: 'general/0', timestamp: {$gt: 45}, nick: 'torgen'}):

"stage" : "FETCH",
"inputStage" : {
  "stage" : "IXSCAN",
  "keyPattern" : {
    "nick" : 1,
    "room_name" : 1,
    "timestamp" : -1
  },
  "indexName" : "nick_1_room_name_1_timestamp_-1",
  "isMultiKey" : false,
  "multiKeyPaths" : {
    "nick" : [ ],
    "room_name" : [ ],
    "timestamp" : [ ]
  },
  "isUnique" : false,
  "isSparse" : false,
  "isPartial" : false,
  "indexVersion" : 2,
  "direction" : "forward",
  "indexBounds" : {
    "nick" : [
      "[\"torgen\", \"torgen\"]"
    ],
    "room_name" : [
      "[\"general/0\", \"general/0\"]"
    ],
    "timestamp" : [
      "[inf.0, 45.0)"
    ]
  }
}

Much better index bounds for room_name and timestamp. The plan for { to: 'torgen' } is just as good. In fact if you make it to: $in: [null, 'torgen'] it's still good.

Torgen commented 5 years ago

If you distribute the common query parameters into the $or, you get a good plan:

"stage" : "SUBPLAN",
"inputStage" : {
  "stage" : "FETCH",
  "inputStage" : {
    "stage" : "OR",
    "inputStages" : [
      {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "to" : 1,
          "room_name" : 1,
          "timestamp" : -1
        },
        "indexName" : "to_1_room_name_1_timestamp_-1",
        "isMultiKey" : false,
        "multiKeyPaths" : {
          "to" : [ ],
          "room_name" : [ ],
          "timestamp" : [ ]
        },
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 2,
        "direction" : "forward",
        "indexBounds" : {
          "to" : [
            "[\"torgen\", \"torgen\"]"
          ],
          "room_name" : [
            "[\"general/0\", \"general/0\"]"
          ],
          "timestamp" : [
            "[inf.0, 45.0)"
          ]
        }
      },
      {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "nick" : 1,
          "room_name" : 1,
          "timestamp" : -1
        },
        "indexName" : "nick_1_room_name_1_timestamp_-1",
        "isMultiKey" : false,
        "multiKeyPaths" : {
          "nick" : [ ],
          "room_name" : [ ],
          "timestamp" : [ ]
        },
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 2,
        "direction" : "forward",
        "indexBounds" : {
          "nick" : [
                  "[\"torgen\", \"torgen\"]"
          ],
          "room_name" : [
                  "[\"general/0\", \"general/0\"]"
          ],
          "timestamp" : [
                  "[inf.0, 45.0)"
          ]
        }
      }
    ]
  }
}

So the problem seems to be that mongodb's query planner doesn't understand the distributive property.

Torgen commented 5 years ago

Tried explain('executionStats') on both versions with some actual data. Possibly because there's some overlap between from someone and to nobody, the defactored version was slightly slower, but it was like 2ms to 1ms, so maybe MongoDB knows what it's doing, but that does mean that the nick + room_name + timestamp index is useless.