Automattic / mongoose

MongoDB object modeling designed to work in an asynchronous environment.
https://mongoosejs.com
MIT License
26.92k stars 3.84k forks source link

Why does an indexed schema result in slower queries? #8045

Closed khaledosman closed 5 years ago

khaledosman commented 5 years ago

I noticed while writing https://medium.com/@khaledosman/performance-tips-for-mongodb-mongoose-190732a5d382 that sometimes adding a query'd field as an Index to Schema result in slower queries.

What is the current behavior? When running performance tests with different query perf optimisations on two different collections that have the same dataset, one with an index for the query'd field and one without. The queries that use the indexed model seem to be somehow slower.

If the current behavior is a bug, please provide the steps to reproduce. Use this minimal repo with a local installation of mongodb https://github.com/khaledosman/mongo-performance-expirements mongod npm i && yarn start I'm querying for the indexed field favoriteFruit which is different for 50% of the users.

What is the expected behavior? I'd expect the indexed model to return faster queries but instead it was slower

cleaning db
db cleaned
adding 100000 users to the database
finished populating the database with 100000 users
default_query: 1868.893ms
query_with_index: 2031.596ms
query_with_select: 1459.995ms
query_with_select_index: 1130.273ms
lean_query: 602.460ms
lean_with_index: 610.920ms
lean_with_select: 245.496ms
lean_select_index: 285.485ms

What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version. Nodejs v12.7.0 Mongoose v5.6.7

sgatade commented 5 years ago

Some results that I ran...

Setup: Windows 10 Pro - 64 Bit, i5-4200 1.60 Ghz, 8 GB RAM MongoDB: 4.0.9 Node: 10.15.3 Mongoose: 5.6.7

--- 1000 users --- cleaning db clean-db: 1038.437ms db cleaned adding 1000 users to the database add-data: 2612.337ms finished populating the database with 1000 users default_query: 101.963ms query_with_index: 105.521ms query_with_select: 55.886ms query_with_select_index: 63.948ms lean_query: 9.716ms lean_with_index: 10.973ms lean_with_select: 6.273ms lean_select_index: 6.862ms

--- 10,000 Users --- cleaning db clean-db: 1089.016ms db cleaned adding 10000 users to the database add-data: 13048.575ms finished populating the database with 10000 users default_query: 420.900ms query_with_index: 347.501ms query_with_select: 286.511ms query_with_select_index: 161.172ms lean_query: 89.424ms lean_with_index: 142.019ms lean_with_select: 35.976ms lean_select_index: 47.051ms

--- 100,000 users --- cleaning db clean-db: 1500.448ms db cleaned adding 100000 users to the database add-data: 128033.892ms finished populating the database with 100000 users default_query: 2927.131ms query_with_index: 2528.479ms query_with_select: 1677.231ms query_with_select_index: 1600.110ms lean_query: 1002.164ms lean_with_index: 935.286ms lean_with_select: 351.627ms lean_select_index: 385.114ms

Quick % workout shows... image

If I execute the same query with Compass (not lean queries), I get varying results everytime!

sgatade commented 5 years ago

Anyone knows how to capture console.timeEnd output to a variable?

khaledosman commented 5 years ago

@sgatade you can use the performance API instead

const { performance } = require('perf_hooks');
const t1 = performance.now()
// do some work
const t2 = performance.now()
const timeConsumed = t2-t1
sgatade commented 5 years ago

@khaledosman This works perfect! Thanks!!!

vkarpov15 commented 5 years ago

I looked a little more closely and I think this behavior is expected - your index on favoriteFruit is not very selective. If you use an index that doesn't meaningfully reduce the number of documents to scan, indexes won't help much, and may even hurt a little.

FWIW, here's the results I get:

$ npm start

> mongo-performance-expirements@1.0.0 start /home/val/Workspace/MongoDB/troubleshoot-mongoose/mongo-performance-expirements
> nodemon index.js

[nodemon] 1.19.1
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: *.*
[nodemon] starting `node index.js`
cleaning db
db cleaned
adding 100000 users to the database
finished populating the database with 100000 users
default_query: 2846.891ms
query_with_index: 2084.971ms
query_with_select: 1369.877ms
query_with_select_index: 1266.956ms
lean_query: 673.973ms
lean_with_index: 726.519ms
lean_with_select: 283.254ms
lean_select_index: 268.205ms

2nd run:

$ npm start

> mongo-performance-expirements@1.0.0 start /home/val/Workspace/MongoDB/troubleshoot-mongoose/mongo-performance-expirements
> nodemon index.js

[nodemon] 1.19.1
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: *.*
[nodemon] starting `node index.js`
cleaning db
db cleaned
adding 100000 users to the database
finished populating the database with 100000 users
default_query: 2146.956ms
query_with_index: 1968.163ms
query_with_select: 1276.897ms
query_with_select_index: 1174.065ms
lean_query: 696.918ms
lean_with_index: 670.894ms
lean_with_select: 287.297ms
lean_select_index: 267.219ms
khaledosman commented 5 years ago

hmm thanks for looking into this, but in that case isn't a 50% decrease in the number of documents to scan selective enough?

vkarpov15 commented 5 years ago

I'm guessing not. Im not an expert on mongidb's query executor but in my experience indexes need to be much more selective to get a big performance win.