[Discussion] executing populate: parallel/sequence/optional?

Currently, when we do the following populate:

await Book.findOne()
  .populate([
    { path: 'authorId' },
    { path: 'reviewsIds' }
  ]);

Under the hood, this executes 3 queries: 1- Book.findOne(...) 2- Author.find(...) 3- Review.find(...)

I ran the script below to better understand how the populate queries are executed. It turns out they are executed in sequence (e.g., we wait for the author results to come back from the database, then we start finding reviews).

This approach prevents slow trains from slowing the whole application down. However, if an application makes few requests and would rather make specific endpoints faster, populating in sequence makes those endpoints slower.
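
To make the trade-off concrete, here is a toy illustration of the timing difference between the two strategies. It is not the reproduction script and does not touch Mongoose or MongoDB: fakeQuery is a hypothetical stand-in for a populate query that takes a fixed amount of time, and the 200ms figure only mirrors the sleep used in the script further down.

```javascript
// Hypothetical stand-in for a populate query that takes `ms` milliseconds.
function fakeQuery(name, ms) {
  return new Promise(resolve => setTimeout(() => resolve(name), ms));
}

// In sequence: the second query starts only after the first one resolves.
async function runInSequence(ms) {
  const startedAt = Date.now();
  const author = await fakeQuery('author', ms);
  const reviews = await fakeQuery('reviews', ms);
  return { results: [author, reviews], duration: Date.now() - startedAt };
}

// In parallel: both queries are in flight at the same time.
async function runInParallel(ms) {
  const startedAt = Date.now();
  const results = await Promise.all([
    fakeQuery('author', ms),
    fakeQuery('reviews', ms)
  ]);
  return { results, duration: Date.now() - startedAt };
}

async function main() {
  const sequence = await runInSequence(200);
  const parallel = await runInParallel(200);
  console.log(`In sequence: ${sequence.duration}ms`);
  console.log(`In parallel: ${parallel.duration}ms`);
}

main();
```

With two equally slow queries, the sequential run should take roughly the sum of both delays, while the parallel run should take roughly the longer of the two. A real database adds socket contention on top of this, which is exactly the slow trains concern.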

Also, slow trains only become a problem if all the populate queries are slow. If only one or two are slow, there are still sockets left for MongoDB to channel the other queries on, given the default pool of 5 sockets. We could also change that default, or recommend that people use 10~20 sockets in production, which would likely mitigate the risk of slow trains and give better performance.
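
For reference, a minimal sketch of how the pool size could be raised when connecting. buildConnectionOptions is a hypothetical helper made up for this example. The option name depends on the version: Mongoose 5.x takes poolSize, while Mongoose 6 and later take maxPoolSize. The value 10 is just the lower end of the range suggested above, not a measured recommendation.

```javascript
// Hypothetical helper: builds connection options with a larger socket pool.
// Mongoose 5.x uses `poolSize`; Mongoose 6+ uses `maxPoolSize`.
function buildConnectionOptions(mongooseMajorVersion, sockets) {
  if (mongooseMajorVersion >= 6) {
    return { maxPoolSize: sockets };
  }
  return { poolSize: sockets };
}

// Options for the Mongoose 5.x setup used in this issue.
const options = buildConnectionOptions(5, 10);
console.log(options);

// Usage (needs a running MongoDB, so it is left commented out here):
// await mongoose.connect('mongodb://localhost:27017/test', options);
```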

I think we should:

Make this behavior clear in the docs.
Offer a global option, mongoose.set('populateInSequence', false), which defaults to true. We can then discuss further what that default should ultimately be.
Offer an option to set populateInSequence: false for specific queries, which could be useful for people who want specific endpoints to be as fast as possible, even if that causes slow trains in other areas of the application.
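
Until something like populateInSequence exists (it is only a proposal here, not an existing option), a specific endpoint could skip populate and issue the follow-up queries itself. The sketch below assumes the Book, Author and Review models from the script further down. findBookWithParallelPopulate is a hypothetical helper name, and the models are passed in as arguments so the function does not depend on a global connection.

```javascript
// Hypothetical helper: loads a book, then fetches its author and reviews
// with two queries that are issued at the same time instead of via populate.
async function findBookWithParallelPopulate({ Book, Author, Review }) {
  const book = await Book.findOne().lean();
  if (!book) {
    return null;
  }

  // Both queries are started before either one is awaited.
  const [author, reviews] = await Promise.all([
    Author.findOne({ _id: book.authorId }).lean(),
    Review.find({ _id: { $in: book.reviewsIds } }).lean()
  ]);

  // Mimic the shape populate produces: ids replaced by the fetched documents.
  // Note: $in does not guarantee that reviews come back in reviewsIds order.
  return { ...book, authorId: author, reviewsIds: reviews };
}
```

This trades populate's conveniences (match, nested paths, hydrated documents) for control over how the queries are dispatched, so it only makes sense for the few endpoints where latency matters most.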

I'd also like to create a simulation of a typical application and compare performance between the parallel and sequential populate behaviors. Any suggestions on constraints that would make the experiment as unbiased as possible are welcome.

Thoughts? @vkarpov15 @IslandRhythms @ahmedelshenawy25

import mongoose from 'mongoose';
const { Schema } = mongoose;
import assert from 'assert';

await prepareConnection();
const Book = getBookModel();
const Author = getAuthorModel();
const Review = getReviewModel();

await createDocuments();

mongoose.set('debug', true);

const modelFindOneStartedAt = Date.now();
const book = await Book.findOne()
  .populate([
    { path: 'authorId', match: { $where: 'sleep(200) || true' } },
    { path: 'reviewsIds', match: { $where: 'sleep(200) || true' } }
  ]);
const modelFindOneEndedAt = Date.now();

const queryDuration = modelFindOneEndedAt - modelFindOneStartedAt;
assert.ok(queryDuration > 400);

console.log(`Time consumed ${queryDuration}ms.`);

assert.ok(book.authorId);
assert.ok(book.reviewsIds.length);

console.log(`Mongoose version: ${mongoose.version}`);
console.log('Done.');

async function createDocuments () {
  const author = await Author.create({ name: 'Martin Fowler' });
  const reviews = await Review.create([
    new Review({ content: 'Such a great book!' }),
    new Review({ content: 'Good book, highly recommend it.' })
  ]);

  await Book.create({
    title: 'Refactoring',
    authorId: author._id,
    reviewsIds: reviews.map(review => review._id)
  });
}

async function prepareConnection () {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true
  });

  await mongoose.connection.dropDatabase();
}

function getBookModel () {
  const bookSchema = new Schema({
    title: { type: String },
    authorId: { type: Schema.ObjectId, ref: 'Author' },
    reviewsIds: [{ type: Schema.ObjectId, ref: 'Review' }]
  });

  const Book = mongoose.model('Book', bookSchema);
  return Book;
}

function getAuthorModel () {
  const authorSchema = new Schema({ name: String });
  const Author = mongoose.model('Author', authorSchema);
  return Author;
}

function getReviewModel () {
  const reviewSchema = new Schema({ content: String });
  const Review = mongoose.model('Review', reviewSchema);
  return Review;
}

Output

Mongoose: books.findOne({}, { projection: {} })
Mongoose: authors.find({ '$where': 'sleep(200) || true', _id: { '$in': [ ObjectId("60f95bb6b1d6466ff4e0b466") ] }}, { skip: undefined, limit: undefined, perDocumentLimit: undefined, projection: {}})
Mongoose: reviews.find({ '$where': 'sleep(200) || true', _id: { '$in': [ ObjectId("60f95bb6b1d6466ff4e0b468"), ObjectId("60f95bb6b1d6466ff4e0b469") ] }}, { skip: undefined, limit: undefined, perDocumentLimit: undefined, projection: {}})
Time consumed 482ms.
Mongoose version: 5.13.3
Done.

Automattic / mongoose

[Discussion] executing populate: parallel/sequence/optional? #10480
