Automattic / mongoose

MongoDB object modeling designed to work in an asynchronous environment.
https://mongoosejs.com
MIT License

[Discussion] executing populate: parallel/sequence/optional? #10480

Open AbdelrahmanHafez opened 3 years ago

AbdelrahmanHafez commented 3 years ago

Currently, when we do the following populate:

await Book.findOne()
  .populate([
    { path: 'authorId' },
    { path: 'reviewsIds' }
  ]);

Under the hood, this executes 3 queries:

1- Book.findOne(...)
2- Author.findOne(...)
3- Review.find(...)

I ran the script below to better understand how the populate queries are executed. It turns out they are executed in sequence (e.g., we wait for the author results to come back from the database, then we start finding reviews).

This approach keeps slow trains from slowing the whole application down. However, if an application makes few requests and would rather make specific endpoints faster, populating in sequence makes those endpoints slower than they need to be.
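The trade-off between the two behaviors can be sketched without a database. This is a minimal illustration, not Mongoose's internals: `fakeQuery`, `runInSequence`, `runInParallel`, `timeIt`, and `compare` are hypothetical helpers, and the 200ms delays stand in for the slowed-down populate queries from the script.

```javascript
// Stand-in for a populate query: resolves with `name` after `ms` milliseconds.
// (Hypothetical helper; no database is involved.)
function fakeQuery(name, ms) {
  return new Promise(resolve => setTimeout(() => resolve(name), ms));
}

// One query at a time: total time is roughly the sum of the durations.
async function runInSequence(specs) {
  const results = [];
  for (const { name, ms } of specs) {
    results.push(await fakeQuery(name, ms));
  }
  return results;
}

// All queries at once: total time is roughly the slowest duration.
function runInParallel(specs) {
  return Promise.all(specs.map(({ name, ms }) => fakeQuery(name, ms)));
}

// Measures how long an async function takes to settle.
async function timeIt(fn) {
  const startedAt = Date.now();
  const results = await fn();
  return { results, elapsedMs: Date.now() - startedAt };
}

async function compare() {
  const specs = [{ name: 'author', ms: 200 }, { name: 'reviews', ms: 200 }];
  const sequence = await timeIt(() => runInSequence(specs));
  const parallel = await timeIt(() => runInParallel(specs));
  console.log(`sequence: ${sequence.elapsedMs}ms, parallel: ${parallel.elapsedMs}ms`);
  return { sequence, parallel };
}
```

Calling `compare()` should report the sequential run taking about twice as long as the parallel one. The parallel version occupies two sockets at once, which is where the slow trains concern comes from.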

Also, slow trains would only become a problem if all the populates are slow. If only one or two are slow, there are still sockets left for MongoDB to channel the other queries on, given MongoDB's default of 5 sockets. We could also change the default, or recommend that people use 10~20 sockets in production, which would likely mitigate the risk of slow trains and give better performance.
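To make the socket argument concrete, here is a toy model of a fixed-size pool. It is only a sketch under loud assumptions: `FakePool` and `fastQueryLatency` are hypothetical, each "query" just holds a slot for a fixed time, and the real driver's pooling is more involved than this.

```javascript
// Toy model of a fixed-size socket pool (an assumption for illustration only).
class FakePool {
  constructor(size) {
    this.free = size;   // number of idle sockets
    this.waiting = [];  // resolvers for queries queued behind busy sockets
  }

  // Holds one socket for `ms` milliseconds, queueing first if none is idle.
  async run(ms) {
    if (this.free === 0) {
      await new Promise(resolve => this.waiting.push(resolve));
    } else {
      this.free--;
    }
    await new Promise(resolve => setTimeout(resolve, ms));
    const next = this.waiting.shift();
    if (next) {
      next();           // hand the socket straight to the next queued query
    } else {
      this.free++;
    }
  }
}

// Starts `slowQueries` 200ms queries, then measures how long a 10ms query
// takes to complete on the same pool.
async function fastQueryLatency(poolSize, slowQueries) {
  const pool = new FakePool(poolSize);
  const pending = [];
  for (let i = 0; i < slowQueries; i++) {
    pending.push(pool.run(200));
  }
  const startedAt = Date.now();
  await pool.run(10);
  const latency = Date.now() - startedAt;
  await Promise.all(pending);
  return latency;
}
```

In this model, `fastQueryLatency(2, 2)` leaves the fast query stuck behind the two slow ones, while `fastQueryLatency(5, 2)` lets it through right away on one of the idle sockets. In Mongoose 5.x the pool size is set with the `poolSize` connection option.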

I think we should:

I'd also like to create a simulation of a typical application and compare the performance of parallel and sequential populate. Any suggestions on experiment constraints that would make it as unbiased as possible are welcome.

Thoughts? @vkarpov15 @IslandRhythms @ahmedelshenawy25

import mongoose from 'mongoose';
const { Schema } = mongoose;
import assert from 'assert';

await prepareConnection();
const Book = getBookModel();
const Author = getAuthorModel();
const Review = getReviewModel();

await createDocuments();

mongoose.set('debug', true);

const modelFindOneStartedAt = Date.now();
const book = await Book.findOne()
  .populate([
    { path: 'authorId', match: { $where: 'sleep(200) || true' } },
    { path: 'reviewsIds', match: { $where: 'sleep(200) || true' } }
  ]);
const modelFindOneEndedAt = Date.now();

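const queryDuration = modelFindOneEndedAt - modelFindOneStartedAt;
// Each populate query is slowed by sleep(200), so a total above 400ms
// suggests the two ran one after the other.
assert.ok(queryDuration > 400);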

console.log(`Time consumed ${queryDuration}ms.`);

assert.ok(book.authorId);
assert.ok(book.reviewsIds.length);

console.log(`Mongoose version: ${mongoose.version}`);
console.log('Done.');

async function createDocuments () {
  const author = await Author.create({ name: 'Martin Fowler' });
  const reviews = await Review.create([
    new Review({ content: 'Such a great book!' }),
    new Review({ content: 'Good book, highly recommend it.' })
  ]);

  await Book.create({
    title: 'Refactoring',
    authorId: author._id,
    reviewsIds: reviews.map(review => review._id)
  });
}

async function prepareConnection () {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true
  });

  await mongoose.connection.dropDatabase();
}

function getBookModel () {
  const bookSchema = new Schema({
    title: { type: String },
    authorId: { type: Schema.ObjectId, ref: 'Author' },
    reviewsIds: [{ type: Schema.ObjectId, ref: 'Review' }]
  });

  const Book = mongoose.model('Book', bookSchema);
  return Book;
}

function getAuthorModel () {
  const authorSchema = new Schema({ name: String });
  const Author = mongoose.model('Author', authorSchema);
  return Author;
}

function getReviewModel () {
  const reviewSchema = new Schema({ content: String });
  const Review = mongoose.model('Review', reviewSchema);
  return Review;
}

Output

Mongoose: books.findOne({}, { projection: {} })
Mongoose: authors.find({ '$where': 'sleep(200) || true', _id: { '$in': [ ObjectId("60f95bb6b1d6466ff4e0b466") ] }}, { skip: undefined, limit: undefined, perDocumentLimit: undefined, projection: {}})
Mongoose: reviews.find({ '$where': 'sleep(200) || true', _id: { '$in': [ ObjectId("60f95bb6b1d6466ff4e0b468"), ObjectId("60f95bb6b1d6466ff4e0b469") ] }}, { skip: undefined, limit: undefined, perDocumentLimit: undefined, projection: {}})
Time consumed 482ms.
Mongoose version: 5.13.3
Done.

AbdelrahmanHafez commented 2 years ago

A third option would be to provide a concurrency: number option that specifies the maximum number of populate queries allowed to run in parallel.
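As a rough sketch of what such an option could do internally, the helper below caps how many tasks are in flight at once. `runWithConcurrency` is a hypothetical name and not an existing Mongoose API; each task is assumed to be a function that returns a promise, such as one populate query.

```javascript
// Runs `tasks` (functions returning promises) with at most `concurrency`
// of them in flight at once; resolves with the results in the original order.
// Hypothetical helper, not part of Mongoose.
async function runWithConcurrency(tasks, concurrency) {
  const results = new Array(tasks.length);
  let nextIndex = 0;

  // Each worker keeps pulling the next unstarted task until none are left.
  async function worker() {
    while (nextIndex < tasks.length) {
      const index = nextIndex++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `concurrency` set to 1 this behaves like the current sequential populate, and with `Infinity` it behaves like fully parallel execution, so one option would cover both ends of the discussion.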