Closed chaeron closed 4 years ago
I'm sure there's a lot we can do, but hard to know where to focus when all we have to go on is "complex objects". Can you please provide your schemas, or a simplified version of your schemas so we can run some experiments?
I have experienced extremely poor performance with only a 3KB documents, but 4~5 depths schema. It took about 14s for the server to respond.
Then I've debugged into the code a bit. What I observed is that Schema.prototype.path()
has been called >100K times.
Then I profiled it and saw https://github.com/Automattic/mongoose/blob/984173b14431b6447dfb774719c433b19b670bec/lib/document.js#L961 in the call stack.
Then I console.log the pathName and discovered that too many dirty fields like options.shardKey
, options.autoIndex
, _userProvidedOptions
, s.hooks._posts
, and more and more different levels combinations are all in it.
Then I guess it might be related to this line https://github.com/Automattic/mongoose/blob/984173b14431b6447dfb774719c433b19b670bec/lib/document.js#L126 , so I commented it out.
And then the performance gets back like <10ms.
Feel like a bug and hope this information could be helpful.
Under this line: https://github.com/Automattic/mongoose/blob/984173b14431b6447dfb774719c433b19b670bec/lib/document.js#L952
A path object can be
Schema {
obj: {
first: { default: 'A_FIRSTNAME', type: [Function: String] },
last: { default: 'A_LASTNAME', type: [Function: String] }
},
paths: {
first: SchemaString {
enumValues: [],
regExp: null,
path: 'first',
instance: 'String',
validators: [],
getters: [],
setters: [],
options: [SchemaStringOptions],
_index: null,
defaultValue: 'A_FIRSTNAME',
'$isUnderneathDocArray': true,
[Symbol(mongoose#schemaType)]: true
},
last: SchemaString {
enumValues: [],
regExp: null,
path: 'last',
instance: 'String',
validators: [],
getters: [],
setters: [],
options: [SchemaStringOptions],
_index: null,
defaultValue: 'A_LASTNAME',
'$isUnderneathDocArray': true,
[Symbol(mongoose#schemaType)]: true
},
_id: ObjectId {
path: '_id',
instance: 'ObjectID',
validators: [],
getters: [],
setters: [Array],
options: [SchemaObjectIdOptions],
_index: null,
defaultValue: [Function],
'$isUnderneathDocArray': true,
[Symbol(mongoose#schemaType)]: true
}
},
aliases: {},
subpaths: {},
virtuals: {
id: VirtualType {
path: 'id',
getters: [Array],
setters: [],
options: {}
}
},
singleNestedPaths: {},
nested: {},
inherits: {},
callQueue: [],
_indexes: [],
methods: {},
methodOptions: {},
statics: {},
tree: { [0/1878]
first: { default: 'A_FIRSTNAME', type: [Function: String] },
last: { default: 'A_LASTNAME', type: [Function: String] },
_id: { auto: true, type: 'ObjectId' },
id: VirtualType {
path: 'id',
getters: [Array],
setters: [],
options: {}
}
},
query: {},
childSchemas: [],
plugins: [
{ fn: [Function], opts: [Object] },
{ fn: [Function], opts: [Object] },
{ fn: [Function], opts: [Object] },
{ fn: [Function], opts: [Object] },
{ fn: [Function], opts: [Object] }
],
'$id': 27,
s: { hooks: Kareem { _pres: [Map], _posts: [Map] } },
_userProvidedOptions: { collection: 'public' },
options: {
collection: 'public',
typePojoToMixed: true,
typeKey: 'type',
id: true,
noVirtualId: false,
_id: true,
noId: false,
validateBeforeSave: true,
read: null,
shardKey: null,
autoIndex: null,
minimize: true,
discriminatorKey: '__t',
versionKey: '__v',
capped: false,
bufferCommands: true,
strict: true
},
parentSchema: undefined,
dataLevel: { levelMap: {} },
'$globalPluginsApplied': true,
_requiredpaths: []
}
Looks like fields
recursive?? obj
recursive?? paths
recursive?? aliases
recursive?? subpaths
recursive?? virtuals
recursive?? singleNestedPaths
recursive?? nested
recursive?? inherits
recursive?? methods
recursive?? methodOptions
recursive?? statics
recursive?? tree
recursive?? query
recursive?? s
recursive?? _userProvidedOptions
recursive?? options
will be called recursively.
managed to reproduce it. It's actually because of the nested structure with default schema. For example:
export const addressSchema = new mongoose.Schema({});
export const propertySchema = new mongoose.Schema({
address: {
type: addressSchema,
default: addressSchema,
},
});
await Property.find({}); // which would trigger the initialization of the default address, and all the fields under the addressSchema will be iterated.
I can confirm this script is about 8x slower if you set the default
to a schema instance:
'use strict';
const mongoose = require('mongoose');
run().catch(err => console.log(err));
async function run() {
await mongoose.connect('mongodb://localhost:27017/test', {
useNewUrlParser: true,
useUnifiedTopology: true
});
await mongoose.connection.dropDatabase();
const addressSchema = new mongoose.Schema({ name: String });
const propertySchema = new mongoose.Schema({
address: {
type: addressSchema,
default: addressSchema,
},
});
const propertiesSchema = new mongoose.Schema({ properties: [propertySchema] });
const Property = mongoose.model('Property', propertiesSchema);
const properties = [];
for (let i = 0; i < 10000; ++i) {
properties.push({ address: { name: 'test' } });
}
await Property.create({ properties });
const start = Date.now();
await Property.findOne();
console.log('Elapsed', Date.now() - start);
}
We made a small improvement in 8fea1d9 that removes most of the performance impact, but the above script still ends up being 2x slower than if you don't have the default
. So in 32c5ed0 we made Mongoose throw an error if you set a path's default
to a Mongoose schema instance. There's no reason to ever do that anyway.
@vkarpov15, shouldn't a change like 32c5ed0
rather end up in a new major or at least minor version than a patch? It turned out to be pretty breaking for me. :) You mention that there is no reason to ever use a schema instance as as default, though, so I might have been using the library incorrectly.
We don't use schemas as defaults.....so not sure this will address our original performance issue, but we'll test it when the next release comes out with the patch in place.
@vkarpov15 If we don't put schema as the default field, what's the property way to initialize a default nested instance?
For example, in the schema above, if we want to put country: String
as one field of the address
, and give it a default value US
.
Something like default: () => new Address()
?
@backflip I admit this change is a bit heavy for a patch, but seeing as how (1) setting a default to a schema instance doesn't do anything useful, and (2) it has a significant performance impact, I figured it was worth it to get this change out sooner rather than later. Sorry for any trouble.
@yz-ark7
export const propertySchema = new mongoose.Schema({
address: {
type: addressSchema,
default: () => ({}),
},
});
Mongoose will take care of casting empty object against addressSchema
.
@vkarpov15 perfect! Thanks for letting us know. Just FYI, this is the StackOverflow post that I've originally seen the solution: https://stackoverflow.com/questions/35494875/mongoose-single-embedded-sub-document-default/46003938
@vkarpov15, no worries, it reminded me to be careful with version ranges in my dependencies. :)
And thanks for the proper default
example.
@yz-ark7 not much more I can do about that SO post outside of flagging it. My answer from 2017 is #2 on that SO thread :)
Mongoose v5.9.6 (latest) and latest MongoDB v4.2 running locally.
I ran into a situation writing a REST service where it saves a large document (~750KB, has an array witch about 750 fairly complex objects in it that have 3 embedded subdocuments which are only one level deep).
Mongoose validation took about 30 seconds for the first save, and then after some updates, 2nd save took 22 seconds. This is on some fairly capable hardware, and since this all happens in memory, I'm not sure why it takes that long to do the validation.
Disabling validations for the save() calls cuts the saves down to less than a second each.
Any ideas on how to improve the validation performance?