Automattic / mongoose

MongoDB object modeling designed to work in an asynchronous environment.
https://mongoosejs.com
MIT License
26.96k stars 3.84k forks source link

Identical populated sub-documents are references to each other? #3258

Closed aichholzer closed 5 years ago

aichholzer commented 9 years ago

Consider this schema:

// Documents shema
schema = new Schema({
  user: {
    type: Schema.Types.ObjectId,
    ref: 'User'
  },
  title: {
    type: String,
    trim: true,
    required: true
  }
});

I do a Model.find() and chain the .populate() method. I am populating the user:

.populate([
   {
     path: 'user',
     select: 'username about'
   }
])
.lean()
.exec()

I receive what I expect:

var documents = [
  {
    "_id": "55ca812198126c20090d3b46",
    "title": "Cars",
    "user": {
      "_id": "55ca811c98126c20090d3b33",
      "username": "6970840375870466",
      "about": "I have updated myself."
    }
  },
  {
    "_id": "55ca812198126c20090d3b45",
    "title": "Bikes",
    "user": {
      "_id": "55ca811c98126c20090d3b33",
      "username": "6970840375870466",
      "about": "I have updated myself."
    }
  }
]

Once I have the result, I apply a .map function to the array to change the user:

documents.map(function (document) {
  document.user.userId = hash(document.user._id);
  document.user._id = undefined;

  return document;
});

and this .map() fails. The first time it goes through the documents array, it successfully creates the userId property. The second time it says document.user._id is undefined. In fact, if I print each of the array's objects, the second iteration produces this:

{
  "_id": "55ca812198126c20090d3b45",
  "title": "Bikes",
  "user": {
    "username": "6970840375870466",
    "about": "I have updated myself.",
    "userId": "55ca811c98126c20090d3b33"
  }
}

For some mysterious reason, it seems the .populate() method, when it populates documents that are the same, uses a single reference. For the time being, it's possible to solve like this (which seems like an ugly hack to me):

documents = JSON.parse(JSON.stringify(documents));
documents.map(function (document) {
   ...
});

Any ideas?

vkarpov15 commented 9 years ago

This is by design. Mongoose doesn't do any extra copying; the way that populate works is that we do an extra query, and then assign the documents you're populating to the result of that query. We don't make copies of the returned docs, so if you have one populate with multiple docs pointing to the same id, you'll get the same reference. There are a few workarounds, for instance, just check if _id is defined.

Open to discussion about changing this behavior. Reason against it is mostly performance.

aichholzer commented 9 years ago

@vkarpov15 thank you for your reply, I figured as much for performance reasons. However, I think this should be optional and left to the developer, after all he is the one who needs to make decisions based on his actual needs. For instance, I would like to suggest something like this:

.populate([
  {
    path: 'user',
    select: 'username about'
  }
], { copy: true })

where copy defaults to false. This would make it possible to ask Mongoose to explicitly do the extra copying instead of using a reference.

By the way; just checking if _id is defined does not really solve the problem in case the object is undergoing a more complex set of transformations, at least in my case. Plus, the first modified object would have to be kept around so it can be copied in case _id does no longer exist in the subsequent objects.

vkarpov15 commented 9 years ago

It's definitely possible to add the 'copy' option. Will put it in for next release, but would welcome pull requests :)

aichholzer commented 9 years ago

šŸ‘šŸ¼

Jeff-Lewis commented 9 years ago

@vkarpov15 are you thinking just to add an option to 'copy' but leave the default behavior? I hope you do leave the default behavior.

It's an important aspect of any ODM that references are the same document when working with multiple populated graphs of documents. Performance alone is not the only reason for this behavior.

vkarpov15 commented 9 years ago

That's a reasonable argument. I think this might be easier to add as a plugin, but there's no plans to change the default in the foreseeable future.

andreafdaf commented 6 years ago

Hello everyone, sorry if I'm bringing this issue back to life after so long but I've stumbled on this behavior (on v5.1.5)

I have a populated doc with a date, that must be formatted according to a timezone in the parent element. So for example: doc A (timezone: Europe/Rome), subdoc X (date: 2018-10-04T15:00:00.000Z) doc B (timezone: America/New_York), subdoc X (date: 2018-10-04T15:00:00.000Z)

It's not possible to solve this by simply checking if the date is already formatted, since I really want formatted_date to be different on each subdoc... of course I can simply assign formatted_date to the parent but the field would be in a "wrong" position inside the object! So the only way is to copy the subdoc before operating on it.

Are there plans to introduce the { copy } option? I guess I understand what Jeff says but if the default remains { copy: false } there should be no differences, am I missing something? Is a PR welcome or it's already decided that this functionality should go in a plugin?

vkarpov15 commented 6 years ago

@andreafdaf I'm not sure I understand your issue, can you elaborate with code samples? Dates in MongoDB are just numbers, JavaScript is responsible for all date formatting, so you should be able to handle date formatting with a virtual

andreafdaf commented 6 years ago

@vkarpov15 Sorry, my example wasn't really clear... talking about dates and timezones was just a specific use case, let me illustrate with some pseudojs:

// models
const Event = new Schema({
  start: { type: Date }
})
const User = new Schema({
  local_timezone: { type: String }
})
const Reservation = new Schema({
  user: { type: Schema.ObjectId, ref: 'User' },
  event: { type: Schema.ObjectId, ref: 'Event' },
})

Suppose we have:

Bob and Alice have a reservation for Conference Call, so let's say we want to send them an email with a reminder telling when the meeting is going to take place:

const reservations = await Reservation.find(...).populate('user event').exec()
for (const reservation of reservations) {
  const { user } = reservation
  const { event } = reservation // this is always the same object since it's a reference
  event.formatted_date = moment.tz(event.start, user.local_timezone).format()
  await sendEmail(reservation)
}

In this case event.formatted_date will be the same in both emails, to fix this we must:

const reservations = await Reservation.find(...).populate('user event').exec()
for (const reservation of reservations) {
  const { user } = reservation
  let { event } = reservation // this is always the same object since it's a reference
  event = JSON.parse(JSON.stringify(event))
  event.formatted_date = moment.tz(event.start, user.local_timezone).format()
  reservation.event = event
  await sendEmail(reservation)
}

This is a simple example, but geez we had some bugs already caused by ignoring that populated objects were refs and not distinct copies of the same data

As pointed out by the OP, it would be nicer if we could do this:

const reservations = await Reservation.find(...).populate('user event', { copy: true }).exec()
for (const reservation of reservations) {
  const { user } = reservation
  const { event } = reservation // this is not the same object for each reservation
  event.formatted_date = moment.tz(event.start, user.local_timezone).format()
  await sendEmail(reservation)
}

or

const Reservation = new Schema({
  user: { type: Schema.ObjectId, ref: 'User' },
  event: { type: Schema.ObjectId, ref: 'Event', options: { copy: true } },
})

I hope it's a little bit clearer this way :)

It's not big deal, sure... we could simply assign formatted_date to reservation and we're done without changing anything, but maybe some parts of the code want formatted_date to be under event; a new dev might ignore this fact and trigger bugs anew until it understands where the issue comes from; during refactor those extra object copying parts might be seen as "overcomplication" and thrown away if not properly commented...

I know those are not really good arguments, the answer for all of them is "code better" but you know, bad coding happens and having a way to consistently prevent errors from happening is always nice

As always, thanks for your time! :v:

vkarpov15 commented 6 years ago

Thanks for the detailed explanation @andreafdaf , that was very helpful. I have a much better understanding of the why of this issue now

andreafdaf commented 6 years ago

It's always a pleasure to contribute @vkarpov15

I'm open to keep watching/discussing the implementation if you like to have a couple of eyes more to look at it. I would PR something but I don't think I have enough knowledge of mongoose internals to do a good job :cry:

emirhanyagci commented 2 months ago

I'm facing with issue while using s3 bucket , what is best practice I was doing chat app this app contain 3 collection Chats,Messages,Users and User I find messages at group and of possible to same user's message in group in this case when populate like that I'm having trouble getting user avatars signed proccess this is my query.

 const messages = await Message.find({ chat: chatId })
    .sort({ createdAt: 1 })
    .populate("sender")
    .select("-password")
    .exec();

this is what i do . setSignedUrl is my utils its just using getSignedUrl function of @aws-sdk/s3-request-presigner packages under the hood . For example we have 2 message from same user then first execute of for both of avatar signed and setted , at second execute this time key broken because we try sign to signed url , and as far as I know, and according to what is written here, it is impossible to tell if the url is signed or not "The pre-signed URL is simply a local calculation and signing of a URL. There is no interaction with the S3 service"

  for (let message of messages) {
    message.sender.avatar = await setSignedUrl(message.sender.avatar);
  }

after that i did this for temporary solution but this is kind weird not feeling like right solution.

 let clonedMessages = JSON.parse(JSON.stringify(messages));

  for (let message of clonedMessages) {
    message.sender.avatar = await setSignedUrl(message.sender.avatar);
  }

I think should be some option for this