BeanieODM / beanie

Asynchronous Python ODM for MongoDB
http://beanie-odm.dev/
Apache License 2.0
2.02k stars 213 forks source link

[BUG] Order of elements in list[Link] is not preserved #930

Closed moonrail closed 1 month ago

moonrail commented 4 months ago

Describe the bug As #414 and #527 were closed without any interaction by a stale bot (yay...) here another attempt.

When using DBRefs/Links, the ordering is stable when using MongoDB directly or any of its lower level libraries.

This is however not always the case with beanie.

Upon insert/update the order is preserved, as defined.

Upon link-fetching via e.g. fetch_links or fetch_all_links, it is not.

A possible solution can be found here: https://github.com/BeanieODM/beanie/issues/414#issuecomment-1444126740 This just copies all Ids to a list with the original order and sorts the fetched documents later based on this order.

To Reproduce Examples can be found in both older issues.

Expected behavior Using fetch_all_links and fetch_links keeps the order of items as defined as DBRefs in the database.

Additional context

jamesmurphy-mc commented 4 months ago

Note that in addition to not preserving the order of elements, fetch_links may even change the length of the returned list by removing duplicate elements. In summary, fetch_links behaves like an $in query, not like a list fetch.

The offending code is here, in the LinkTypes.LIST branch of the construct_query function in beanie.odm.utils.find: https://github.com/BeanieODM/beanie/blob/51e73ebc297df02082c5c683d555f7d057c4f6aa/beanie/odm/utils/find.py#L270-L275

I see this as more of a correctness issue than a convenience issue. As a mongodb datastructure, arrays can contain duplicates and their order is part of their semantics. Users will expect fetching an array to fetch elements in order, including duplicates. I believe there are ways to dereference an array of ids efficiently, but even if it turns out that this is a fundamentally slow operation, then I think the solution would be to warn users of the conditions under which the operation is slow rather than to break array semantics.

As a proof of concept, here is an example query that does an in-order list fetch that could be modified to replace the existing $lookup stage in construct_query (playground link: https://mongoplayground.net/p/t7TyJK1oOEI)

db.orders.aggregate([
  {
    $unwind: "$items"
  },
  {
    $lookup: {
      from: "inventory",
      localField: "items.$id",
      foreignField: "_id",
      as: "items"
    }
  },
  {
    $group: {
      _id: "$_id",
      items: {
        $push: {
          $first: "$items"
        }
      },
      __oldRoot: {
        $last: "$$ROOT"
      }
    }
  },
  {
    $replaceRoot: {
      newRoot: {
        $mergeObjects: [
          "$__oldRoot",
          {
            items: "$items"
          }
        ]
      }
    }
  }
])

The idea is to unwind the array, perform an efficient lookup on the id, group the results into an object containing the fetched array and the old document, then merge the old document with a new document containing the fetched items.

jamesmurphy-mc commented 3 months ago

@roman-right How do you feel about the solution proposed here? Have I missed anything or do you have other concerns that I could help address? Before submitting a PR I would want to have you on-board with the general plan.

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity.

Toothwitch commented 2 months ago

As this has a proposed solution and just needs acknowledgement i am against this staying stale. This is still an existing issue and the proposal would also fix unwanted deduplication which in some cases is necessary for lists :)

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 14 days with no activity.