elbywan / moongoon

An object-document mapper for MongoDB. 🌙
https://elbywan.github.io/moongoon/
MIT License
44 stars 3 forks source link

aggregation issue causing an exception #5

Closed justmark closed 3 years ago

justmark commented 3 years ago

Hi,

Firstly, thanks for all the work on this project. We're looking at transitioning to Crystal for some work, and this shard is making that possible.

I seem to have run into an aggregation issue though.

I have the following pipeline defined:

  aggregation_pipeline(
    {
      "$sort": {"md": 1}
    },
    { 
      "$group": { 
          "_id": {
            "md": "$md"
          },
          "data": { 
              "$addToSet": "$$ROOT"
          }
      }
    }
  )

This results in _id being an Int rather than an ObjectId. The query works fine in the MongoDB shell. Unfortunately it generates an exception:

Unhandled exception: Unable to deserialize key (_id) having value (BSON(@data=Bytes[13, 0, 0, 0, 16, 109, 100, 0, 212, 7, 241, 95, 0])) and of type (BSON) belonging to type 'Fives'. Expected type(s) '[BSON::ObjectId]'. (Exception)

Any idea what I've done wrong?

Thanks, Mark

elbywan commented 3 years ago

Hi @justmark,

Any idea what I've done wrong?

I don't think you tried anything wrong actually 😉. moongoon is quite opinionated regarding ObjectId properties and it assumes that the model field _id will always be of that type.

The problem with using $group here is that it will return a result that will not comply with the defined Model shape, especially since _id will be an object instead of an ObjectId.

It would be quite complicated to change that since working with union of possible id values would be cumbersome.

The solution would be to use the underlying driver (cryomongo) syntax to query the database and return values.

I've prepared a small self-contained example that should illustrate what I mean below

require "moongoon" 

Moongoon.connect("mongodb://localhost:27017", database_name: "test_db")

# Define a regular model - but without the aggregation_pipeline macro.
class MyModel < Moongoon::Collection
  collection "models"
  property name : String
end

# Clear the db.
MyModel.clear

# Insert models.
[
  {name: "one"},
  {name: "two"},
  {name: "two"},
  {name: "three"},
  {name: "three"},
  {name: "three"},
].map { |model| MyModel.new(**model).insert }

# Use the `.collection` method to access the cryomongo collection object.
# (See: https://elbywan.github.io/cryomongo/Mongo/Collection.html)
# Then call `.aggregate` to perform the aggreation query.
cursor = MyModel.collection.aggregate(
  pipeline: [
    {
      "$sort": {"name": 1},
    },
    {
      "$group": {
        "_id": {
          "name": "$name",
        },
        "data": {
          "$addToSet": "$$ROOT",
        },
      },
    },
  ]
)

# Put the results inside an array.
# Here the results are of type BSON, but it is possible to type them (we will do that later on).
bson_results = cursor.try &.to_a

# Print the results.
puts bson_results.to_pretty_json

# => [
#   {
#     "_id": {
#       "name": "one"
#     },
#     "data": [
#       {
#         "_id": {
#           "$oid": "6018225687abcc60a186f4ff"
#         },
#         "name": "one"
#       }
#     ]
#   },
#   {
#     "_id": {
#       "name": "two"
#     },
#     "data": [
#       {
#         "_id": {
#           "$oid": "6018225687abcc60a186f500"
#         },
#         "name": "two"
#       },
#       {
#         "_id": {
#           "$oid": "6018225687abcc60a186f501"
#         },
#         "name": "two"
#       }
#     ]
#   },
#   {
#     "_id": {
#       "name": "three"
#     },
#     "data": [
#       {
#         "_id": {
#           "$oid": "6018225687abcc60a186f502"
#         },
#         "name": "three"
#       },
#       {
#         "_id": {
#           "$oid": "6018225687abcc60a186f504"
#         },
#         "name": "three"
#       },
#       {
#         "_id": {
#           "$oid": "6018225687abcc60a186f503"
#         },
#         "name": "three"
#       }
#     ]
#   }
# ]

# Access an element field.
# Casting with .as(BSON) is needed to parse values, just like JSON::Any.
puts bson_results.try &.map &.["_id"].as(BSON)["name"]
# => ["one", "two", "three"]

# As casting can be cumbersome, it is also possible to type the results.
# In order to do so, we need to define a model matching the aggregation result shape.
class MyAggregatedModel
  # We need to include this module to enable serialization from mongodb.
  include BSON::Serializable

  record Id, name : String do
    include BSON::Serializable
  end

  record DataElement, _id : BSON::ObjectId, name : String do
    include BSON::Serializable
  end

  property _id : Id
  property data : Array(DataElement)
end

# Replaying the query since the previous cursor has been exhausted.
cursor = MyModel.collection.aggregate(
  pipeline: [
    {
      "$sort": {"name": 1},
    },
    {
      "$group": {
        "_id": {
          "name": "$name",
        },
        "data": {
          "$addToSet": "$$ROOT",
        },
      },
    },
  ]
)

# Use .of(_) to cast values automatically.
aggregated_models = cursor.try(&.of(MyAggregatedModel).to_a) || [] of MyAggregatedModel

pp aggregated_models

# => [#<MyAggregatedModel:0x111cea120
#   @_id=MyAggregatedModel::Id(@name="one"),
#   @data=
#   [MyAggregatedModel::DataElement(
#     @_id=
#       BSON::ObjectId(
#       @data=Bytes[96, 24, 37, 54, 221, 231, 147, 231, 191, 80, 216, 101]),
#     @name="one")]>,
#   #<MyAggregatedModel:0x111cea0c0
#   @_id=MyAggregatedModel::Id(@name="two"),
#   @data=
#   [MyAggregatedModel::DataElement(
#     @_id=
#       BSON::ObjectId(
#       @data=Bytes[96, 24, 37, 54, 221, 231, 147, 231, 191, 80, 216, 102]),
#     @name="two"),
#     MyAggregatedModel::DataElement(
#     @_id=
#       BSON::ObjectId(
#       @data=Bytes[96, 24, 37, 54, 221, 231, 147, 231, 191, 80, 216, 103]),
#     @name="two")]>,
#   #<MyAggregatedModel:0x111cea080
#   @_id=MyAggregatedModel::Id(@name="three"),
#   @data=
#   [MyAggregatedModel::DataElement(
#     @_id=
#       BSON::ObjectId(
#       @data=Bytes[96, 24, 37, 54, 221, 231, 147, 231, 191, 80, 216, 105]),
#     @name="three"),
#     MyAggregatedModel::DataElement(
#     @_id=
#       BSON::ObjectId(
#       @data=Bytes[96, 24, 37, 54, 221, 231, 147, 231, 191, 80, 216, 104]),
#     @name="three"),
#     MyAggregatedModel::DataElement(
#     @_id=
#       BSON::ObjectId(
#       @data=Bytes[96, 24, 37, 54, 221, 231, 147, 231, 191, 80, 216, 106]),
#     @name="three")]>]

# And to access an element field this time it is much easier.
puts aggregated_models.map &._id.name
# => ["one", "two", "three"]
justmark commented 3 years ago

Thanks for the prompt reply. I was already looking at using the Cryomongo shard after running into this issue.

I'll give your suggestion a spin.

justmark commented 3 years ago

Thanks, appreciate your help.

This is working perfectly.

justmark commented 3 years ago

Hi,

Further to the above, have run into an issue when trying to deserialize and a field isn't present. The model is:

class FivesAggregatedModel
    include BSON::Serializable

    record Id, md : Int32 do
      include BSON::Serializable
    end

    record DataElements, _id : BSON::ObjectId, 
                          charge_phase : Int32, 
      include BSON::Serializable
    end

    property _id : Id
    property data : Array(DataElements)
  end

When running this I end up with the following error:

Unhandled exception: Unable to deserialize key (charge_phase) having value () of type (Nil) belonging to type 'FivesAggregatedModel::DataElements'. Expected type(s) '[Int32]'. (Exception)
  from src/models/fives_aggregated.cr:19:7 in 'initialize'
  from src/models/fives_aggregated.cr:19:7 in 'new'
  from src/models/fives_aggregated.cr:19:7 in 'from_bson'
  from lib/bson/src/bson/ext/array.cr:4:5 in 'from_bson'
  from src/models/fives_aggregated.cr:2:5 in 'initialize'
  from src/models/fives_aggregated.cr:2:5 in 'new'
  from src/models/fives_aggregated.cr:2:5 in 'from_bson'
  from lib/cryomongo/src/cryomongo/cursor.cr:174:7 in 'next'
  from /usr/share/crystal/src/iterator.cr:488:7 in 'to_a'
  from src/analyze.cr:12:38 in 'start'
  from src/app.cr:62:1 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:110:5 in 'main_user_code'
  from /usr/share/crystal/src/crystal/main.cr:96:7 in 'main'
  from /usr/share/crystal/src/crystal/main.cr:119:3 in 'main'
  from __libc_start_main
  from _start
  from ???

This is only happening when the charge_phase field isn't present in a document (which is a valid situation, data wise). I was going through your BSON project as well to see if there was a way to deal with this condition. Any ideas?

Thanks, Mark

elbywan commented 3 years ago

Hi @justmark,

Could you try marking the field as nilable?

charge_phase : Int32?

justmark commented 3 years ago

@elbywan That did the trick. Sorry I missed that :)