BeanieODM / beanie

Asynchronous Python ODM for MongoDB
http://beanie-odm.dev/
Apache License 2.0
1.91k stars 201 forks source link

[BUG] Issue with $lookup in aggregation #357

Closed nemrok closed 1 year ago

nemrok commented 1 year ago

Describe the bug First off, I love this framework. I'm looking to move from MongoEngine to Beanie since I use FastApi, AsyncIO and Pydantic for all my projects. I use a lot of aggregations and ran into an issue with $lookup in the pipeline within an aggregate call. It fails to find the linked document. Thanks in advance for looking into this!

To Reproduce

@staticmethod
async def find_artist_counts() -> List[ArtistCount]:
        pipeline = [
            {
                '$lookup': {
                    'from': Artist.get_collection_name(),
                    'localField': 'artist',
                    'foreignField': '_id',
                    'as': 'artist'
                }
            },
            {
                '$group': {
                    '_id': {
                        'artist': '$artist.name',
                        'year': '$year',
                        'genre': '$genre'
                    },
                    'count': {'$sum': 1}
                }
            },
            {
                '$group': {
                    '_id': {
                        'artist': '$_id.artist',
                        'year': '$_id.year'
                    },
                    'genres': {
                        '$push': {
                            'genre': '$_id.genre',
                            'count': '$count'
                        }
                    },
                    'count': {'$sum': '$count'}
                }
            },
            {
                '$group': {
                    '_id': '$_id.artist',
                    'years': {
                        '$push': {
                            'year': '$_id.year',
                            'genres': '$genres',
                            'count': '$count'
                        }
                    },
                    'count': {'$sum': '$count'}
                }
            },
            {
                '$group': {
                    '_id': '$_id',
                    'all_values': {
                        '$push': {
                            'artist': '$_id',
                            'years': '$years',
                            'count': '$count'
                        }
                    }
                }
            }
        ]
        async with MusicDbConnection():
            artist_counts = await Song.aggregate(pipeline, projection_model=ArtistCount).to_list()

            return artist_counts

Expected behavior The $lookup call fails to find the 'Artist' document which is linked to the 'Song' document. Since the Artist document is never found, the aggregate call returns no results. As you can see the pipeline syntax is standard and should work. I have used aggregations like this in Pymongo without issue. Is this a limitation with the current Beanie implementation?

Additional context Here is how the Song and Artist collections are defined:

class Artist(Document):
    name: str
    created_on: datetime
    modified_on: datetime

    class Settings:
        indexes = [
            IndexModel(
                [('name', pymongo.ASCENDING)],
                name='nam'
            ),
            IndexModel(
                [('created_on', pymongo.ASCENDING)],
                name='co'
            ),
            IndexModel(
                [('modified_on', pymongo.ASCENDING)],
                name='mo'
            )
        ]

class Song(Document):
    name: str
    genre: Optional[str] = None
    artist: Link[Artist]
    album: Optional[Link[Album]] = None
    year: int
    released_on: Optional[datetime] = None
    created_on: datetime
    modified_on: datetime

    class Settings:
        indexes = [
            IndexModel(
                [('name', pymongo.ASCENDING), ('album', pymongo.ASCENDING)],
                name='nam_alb'
            ),
            IndexModel(
                [('genre', pymongo.ASCENDING)],
                name='gen'
            ),
            IndexModel(
                [('artist', pymongo.ASCENDING)],
                name='art'
            ),
            IndexModel(
                [('album', pymongo.ASCENDING)],
                name='alb'
            ),
            IndexModel(
                [('year', pymongo.ASCENDING)],
                name='yr'
            ),
            IndexModel(
                [('released_on', pymongo.ASCENDING)],
                name='ro'
            ),
            IndexModel(
                [('created_on', pymongo.ASCENDING)],
                name='co'
            ),
            IndexModel(
                [('modified_on', pymongo.ASCENDING)],
                name='mo'
            )
        ]
nemrok commented 1 year ago

Projection model is as such:

class GenreCount(BaseModel):
    genre: Optional[str] = 'N/A'
    count: int

class YearCount(BaseModel):
    year: int
    genres: List[GenreCount]
    count: int

class ArtistCount(BaseModel):
    artist: Optional[str] = None
    years: Optional[List[YearCount]] = []
    count: Optional[int] = 0
nemrok commented 1 year ago

I solved the issue. It turns out that Song.artist is a DbRef and not an ObjectId. So I updated the pipeline as such and now it's working. I will close the issue. Thanks regardless!

{
                '$lookup': {
                    'from': Artist.get_collection_name(),
                    'localField': 'artist.$id',
                    'foreignField': '_id',
                    'as': 'artist'
                }
            },
            {
                '$unwind': '$artist'
            }
...
}