BeanieODM / beanie

Asynchronous Python ODM for MongoDB
http://beanie-odm.dev/
Apache License 2.0
1.99k stars 211 forks

[BUG] `Text` query doesn't work with `fetch_links=True` #606

Open humbertoortuzar opened 1 year ago

humbertoortuzar commented 1 year ago

Describe the bug

With Beanie 1.20.0, `fetch_links=True` now works with `.count()` and `.aggregate(...)`, but when querying a document with links and a `$text` statement, the query fails because `build_aggregation_pipeline` places the prefetch `$lookup` stages before the `{$match: {$text: ...}}` stage:

def build_aggregation_pipeline(self):
    # Lookup stages for the linked documents are built first...
    aggregation_pipeline: List[Dict[str, Any]] = construct_lookup_queries(
        self.document_model
    )

    # ...and the $match stage (which may contain $text) is appended
    # after them, which MongoDB rejects for $text queries.
    aggregation_pipeline.append({"$match": self.get_filter_query()})

    sort_pipeline = {"$sort": {i[0]: i[1] for i in self.sort_expressions}}
    if sort_pipeline["$sort"]:
        aggregation_pipeline.append(sort_pipeline)
    if self.skip_number != 0:
        aggregation_pipeline.append({"$skip": self.skip_number})
    if self.limit_number != 0:
        aggregation_pipeline.append({"$limit": self.limit_number})
    return aggregation_pipeline
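To make the ordering problem concrete, here is a minimal sketch (with hypothetical, simplified stage contents) of the order this method produces versus the order MongoDB requires, since `$text` is only legal in the first `$match` stage of a pipeline:

```python
# Simplified, hypothetical stages for illustration only.
lookup_stage = {
    "$lookup": {
        "from": "ModelB",
        "localField": "model_b.$id",
        "foreignField": "_id",
        "as": "model_b",
    }
}
text_match = {"$match": {"$text": {"$search": "some text"}}}

# Order produced by build_aggregation_pipeline: lookups first, then $match.
produced = [lookup_stage, text_match]

# Order MongoDB requires: the $text $match must be the first stage.
required = [text_match, lookup_stage]
```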

So when you have a .find(Text(search_query), fetch_links=True) you get this error:

(screenshot of the error omitted)

To Reproduce

from typing import Optional

import pymongo
from beanie import Document, Indexed, Link
from beanie.odm.operators.find.evaluation import Text
from pydantic import Field, StrictBool

class ModelB(Document):
    some_field: str = Field(..., description="A field")
    other_field: Optional[str] = Field(None, description="Another field")

class ModelA(Document):
    name: Indexed(str, pymongo.TEXT) = Field(
        ..., description="Name of the ModelA instance."
    )

    model_b: Optional[Link[ModelB]] = Field(
        None, description="Reference to ModelB."
    )

    is_deleted: StrictBool = Field(False, description="Indicates if it's soft deleted")

# Inside an async function, after init_beanie:
query_result = await ModelA.find(Text("some text"), fetch_links=True).to_list()

Expected behavior

The query should return results instead of raising an error. To achieve this, the `{$match: {$text: ...}}` stage should be placed before the stages produced by `construct_lookup_queries(self.document_model)`.

Additional context

If anyone else is facing the same issue, my current workaround is to go into the package, use the function mentioned above to print the stages needed to fetch the links, and add them to an `.aggregate()` query. If you also need a `$match`, you can add that stage to your pipeline.
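As a sketch, that workaround amounts to something like this (stage contents are hypothetical placeholders; the real lookup stages are the ones printed from `construct_lookup_queries`):

```python
# Hand-built pipeline: $text first, then the copied lookup stages,
# then any remaining filters.
pipeline = [
    {"$match": {"$text": {"$search": "some text"}}},
    # ... paste the $lookup stages printed from construct_lookup_queries(ModelA) ...
    {"$match": {"is_deleted": False}},
]

# Then run it through beanie's aggregate interface (inside an async function):
# results = await ModelA.aggregate(pipeline).to_list()
```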

In terms of the current beanie code, I was able to fix my case (multiple filters, fetch_links, and an aggregate pipeline) by doing this:

def build_aggregation_pipeline(self):
    aggregation_pipeline: List[Dict[str, Any]] = []
    filter_query = self.get_filter_query()
    # get_filter_query() may or may not wrap the clauses in "$and";
    # fall back to a single-element list so iteration works either way.
    clauses = filter_query.get("$and", [filter_query])
    text_queries = [match_case for match_case in clauses if "$text" in match_case]
    other_queries = [match_case for match_case in clauses if "$text" not in match_case]
    # $text must come before the $lookup stages; guard against empty
    # $and lists, which MongoDB rejects.
    if text_queries:
        aggregation_pipeline.append({"$match": {"$and": text_queries}})
    aggregation_pipeline += construct_lookup_queries(self.document_model)
    if other_queries:
        aggregation_pipeline.append({"$match": {"$and": other_queries}})

    sort_pipeline = {"$sort": {i[0]: i[1] for i in self.sort_expressions}}
    ...

Disclaimer: This won't work for `$text` queries on a linked document, but since `$text` must be declared first in the pipeline, you'd be better off querying the linked document directly and using a backlink 👍 .
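At the raw aggregation level, that backlink approach amounts to running `$text` on the linked collection first and then looking up the parent documents. A hypothetical sketch (collection and field names assumed):

```python
# Run on the ModelB collection: $text is the first stage, so it is valid,
# and the parent ModelA documents are joined afterwards.
pipeline_on_model_b = [
    {"$match": {"$text": {"$search": "some text in model b"}}},
    {
        "$lookup": {
            "from": "ModelA",
            "localField": "_id",
            "foreignField": "model_b.$id",
            "as": "parents",
        }
    },
]
```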

roman-right commented 1 year ago

Hi @humbertoortuzar , You are right, I'll fix this soon. Thank you very much for the catch!

roman-right commented 1 year ago

This bug is fixed in https://github.com/roman-right/beanie/pull/669. Please try

gsakkis commented 11 months ago

@roman-right build_aggregation_pipeline is also called in FindMany.aggregate, but only if fetch_links is True, so this fix for $text does not apply to aggregates with fetch_links=False. The same holds for the other FindMany stages ($sort, $skip, $limit). It's not obvious to me why these are not taken into account when fetch_links=False, so perhaps there is a more general bug here, but I haven't tested it.

github-actions[bot] commented 10 months ago

This issue is stale because it has been open 30 days with no activity.

MrEarle commented 10 months ago

I believe this is still an issue. I'm using beanie 1.23.

As far as I managed to understand, I believe this happens because in build_aggregation_pipeline it does this:

        filter_query = self.get_filter_query()
        if "$text" in filter_query:
            ...

But get_filter_query joins all filters under an $and statement, so the filter_query looks like {'$and': [{'$text': ...}]}, resulting in "$text" in filter_query being false.
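The failing check is easy to reproduce in isolation:

```python
# What get_filter_query() returns when filters are joined under $and:
filter_query = {"$and": [{"$text": {"$search": "some text"}}]}

# The top-level key is "$and", not "$text", so this check is False:
has_text_naive = "$text" in filter_query

# Looking inside the $and clauses finds it:
has_text_nested = any(
    "$text" in clause for clause in filter_query.get("$and", [filter_query])
)
```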

Update:

I managed to get it to work with a suggestion very similar to @humbertoortuzar's. I replaced build_aggregation_pipeline with:

    # Signature assumed from the surrounding beanie code; extra_stages is the
    # parameter referenced below that was not shown in the original snippet.
    def build_aggregation_pipeline(self, *extra_stages):
        aggregation_pipeline: list[dict[str, Any]] = construct_lookup_queries(
            self.document_model
        )
        filter_query = self.get_filter_query()

        if filter_query:
            text_queries = [
                match_case
                for match_case in filter_query.get("$and", [filter_query])
                if "$text" in match_case
            ]
            non_text_queries = [
                match_case
                for match_case in filter_query.get("$and", [filter_query])
                if "$text" not in match_case
            ]

            if text_queries:
                aggregation_pipeline.insert(0, {"$match": {"$and": text_queries}})

            if non_text_queries:
                aggregation_pipeline.append({"$match": {"$and": non_text_queries}})

        if extra_stages:
            aggregation_pipeline.extend(extra_stages)

        sort_pipeline = {"$sort": {i[0]: i[1] for i in self.sort_expressions}}
        if sort_pipeline["$sort"]:
            aggregation_pipeline.append(sort_pipeline)
        if self.skip_number != 0:
            aggregation_pipeline.append({"$skip": self.skip_number})
        if self.limit_number != 0:
            aggregation_pipeline.append({"$limit": self.limit_number})

        return aggregation_pipeline

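The clause-splitting logic in the snippet above can be checked on its own; a small standalone sketch of the same partitioning:

```python
from typing import Any

def split_text_clauses(
    filter_query: dict[str, Any],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition top-level (or $and-wrapped) clauses into $text and the rest."""
    if not filter_query:
        return [], []
    clauses = filter_query.get("$and", [filter_query])
    text = [c for c in clauses if "$text" in c]
    other = [c for c in clauses if "$text" not in c]
    return text, other

text, other = split_text_clauses(
    {"$and": [{"$text": {"$search": "x"}}, {"is_deleted": False}]}
)
```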
roman-right commented 10 months ago

Thank you. It will be fixed on the next bug-fixing session.

MrEarle commented 8 months ago

I believe this is fixed for aggregate, but I missed that count didn't use the same code in its implementation. I'll draft a PR to fix the count function.

(count still doesn't work with text queries and fetch_links=True)
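Until count is fixed, the same ordering trick can be used to count manually. A hypothetical sketch: append a `$count` stage to a correctly ordered pipeline and run it through `.aggregate()`:

```python
# $text first, any lookup stages the filter needs, then $count last.
pipeline = [
    {"$match": {"$text": {"$search": "some text"}}},
    # ... lookup stages for the links, if the filter needs them ...
    {"$count": "total"},
]

# Inside an async function:
# result = await ModelA.aggregate(pipeline).to_list()
# total = result[0]["total"] if result else 0
```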