barseghyanartur / graphene-elastic

Graphene Elasticsearch/OpenSearch (DSL) integration
https://pypi.org/project/graphene-elastic/

"Received incompatible instance" errors with basic setup #66

Status: Closed (horatiubota closed this issue 2 years ago)

horatiubota commented 2 years ago

Hi,

First off, thanks for the great library!

I have the following setup in a Django app:

My code looks very similar to the readthedocs example (skipping the elasticsearch_dsl document object definition):

import graphene
from graphene_elastic import (
    ElasticsearchObjectType,
    ElasticsearchConnectionField,
)
from graphene_elastic.filter_backends import (
    FilteringFilterBackend,
)

class ESDocument(ElasticsearchObjectType):

    class Meta(object):
        document = CandidateDocument
        interfaces = (graphene.Node,)
        filter_backends = [
            FilteringFilterBackend,
        ]

        filter_fields = {
            "projects": "projects"
        }

class ESQuery(graphene.ObjectType):
    documents = ElasticsearchConnectionField(ESDocument)

schema = graphene.Schema(query=ESQuery)

However when I run any GraphQL query using this setup, for example:

query {
  documents {
    edges {
      node {
        projects
      }
    }
  }
}

Every hit comes back as a "Received incompatible instance" error:

{
  "errors": [
    {
      "message": "Received incompatible instance \"<Hit(index-test20211213144036777879/kGcVvn0BfPh4jM-pKR9a): {'id': '3bb6cbb88071da2b7f450353331aa3b4', 'projects': ['2f3...}>\"."
    },
...

Any thoughts on why that is the case? Is there anything obviously wrong in this setup?

barseghyanartur commented 2 years ago

Please, show me your document definition.

horatiubota commented 2 years ago

Something like this:

from elasticsearch_dsl import Document
from elasticsearch_dsl import Keyword

class CandidateDocument(Document):
    id = Keyword()
    projects = Keyword(multi=True)

    class Index:
        name = "index-test"
        using = "default-connection"

barseghyanartur commented 2 years ago

@horatiubota:

I think you should do as follows:

from elasticsearch_dsl import Document, Text, Keyword

class CandidateDocument(Document):
    id = Text(
        fields={'raw': Keyword()}
    )
    projects = Text(
        fields={'raw': Keyword(multi=True)}
    )

    class Index:
        name = "index-test"
        using = "default-connection"

See the examples.

horatiubota commented 2 years ago

Thanks! So you can't use the Keyword field directly at the moment? Only through Text(fields={'raw': Keyword()})?

barseghyanartur commented 2 years ago

@horatiubota:

Yep. The reason is that a keyword is not text that you would want to represent in your API.

horatiubota commented 2 years ago

Thanks again! Could you please help me understand your answer a bit more? Sorry, I didn't fully get why keyword is not a text that I'd want to represent/expose through the API. For example, wouldn't an id document attribute be best suited for the Keyword field type?

For more context, I'm trying to determine if I can use graphene-elastic for a larger project without changing the existing ES data models (which have a lot of Keyword fields in them). Changing keywords to Text(fields={'raw': Keyword()}) would require rewriting several existing ES queries, which is something I'd like to avoid.
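
To illustrate why that switch would touch existing queries: with a multi-field mapping, exact-match queries have to target the keyword sub-field (projects.raw) rather than the field itself. A minimal sketch using plain query dicts; the term_query helper and the sample value "project-a" are hypothetical, and only the field paths matter here.

```python
def term_query(field, value):
    """Build a raw Elasticsearch term query body (hypothetical helper)."""
    return {"query": {"term": {field: value}}}


# Current mapping: projects = Keyword(multi=True)
current = term_query("projects", "project-a")

# Suggested mapping: projects = Text(fields={"raw": Keyword(multi=True)})
# Exact matching now has to go through the keyword sub-field.
rewritten = term_query("projects.raw", "project-a")

print(current)
print(rewritten)
```

Every existing term, terms, sort, or aggregation clause on such a field would need the same path change, which is the rewrite cost mentioned above.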

barseghyanartur commented 2 years ago

@horatiubota:

First of all, could you confirm that the switch (Keyword -> Text) solves your issue?

horatiubota commented 2 years ago

@barseghyanartur:

After some more debugging, I found that the issue was caused by CandidateDocument subclassing an intermediate custom class (which in turn subclassed elasticsearch_dsl.Document) instead of subclassing elasticsearch_dsl.Document directly. Subclassing directly fixes the original issue, even when using Keyword fields:

"mappings" : {
  "properties" : {
    "id" : {
      "type" : "keyword"
    },
    "projects" : {
      "type" : "keyword"
    }
  }
}

Document definition:

class CandidateDocument(Document):

    id = Keyword()
    projects = Keyword(multi=True)

    class Index:
        name = "test-index"

Graphene setup:

class ESCandidate(ElasticsearchObjectType):
    class Meta(object):
        document = CandidateDocument
        interfaces = (graphene.Node,)
        filter_backends = [
            FilteringFilterBackend,
        ]

        filter_fields = {"projects": "projects"}

class ESCandidateQuery(graphene.ObjectType):
    documents = ElasticsearchConnectionField(ESCandidate)

schema = graphene.Schema(query=ESCandidateQuery)

If you have a moment to add more detail to the response above regarding Keyword fields, I'd really appreciate your comments.

horatiubota commented 2 years ago

The broader issue was caused by using an index alias. If anyone stumbles across this, please see: https://github.com/elastic/elasticsearch-dsl-py/issues/1125. Specifically, in your elasticsearch_dsl.Document class you need to include this method:

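from fnmatch import fnmatch  # module-level import

# Inside your Document subclass; PATTERN is a module-level glob that you define
# yourself so that it covers the concrete indices behind the alias.
@classmethod
def _matches(cls, hit):
    # override _matches to match indices in a pattern instead of just ALIAS
    # hit is the raw dict as returned by elasticsearch
    return fnmatch(hit["_index"], PATTERN)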
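
For anyone wondering why an alias triggers the error: the hit in the original report comes from the concrete index index-test20211213144036777879, while the document declares name = "index-test". As far as I can tell, elasticsearch_dsl compares the hit's _index with the declared index name, and when no document class matches, the result stays a plain Hit, which graphene-elastic then rejects as an incompatible instance. The sketch below reproduces only that matching step with the standard library; the matches function and the "index-test*" pattern are illustrative assumptions, not library API.

```python
from fnmatch import fnmatch

ALIAS = "index-test"    # Index.name from the document definition
PATTERN = ALIAS + "*"   # assumed glob covering the timestamped indices

# Shape of a raw hit as returned by Elasticsearch (values taken from the error message).
hit = {"_index": "index-test20211213144036777879", "_id": "kGcVvn0BfPh4jM-pKR9a"}


def matches(hit, index_pattern):
    """Hypothetical stand-in for Document._matches: compare the hit's index to a glob."""
    return fnmatch(hit["_index"], index_pattern)


print(matches(hit, ALIAS))    # False: the alias alone does not equal the concrete index name
print(matches(hit, PATTERN))  # True: the glob covers the concrete index
```

Pick the pattern to fit your own index naming scheme; a glob that is too broad could claim hits that belong to a different document class.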

barseghyanartur commented 2 years ago

@horatiubota:

Thanks for sharing this!