Codoc-os / django-opensearch-dsl

This package allows indexing of Django models in OpenSearch with opensearch-py.

Issue building KNN query #72

Open anunomac opened 1 week ago

anunomac commented 1 week ago

I am able to build the query and search via opensearch-py but cannot build the query via django-opensearch-dsl.

Example:

Via opensearch-py:

from opensearchpy import OpenSearch

# host, port, auth_cred, index_name and get_embedding are defined elsewhere
client = OpenSearch(
    hosts=[{"host": host, "port": port}],
    http_auth=auth_cred,
    use_ssl=True,
    verify_certs=False,
    ssl_show_warn=False,
)
vector = get_embedding("my string")
query = {
    "query": {
        "bool": {
            "must": [],
            "should": [
                {"knn": {"content_name_embedding": {"vector": vector, "k": 20}}}
            ],
        }
    }
}
results = client.search(index=index_name, body=query)

This works and returns hits. With django-opensearch-dsl, however, every setup I've tried fails with an error like: opensearchpy.exceptions.UnknownDslObject: DSL class 'knn' does not exist in query. Some of my attempts:

s = MyDocument.search().query({"knn": {"content_name_embedding": {"vector": vector, "k": 20}}})
s = MyDocument.search().query(query)
s = MyDocument.search().query(query["query"])

Any guidance would be appreciated.

vitaly4uk commented 1 week ago

How did you declare MyDocument class?

anunomac commented 4 days ago

Here's the MyDocument setup:

from django_opensearch_dsl import Document, fields
from django_opensearch_dsl.registries import registry

# DenseVectorField, MyDoc and get_embedding are defined elsewhere in the project.


@registry.register_document
class MyDocument(Document):
    content_name_embedding = DenseVectorField(dimension=384)
    introduction_embedding = DenseVectorField(dimension=384)

    tags = fields.NestedField(properties={
        'name': fields.TextField(),
    })

    class Index:
        name = 'mydoc_index'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
            'index': {'knn': True, 'knn.algo_param.ef_search': 100},
        }

    class Django:
        model = MyDoc

        fields = [
            'external_id',
            'content_name',
            'introduction',
            # ... other fields ...
        ]

    def prepare_content_name_embedding(self, instance):
        return get_embedding(instance.content_name).tolist()

    def prepare_introduction_embedding(self, instance):
        return get_embedding(instance.introduction).tolist()