aws-samples / semantic-image-search-for-articles

How you can add semantic search to your applications. This sample shows how you can use a multimodal model to find images which are semantically similar to some text. New blog coming out soon.
https://aws.amazon.com/blogs/machine-learning/semantic-image-search-for-articles-using-amazon-rekognition-amazon-sagemaker-foundation-models-and-amazon-opensearch-service/
MIT No Attribution

Creating open search index on every request #5

Open lehigh123 opened 6 months ago

lehigh123 commented 6 months ago

In the get_put Lambda, it seems that the OpenSearch index is created on every request via:

def index_document(document):
    # Create Index, generates a warning if index already exists
    wr.opensearch.create_index(
        client=os_client,
        index="images",
        settings={
            "index.knn": True,
            "index.knn.space_type": "cosinesimil",
            "analysis": {
                "analyzer": {"default": {"type": "standard", "stopwords": "_english_"}}
            },
        },
        mappings={
            "properties": {
                "image_vector": {
                    "type": "knn_vector",
                    "dimension": len(document["image_vector"]),
                    "store": True,
                },
                "image_path": {"type": "text", "store": True},
                "image_words": {"type": "text", "store": True},
                "celebrities": {"type": "text", "store": True},
            }
        },
    )

This seems like an anti-pattern and a potential waste of resources (and it is probably bad to emit a warning on every request). When setting this up in production, what is the recommended way to create the index only once?

danjhd commented 6 months ago

You are correct: the code attempts to create the index on each request. If the index already exists, it logs a warning but takes no further action (the index is not re-created and the existing index is left untouched). This was done to keep the sample simple. If you are certain that the index is already in place in your cluster, you can comment out this section of the code to prevent the repeated attempt.