aws-samples / semantic-image-search-for-articles

This sample shows how you can add semantic search to your applications by using a multimodal model to find images that are semantically similar to a piece of text. It accompanies this blog post:
https://aws.amazon.com/blogs/machine-learning/semantic-image-search-for-articles-using-amazon-rekognition-amazon-sagemaker-foundation-models-and-amazon-opensearch-service/
MIT No Attribution
aws generative-ai multimodal search semantic vector vector-search

Semantic image search using Amazon Titan Multimodal Embeddings model

Digital publishers are continuously looking for ways to streamline and automate their media workflows so they can generate and publish new content as rapidly as possible without forgoing quality.

Adding images to capture the essence of text can improve the reading experience, and machine learning techniques can help you discover such images. “A striking image is one of the most effective ways to capture audiences' attention and create engagement with your story, but it also has to make sense.”

In this aws-samples project, you see how you can use Amazon Titan foundation models to quickly understand an article and find the best images to accompany it. Here, you generate the embedding directly from the image itself.

A key concept in semantic search is embeddings. An embedding is a numerical representation of some input—an image, text, or both—in the form of a vector. When you have many vectors, you can measure the distance between them, and vectors that are close in distance are semantically similar or related.
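
For example, cosine similarity is a common way to measure how close two vectors are; a score near 1 means the inputs are semantically related. A minimal sketch (the toy vectors below are made up for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors for illustration; real Titan embeddings
# have 1,024 dimensions by default.
article = np.array([0.9, 0.1, 0.3, 0.0])
image_a = np.array([0.8, 0.2, 0.4, 0.1])  # close to the article vector
image_b = np.array([0.0, 0.9, 0.0, 0.7])  # far from the article vector

print(cosine_similarity(article, image_a))  # ~0.98 -> semantically similar
print(cosine_similarity(article, image_b))  # ~0.08 -> unrelated
```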

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon with a single API, along with a broad set of capabilities to help you build generative AI applications, simplifying development while maintaining privacy and security.

Amazon Titan has recently added a new embedding model to its collection, Titan Multimodal Embeddings. This new model can be used for multimodal search, recommendation systems, and other downstream applications.

Multimodal models can understand and analyze data in multiple modalities such as text, image, video, and audio. This latest Amazon Titan model can accept text, images, or both. This means you use the same model to generate embeddings of images and text and use those embeddings to calculate how similar the two are.
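
As a sketch of what this looks like with the AWS SDK for Python (Boto3): the request shape and the `amazon.titan-embed-image-v1` model ID below follow the Titan Multimodal Embeddings G1 documentation, and the image file name is a placeholder:

```python
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")  # assumes region/credentials are configured

def titan_multimodal_embedding(text=None, image_path=None):
    """Embed text, an image, or both with Titan Multimodal Embeddings."""
    body = {"embeddingConfig": {"outputEmbeddingLength": 1024}}
    if text:
        body["inputText"] = text
    if image_path:
        with open(image_path, "rb") as f:
            body["inputImage"] = base64.b64encode(f.read()).decode("utf-8")
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-image-v1",
        contentType="application/json",
        accept="application/json",
        body=json.dumps(body),
    )
    return json.loads(response["body"].read())["embedding"]

# The same model puts both modalities into the same vector space:
text_vector = titan_multimodal_embedding(text="a footballer celebrating a goal")
image_vector = titan_multimodal_embedding(image_path="goal.jpg")  # placeholder file
```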


Deploying the full stack application

Architecture diagram - Semantic Image search

The following steps describe the sequence of actions that enable semantic image and celebrity search:

  1. You upload an image to an Amazon S3 bucket
  2. Amazon EventBridge listens for this event and triggers an AWS Step Functions execution
  3. The Step Functions workflow takes the Amazon S3 image details and runs three parallel actions (steps 4-6)
  4. An API call to Amazon Rekognition DetectLabels to extract object metadata
  5. An API call to the Amazon Rekognition RecognizeCelebrities API to extract any known celebrities
  6. An AWS Lambda function resizes the image to the maximum dimensions accepted by the ML embedding model and generates an embedding directly from the image input
  7. The Lambda function then inserts the image object metadata, the celebrity name(s) if present, and the embedding as a k-NN vector into an OpenSearch Service index
  8. Amazon S3 hosts a simple static website, distributed by Amazon CloudFront. The front-end user interface (UI) lets you authenticate with the application using Amazon Cognito and search for images
  9. You submit an article or some text via the UI
  10. Another Lambda function calls Amazon Comprehend to detect any names in the text as potential celebrities
  11. The function then summarizes the text to extract the pertinent points from the article, using Titan Text G1 - Express
  12. The function generates an embedding of the summarized article using the Titan Multimodal Embeddings model
  13. The function then searches the OpenSearch Service image index for images matching the celebrity name and for the k-nearest neighbors of the vector using cosine similarity, via exact k-NN with a scoring script (see the sketch after this list)
  14. Amazon CloudWatch and AWS X-Ray give you observability into the end-to-end workflow and alert you to any issues
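
To make step 13 concrete, here is a hedged sketch of an exact k-NN query using the OpenSearch scoring script with the opensearch-py client. The domain endpoint, index name (`images`), and field names (`celebrities`, `vector`) are illustrative assumptions, not necessarily the names used by this stack:

```python
from opensearchpy import OpenSearch  # a real deployment would also use AWS SigV4 auth

# Hypothetical endpoint; use your OpenSearch Service domain.
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

def search_images(summary_embedding, celebrity=None, k=10):
    """Exact k-NN search with the scoring script, ranking by cosine similarity."""
    # Pre-filter by celebrity name when one was detected; otherwise score all images.
    base_query = {"match": {"celebrities": celebrity}} if celebrity else {"match_all": {}}
    return client.search(
        index="images",  # hypothetical index name
        body={
            "size": k,
            "query": {
                "script_score": {
                    "query": base_query,
                    "script": {
                        "source": "knn_score",  # OpenSearch k-NN scoring script
                        "lang": "knn",
                        "params": {
                            "field": "vector",            # hypothetical k-NN vector field
                            "query_value": summary_embedding,
                            "space_type": "cosinesimil",  # cosine similarity
                        },
                    },
                }
            },
        },
    )
```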

Conclusion

In this example, you saw how to use Amazon Rekognition, Amazon Comprehend, Amazon Bedrock, and OpenSearch Service to extract metadata from your images and then use ML techniques to discover them automatically using celebrity and semantic search. This is particularly important within the publishing industry, where speed matters in getting fresh content out quickly and to multiple platforms.

As a next step, deploy the solution in your AWS account and upload some of your own images for testing how semantic search can work for you.

Deploy steps

Pre-requisites

Amazon Bedrock requirements

Base Models Access

If you want to interact with models from Amazon Bedrock, you need to request access to the base models in one of the Regions where Amazon Bedrock is available. Make sure to read and accept the models' end-user license agreements (EULAs).

| Model | Max token input | Embedding dimension | Price per 1K input tokens | Price per 1K output tokens |
|---|---|---|---|---|
| Amazon Titan Multimodal Embeddings | 128 | 1,024 (default), 384, 256 | See Bedrock pricing | n/a |
| Titan Text G1 - Express | 8K | n/a | See Bedrock pricing | See Bedrock pricing |

You will need to request access to both of the models above.

When we summarize the text in our workflow, we can specify the maximum output token count on the Titan Text G1 - Express model, which ensures that we pass fewer than 128 tokens to the embedding model.
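
A sketch of that summarization call with Boto3; the prompt wording and the `maxTokenCount` value are illustrative assumptions, not the exact values used by this solution:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def summarize(article_text, max_tokens=100):
    """Summarize an article with Titan Text G1 - Express, capping the output
    so the summary stays under the embedding model's 128-token text limit."""
    body = {
        "inputText": "Summarize the following article in a few sentences:\n\n" + article_text,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,  # output cap, kept below 128
            "temperature": 0,
        },
    }
    response = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",
        contentType="application/json",
        accept="application/json",
        body=json.dumps(body),
    )
    return json.loads(response["body"].read())["results"][0]["outputText"]
```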

The multimodal embedding model also has a maximum image dimension of 2,048 x 2,048 pixels, which we handle as part of the image embedding Lambda function.
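
A minimal sketch of that resize step using Pillow, as an assumption about how such a Lambda function might do it (the actual code in this repo may differ):

```python
import io

from PIL import Image

MAX_DIM = 2048  # max accepted image dimension for the embedding model

def resize_for_embedding(image_bytes):
    """Shrink an image so neither side exceeds 2048 pixels, preserving aspect ratio."""
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    img.thumbnail((MAX_DIM, MAX_DIM))  # only downscales, never enlarges
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    return buf.getvalue()
```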


Deployment

This deployment is currently set up to deploy into the us-east-1 region. Please check Amazon Bedrock region availability and update the samconfig.toml file to reflect your desired region.

Environment setup

Deploy with AWS Cloud9

We recommend deploying with AWS Cloud9, but you can also run the following commands from your own command line/terminal.

  1. Clone the repository

     git clone https://github.com/aws-samples/semantic-image-search-for-articles.git

  2. Move into the cloned repository

     cd semantic-image-search-for-articles

(Optional) Only for Cloud9

If you use Cloud9, increase the instance's EBS volume to at least 50GB. To do this, run the following command from the Cloud9 terminal:

bash ./scripts/cloud9-resize.sh 50

See the documentation for more details on environment resize.

Review this file: samconfig.toml

Here you can name your stack, and pick the region you want to deploy in.

Check that all the AWS services used by this solution are available in the region you choose.

Because the deployment includes an Amazon CloudFront distribution, it can take approximately 20 minutes.

Cloud9 generates STS tokens for the deployment, but these credentials last only 15 minutes. The token will therefore expire before the deployment completes, and you won't be able to see the outputs directly from Cloud9.

How to authenticate with short-term credentials

You can export the access key tokens, making sure they last at least 30 minutes (1,800 seconds):

export AWS_ACCESS_KEY_ID=<PASTE_ACCESS_KEY>
export AWS_SECRET_ACCESS_KEY=<PASTE_SECRET_ACCESS_KEY>
export AWS_SESSION_TOKEN=<PASTE_SESSION_TOKEN>

(If the tokens do expire, you can let the deployment run to completion, checking progress in CloudFormation, and then re-run the deployment script below. Because the Amazon CloudFront resource will already exist, the second deployment completes quickly.)

Run the deployment of the application

The deployment of the solution is achieved with the following command:

npm install && npm run deploy

This command runs a series of scripts, such as sam build, sam deploy, and a few others that set up the front-end environment with the correct variables.

Cloud9 Deployment complete

Create login details for the web application

Authentication is managed by Amazon Cognito. You will need to create a new user to be able to log in.

You can find the user pool ID in the CloudFormation outputs; choose that user pool in the Amazon Cognito console and create a new user there to log in with.

Amazon Cognito - User creation
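
If you prefer to script user creation rather than use the console, here is a sketch with Boto3; the user pool ID, username, and passwords are placeholders:

```python
import boto3

cognito = boto3.client("cognito-idp")

USER_POOL_ID = "us-east-1_EXAMPLE"  # placeholder; copy the real ID from the stack outputs

cognito.admin_create_user(
    UserPoolId=USER_POOL_ID,
    Username="jane@example.com",   # placeholder username
    TemporaryPassword="ChangeMe-123!",
    MessageAction="SUPPRESS",      # skip the invitation email for a test user
)
cognito.admin_set_user_password(
    UserPoolId=USER_POOL_ID,
    Username="jane@example.com",
    Password="A-Strong-Password-1!",  # placeholder password
    Permanent=True,                   # avoid the forced password-reset flow
)
```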

Log in to your new web application

Once complete, the CLI output will show the CloudFront URL for viewing the web application, e.g. https://d123abc.cloudfront.net/. You can also see this in the CloudFormation outputs.

Administration

The web app allows the user to upload images to S3 to be indexed by OpenSearch, as well as to issue queries to OpenSearch that return the top 10 images most semantically related to the article content.

Cleaning up

To avoid incurring future charges, delete the resources.

  1. Find the S3 bucket deployed with this solution and empty the bucket
  2. Run sam delete from the terminal, or go to CloudFormation, choose the stack that you deployed via the deploy script mentioned above, and delete the stack.

Amazon CloudFormation stacks

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.