Digital publishers are continuously looking for ways to streamline and automate their media workflows so they can generate and publish new content as rapidly as possible, without forgoing quality.
Adding images that capture the essence of the text can improve the reading experience, and machine learning techniques can help you discover such images. "A striking image is one of the most effective ways to capture an audience's attention and create engagement with your story, but it also has to make sense."
In this aws-samples project, you see how you can use Amazon Titan foundation models to quickly understand an article and find the best images to accompany it, generating the embeddings directly from the images themselves.
A key concept in semantic search is embeddings. An embedding is a numerical representation of some input—an image, text, or both—in the form of a vector. When you have many vectors, you can measure the distance between them, and vectors that are close in distance are semantically similar or related.
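The distance measure most commonly used for embeddings is cosine similarity. The sketch below illustrates the idea with tiny hand-made vectors (real Titan embeddings have 256 to 1,024 dimensions); the values are invented for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Measure how semantically close two embedding vectors are.

    Returns 1.0 for vectors pointing in the same direction and
    0.0 for orthogonal (unrelated) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for illustration
article = [0.9, 0.1, 0.3]
related_image = [0.8, 0.2, 0.25]
unrelated_image = [0.05, 0.9, 0.1]

print(cosine_similarity(article, related_image))    # close to 1.0
print(cosine_similarity(article, unrelated_image))  # much smaller
```

The image whose embedding scores highest against the article's embedding is the best semantic match.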
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon with a single API, along with a broad set of capabilities to help you build generative AI applications, simplifying development while maintaining privacy and security.
Amazon Titan has recently added a new embedding model to its collection, Titan Multimodal Embeddings. This new model can be used for multimodal search, recommendation systems, and other downstream applications.
Multimodal models can understand and analyze data in multiple modalities such as text, image, video, and audio. This latest Amazon Titan model can accept text, images, or both. This means you use the same model to generate embeddings of images and text and use those embeddings to calculate how similar the two are.
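As a sketch of what calling the model looks like, the helper below builds a request body for Titan Multimodal Embeddings following the Bedrock request schema (fields `inputText`, `inputImage`, and `embeddingConfig`); the actual `invoke_model` call, which assumes configured AWS credentials and the model ID `amazon.titan-embed-image-v1`, is shown only in comments.

```python
import base64
import json

def build_titan_mm_request(text=None, image_bytes=None, dimensions=1024):
    """Build a request body for the Titan Multimodal Embeddings model.

    Either text, an image, or both may be supplied; the model returns a
    single embedding for whichever inputs are present."""
    body = {"embeddingConfig": {"outputEmbeddingLength": dimensions}}
    if text is not None:
        body["inputText"] = text
    if image_bytes is not None:
        # Images are passed as base64-encoded strings
        body["inputImage"] = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps(body)

# With AWS credentials configured, the call would look roughly like:
#   import boto3
#   bedrock = boto3.client("bedrock-runtime")
#   response = bedrock.invoke_model(
#       modelId="amazon.titan-embed-image-v1",
#       body=build_titan_mm_request(text="A football match", dimensions=256),
#   )
#   embedding = json.loads(response["body"].read())["embedding"]
```

Because the same model embeds both modalities into the same vector space, an article summary and a photograph can be compared directly.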
The following steps walk through the sequence of actions that enable semantic image and celebrity search.
In this example, you saw how to use Amazon Rekognition, Amazon Comprehend, Amazon Bedrock, and OpenSearch Service to extract metadata from your images and then use ML techniques to discover those images automatically through celebrity and semantic search. This is particularly important within the publishing industry, where speed matters in getting fresh content out quickly and to multiple platforms.
As a next step, deploy the solution in your AWS account and upload some of your own images for testing how semantic search can work for you.
SAM CLI
The solution uses the AWS SAM CLI for deployment. Make sure you are using the latest version of the SAM CLI.
Docker
The solution uses the SAM CLI option to build inside a container to avoid the need for local dependencies. You will need Docker available for this.
Node
The front end for this solution is a React web application that can be run locally using Node.
npm
Installing the packages required to run the web application locally, or to build it for remote deployment, requires npm.
Base Models Access
If you are looking to interact with models from Amazon Bedrock, you need to request access to the base models in one of the regions where Amazon Bedrock is available. Make sure to read and accept the models' end-user license agreements (EULAs).
| Model | Max Input Tokens | Embedding Dimensions | Price for 1K input tokens | Price for 1K output tokens |
|---|---|---|---|---|
| Amazon Titan Multimodal Embeddings | 128 | 1,024 (default), 384, 256 | Bedrock pricing | n/a |
| Titan Text – Express | 8K | n/a | Bedrock pricing | |
You will need to request access to both of the models above.
When we summarize the text in our workflow, we can specify the maximum output tokens for the Titan Text – Express model, which ensures that we pass fewer than 128 tokens to the embedding model.
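The capping described above happens in the summarization request itself. The helper below builds a Titan Text – Express request body with `maxTokenCount` set; the field names follow the Bedrock Titan Text request schema, while the prompt wording is purely illustrative and not necessarily what this project uses.

```python
import json

def build_summary_request(article_text, max_output_tokens=100):
    """Request body for Titan Text - Express. Capping maxTokenCount keeps
    the summary under the 128-token input limit of the embedding model."""
    return json.dumps({
        "inputText": "Summarize the following article in one sentence:\n"
                     + article_text,
        "textGenerationConfig": {
            "maxTokenCount": max_output_tokens,
            "temperature": 0,
        },
    })
```

The summary returned by this call is what gets passed to Titan Multimodal Embeddings as `inputText`.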
The multimodal embedding model also has a maximum image dimension of 2,048 x 2,048 pixels, which we handle as part of the image embedding Lambda function.
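The resizing logic amounts to scaling the longest side down to the limit while preserving the aspect ratio. A minimal sketch of that calculation (the function name is an assumption, not necessarily what the Lambda function uses):

```python
def fit_within_limit(width, height, max_side=2048):
    """Return new dimensions that fit within max_side x max_side while
    preserving the aspect ratio. Images already within the limit are
    returned unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return round(width * scale), round(height * scale)

print(fit_within_limit(4096, 3072))  # → (2048, 1536)
print(fit_within_limit(1200, 800))   # → (1200, 800)
```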
Note:
This deployment is currently set up to deploy into the us-east-1 region. Please check Amazon Bedrock region availability and update the samconfig.toml file to reflect your desired region.
We recommend deploying with AWS Cloud9. If you'd like to use Cloud9 to deploy the solution, you will need the following before proceeding:
- `m5.large` as the instance type
- `Amazon Linux 2` as the platform

You can run these commands from your command line/terminal, or you could use AWS Cloud9.
```
git clone https://github.com/aws-samples/semantic-image-search-for-articles.git
cd semantic-image-search-for-articles
```
If you use Cloud9, increase the instance's EBS volume to at least 50GB. To do this, run the following command from the Cloud9 terminal:
bash ./scripts/cloud9-resize.sh 50
See the documentation for more details on environment resize.
Review this file: samconfig.toml
Here you can name your stack, and pick the region you want to deploy in.
region = "us-east-1"
Check if the AWS services are all available in the region you are choosing.
Because the deployment includes Amazon CloudFront, it can take approximately 20 minutes.
Cloud9 generates STS tokens to perform the deployment; however, these credentials last only 15 minutes, so the token will expire before the deployment is complete and you won't be able to see the outputs directly from Cloud9.
How to authenticate with short-term credentials

You can export the access key tokens, making sure they last at least 30 minutes (1,800 seconds):
```
export AWS_ACCESS_KEY_ID=<PASTE_ACCESS_KEY>
export AWS_SECRET_ACCESS_KEY=<PASTE_SECRET_ACCESS_KEY>
export AWS_SESSION_TOKEN=<PASTE_SESSION_TOKEN>
```
(If the tokens do expire, you can let the deployment complete, checking progress in CloudFormation, and then re-run the deployment script below. Because the Amazon CloudFront resource will already exist, the second deployment will complete quickly.)
The deployment of the solution is achieved with the following command:
npm install && npm run deploy
This command will run a series of scripts, such as `sam build` and `sam deploy`, plus a few others to set up the front-end environment with the correct variables.
Authentication is managed by Amazon Cognito. You will need to create a new user to be able to log in.
You can find the user pool ID in the CloudFormation outputs; choose that user pool and create a new user there to log in with.
Once complete, the CLI output will show the CloudFront URL for viewing the web application, e.g. https://d123abc.cloudfront.net/. You can also see this in the CloudFormation outputs.
The web app allows users to upload images to S3, where they are indexed by OpenSearch, and to issue queries to OpenSearch that return the 10 images most semantically related to the article content.
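The "top 10" retrieval described above is typically expressed as an OpenSearch k-NN query against the stored image embeddings. A sketch of such a query body, noting that the field name `image_vector` is an assumption and not necessarily the one this project's index uses:

```python
def knn_query(article_embedding, field="image_vector", k=10):
    """Build an OpenSearch k-NN query body returning the k images whose
    embeddings are closest to the article's embedding."""
    return {
        "size": k,
        "query": {
            "knn": {
                field: {
                    "vector": article_embedding,
                    "k": k,
                }
            }
        },
    }
```

This dictionary would be posted to the index's `_search` endpoint (for example via the `opensearch-py` client), and the hits come back ranked by vector similarity.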
To avoid incurring future charges, delete the resources.
Run `sam delete` from the terminal, or go to CloudFormation, choose the stack that you deployed via the deploy script mentioned above, and delete the stack. See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.