aws-samples / real-time-social-media-analytics-with-generative-ai

MIT No Attribution
14 stars 3 forks source link

Uncover Social Media Insights in Real Time using Amazon Managed Service for Apache Flink and Amazon Bedrock

In this AWS Sample you will deploy an AWS Architecure that is able to combine streaming data with GenAI using Amazon Managed Service for Apache Flink and Amazon Bedrock

Architecture

Social Media Insights with Apache Flink and Amazon Bedrock

Solution Description

  1. User logs in using Amazon Cognito User. Cognito will be used to authenticate all API Gateway Calls
  2. User inputs using the front-end application, hosted locally developed using Streamlit framework, the query terms, API Key and frequency of requests to be made to the X - Twitter API in a Streamlit front end UI. We also provide a front-end of a Kinesis Social Media Application where you can also send messages to be processed in case of not having a Twitter Developer Account
  3. Amazon Managed Service for Apache Flink is used to consume and process the tweets in real time and stores in state the parameters for making the API requests received from the front-end application
  4. The streaming application using Flink’s Asynchronous Async I/O invokes Amazon Titan Embeddings Model hosted by Amazon Bedrock to embed the tweets.
  5. Amazon Bedrock responds with the embeddings of the tweets
  6. The Apache Flink application then writes the embeddings and original text of the message into an Amazon OpenSearch Index. We will be using two separate indexes to separate the messages coming from the X source, and the messages being ingested from the Kinesis Social Media App
  7. User makes questions using the front-end application in Streamlit
  8. An AWS Lambda Function is triggered by Amazon API Gateway with the query from the user.
  9. The Lambda function using LangChain, invokes Amazon Titan Embeddings model to embed the question
  10. Amazon Bedrock returns the embedded question
  11. The Lambda function using LangChain, does a semantic search in Amazon OpenSearch and retrieves the relevant documents related to the question.
  12. Amazon Opensearch returns to the AWS Lambda function, the relevant documents with the vectors for answering the question.
  13. The Lambda Function using LangChain, Prompt Engineering, provides the question and relevant vectors for answering the question to Anthropic Claude hosted on Amazon Bedrock
  14. Amazon Bedrock returns the answer to the question to the Lambda function
  15. The response is delivered to API Gateway
  16. API Gateway provides the response to the questions of the user in the Streamlit application

Getting started

Pre-requisites

To implement this AWS Sample, you need to have the following in your local environment:

You will also need to request GenAI Models access for Amazon Titan Embeddings Model and Anthropic Claude Foundation Models on Amazon Bedrock

Follow the instructions in order to request the models:

  1. Go to the Amazon Bedrock Console, by searching Amazon Bedrock, and click on the service

Amazon Bedrock console

  1. Click Get Started

Amazon Bedrock console

  1. Go to Model Access in the bottom left Menu

Amazon Bedrock model access

  1. Click Manage model access

Amazon Bedrock model access

  1. Select within Amazon a. Titan Embeddings G1 – Text
  2. Select within Anthropic a. Claude 3 Haiku

Claude Instant model

  1. Go to the bottom and click Request model access

Request Claude Instant model

  1. It may take several minutes to get access to the models, but we can continue with the solution deployment

Wait for model availability

Use CDK to create and deploy the solution stack

We use AWS CDK CLI to deploy the Solution. The CDK will deploy the following:

  1. Git Clone the repository
git clone <repo>
  1. CD into repo
cd real-time-social-media-analytics-with-generative-ai
  1. Install libraries
npm install
  1. CD into the Apache Flink Application folder
cd flink-bedrock
  1. Build the Apache Flink Application
mvn clean package
  1. Go back to the root folder of the directory
cd ..

If you want to make any changes to the username and passwords used for Amazon Cognito and Amazon OpenSearch, you can do so by modifying the const at the beginning the cdk stack at lib/real-time-social-media-analytics-gen-ai.ts

  1. Bootstrap your AWS environment
    cdk bootstrap
  2. Deploy the AWS Architecture
    cdk deploy
  3. When asked answer yes to confirm the deployment. The deployment will take around 10 minutes.
  4. After deployment has finished, please go to AWS CloudFormation

AWS Cloudformation

  1. Go to Stacks
  2. Select RealTimeSocialMediaAnalyticsGenAi
  3. Go to Outputs
  4. You will need to copy the StreamlitCommand, which we will use for later

AWS Cloudformation Output

As part of the deployment, two AWS Lambda Functions have been created and deployed in order to start the Managed Flink Application and create the two indexes in OpenSearch The two indexes are:

PUT /twitter-custom-rag-index
{
  "mappings": {
    "properties": {
      "embeddings": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      },
      "@timestamp": {
        "type": "date"
      },
      "text": {
        "type": "text"
      }
    }
  },
  "settings": {
    "index": {
      "knn": true,
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  }
}
PUT /twitter-rag-index
{
  "aliases": {},
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "embeddings": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": {
          "engine": "nmslib",
          "space_type": "l2",
          "name": "hnsw",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      },
      "impression_count": {
        "type": "integer"
      },
      "likes": {
        "type": "integer"
      },
      "retweet_count": {
        "type": "integer"
      },
      "tweet": {
        "type": "text"
      }
    }
  },
  "settings": {
    "index": {
      "knn": true,
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  }
}

With these steps we have configured the Data Ingestion part of the solution, we will now continue in setting up the User Interface.

Streamlit Application

  1. In the project repository, cd into the streamlit folder
cd streamlit

In that folder you will find a Multi-Page Streamlit Application

It follows this pattern

streamlit run Bedrock_Chatbot.py --theme.base "dark" -- --pool_id <pool_id> --app_client_id <app_client_id> --app_client_secret <app_client_secret>  --bedrockApi <value> --rulesApi <value> --kinesisAPI <value>

This will open a tab in your browser. This is how we will be able to interact with our Flink Application and make questions related to the processed messages

  1. You will need to authenticate with the Cognito user that has been created during the CDK Deployment

If you did not make any change to the CDK, the Cognito user created for you is:

You can create additional users in Amazon Cognito and delete this one if needed

Cognito-auth

  1. If you are going to be using the Twitter API you will need to provide the following:

Streamlit application configuration

  1. Once you have entered those configurations, we are ready to start making questions related to Twitter Feed. Please check the Use Twitter Index Box. If we don’t, we will be making questions to the Kinesis Social Media messages

For example, we will make the following questions

• Based on the tweets, what are the main topics people are discussing?

Streamlit application

• Which are the most asked product feature requests in AWS?

Streamlit application

  1. If you do not plan to use the Twitter API, you send messages using the My Social Media. For example:

Streamlit application

If we go back to Bedrock Chatbot, we can make questions on messages we sent to My Social Media. Note “Use Twitter Index” box must be unchecked.

• Is there any tweet mentioning Mistral?

Streamlit application

Feel free to continue sending messages to the Kinesis Social Media, or pulling data from X, however beware you don’t know over your monthly threshold of the API requests

Clean up

To delete all created stack resources you can run

cdk destroy --all

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Disclaimer

This sample is not meant for deployment in production, and some security mechanisms, such as logging, have been disabled for cost optimization. However, if needed we encourage you to leverage CDK Nag in order to identify and implement best practices for all the services deployed in the sample.

In this sample, if you are going to consume data from Twitter using a developer account, you will be interacting with public data coming from the feed. This sample takes no responsibility of the comments or opinions that could be consumed from the social media application