aws-samples / amazon-q-business-code-analysis

MIT License

Code Analysis through Pre-Processing with Amazon Q for Business

Introduction

This solution uses Amazon Q for Business to analyze code through pre-processing. The Amazon Q for Business application is first used to pre-process an entire code base by generating documentation about every code file. The generated documentation is then ingested into the Amazon Q application's index. This improves the quality of the answers generated by Amazon Q.

This makes it easier to understand a code base and to generate detailed next steps for improving it or adding new features. It also integrates nicely with Plugins for Amazon Q like JIRA, allowing us to rapidly find opportunities for improvement and immediately create tickets for them.

Deploy the solution

Pre-requisites

You need to have an AWS account and an IAM Role/User with permissions to create and manage the necessary resources and components for this application. (If you do not have an AWS account, please see How do I create and activate a new Amazon Web Services account?)

1. Deploy the stack

We've made this easy by providing pre-built AWS CloudFormation templates that deploy everything you need in your AWS account.

  1. Log in to the AWS console if you are not already logged in.
  2. Choose one of the Launch Stack buttons below for your desired AWS region to open the AWS CloudFormation console and create a new stack.
  3. Enter the following parameters:
    1. Stack Name: Name your App, e.g. LANGCHAIN-AGENTS-ANALYSIS.
    2. ProjectName: The project name you want to use, e.g. Langchain-Agents.
    3. GitRepositoryUrl: The URL of the repository you want to analyze, e.g. https://github.com/aws-samples/langchain-agents.git.
    4. IdcArn: The ARN of the Identity Center you want to use to create the Amazon Q for Business application. You can find the ARN under Settings in the AWS Console under IAM Identity Center.
    5. SshSecretName: (Optional) The name of the secret in Secrets Manager that contains the SSH key for the repository. If you have none, leave this as the default, 'None'.
    6. SshUrl: (Optional) The SSH URL of the repository you want to analyze, e.g. git@github.com:aws-samples/langchain-agents.git. If you have none, leave this as the default, 'None'.

Region | Easy Deploy Button | Template URL (use to upgrade an existing stack to a new release)
N. Virginia (us-east-1) | Launch Stack | https://us-east-1-amazon-q-business-code-analysis.s3.amazonaws.com/cloudformation.yml

Note: Depending on the size of the repository and the number of files, the pre-processing job may take from five minutes to an hour to complete after stack creation. To monitor progress, check the job logs in the AWS Batch console. If you use the Amazon Q for Business application before pre-processing has completed, you may not get the best results.
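
As a rough sketch of the monitoring step, job states can also be polled from Python instead of the console. Here `batch_client` is assumed to behave like `boto3.client("batch")`, and the queue name you pass in is a placeholder, since the source does not name the queue the stack creates. Only the first page of each `list_jobs` result is counted.

```python
from collections import Counter

# Job states reported by AWS Batch; list_jobs filters on one state per call.
JOB_STATES = ["SUBMITTED", "PENDING", "RUNNABLE", "STARTING",
              "RUNNING", "SUCCEEDED", "FAILED"]


def summarize_jobs(batch_client, job_queue):
    """Count jobs per state in one queue (first result page only).

    batch_client is assumed to behave like boto3.client("batch");
    job_queue is a placeholder for the queue created by the stack.
    """
    counts = Counter()
    for state in JOB_STATES:
        response = batch_client.list_jobs(jobQueue=job_queue, jobStatus=state)
        counts[state] += len(response.get("jobSummaryList", []))
    return dict(counts)


def preprocessing_finished(counts):
    """True once at least one job exists and none is still pending or running."""
    active = sum(counts.get(state, 0) for state in JOB_STATES[:5])
    return active == 0 and sum(counts.values()) > 0
```

Calling `summarize_jobs` periodically and stopping when `preprocessing_finished` returns True gives a simple readiness check before you start asking questions.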

2. Access the Amazon Q for Business application

  1. Navigate to the Amazon Q for Business application.
  2. Click on the application you just created.
  3. Click on the live web experience and start using the chat interface.

Deploy the solution using CDK

Clone and Install dependencies

Write the following commands in the terminal to get started with the project.

git clone https://github.com/aws-samples/amazon-q-business-code-analysis.git
cd amazon-q-business-code-analysis/cdk
npm install

Deploy the stack using CDK

You can deploy the stack using the following command. Add the following parameters to the command:

  1. ProjectName: The project name you want to use, e.g. Langchain-Agents.
  2. RepositoryUrl: The git URL of the repository you want to analyze, e.g. https://github.com/aws-samples/langchain-agents.git.
  3. IdcArn: The ARN of the Identity Center you want to use to create the Amazon Q for Business application. You can find the ARN under Settings in the AWS Console under IAM Identity Center.

Note, you only need to bootstrap once. If you have already bootstrapped your account, you can skip the bootstrap command.

npx cdk bootstrap --parameters RepositoryUrl=<repository_git_url> --parameters ProjectName=<project_name> --parameters IdcArn=<identity_center_arn> --require-approval never

npx cdk deploy --parameters RepositoryUrl=<repository_git_url> --parameters ProjectName=<project_name> --parameters IdcArn=<identity_center_arn> --require-approval never

Here is an example of how to deploy the stack with parameters.

npx cdk deploy --parameters RepositoryUrl=https://github.com/aws-samples/langchain-agents.git --parameters ProjectName=Langchain-Agents --parameters IdcArn=<identity_center_arn> --require-approval never
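
If you script deployments, the command line can be assembled from a dictionary so that optional parameters are easy to omit. The helper below is a hypothetical convenience, not part of the repository; it only builds the string and does not run it.

```python
import shlex


def build_cdk_deploy_command(parameters):
    """Return the `npx cdk deploy` command line for a dict of stack parameters.

    Entries whose value is None are skipped, so optional parameters such as
    SshUrl and SshSecretName can be left out.
    """
    args = ["npx", "cdk", "deploy"]
    for name, value in parameters.items():
        if value is None:
            continue
        args += ["--parameters", f"{name}={value}"]
    args += ["--require-approval", "never"]
    # shlex.join quotes any argument that the shell would otherwise interpret.
    return shlex.join(args)


command = build_cdk_deploy_command({
    "RepositoryUrl": "https://github.com/aws-samples/langchain-agents.git",
    "ProjectName": "Langchain-Agents",
    "IdcArn": "<identity_center_arn>",  # placeholder, replace with your ARN
    "SshUrl": None,  # omitted from the command
})
print(command)
```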

2. Access the Amazon Q for Business application

  1. Navigate to the Amazon Q for Business application.
  2. Click on the application you just created.
  3. Click on the live web experience and start using the chat interface.

Architecture

[Figure: Architecture diagram]

Accessing Private repositories

To access a private repository, generate an SSH key, upload the private key to Secrets Manager, and add the public key to your git provider. Then pass the SSH URL and the SSH secret name as parameters. This is currently supported with CDK deployments. For GitHub, you can generate an SSH key by following the instructions here.

npx cdk deploy --parameters ProjectName=Langchain-Agents --parameters RepositoryUrl=https://github.com/aws-samples/langchain-agents.git --parameters SshUrl=git@github.com:aws-samples/langchain-agents.git --parameters SshSecretName=<your_ssh_secret_name> --require-approval never

Use the Jupyter Notebook

Open the notebook, Generate-and-Ingest-Documentation, and run the cells in order to generate the documentation for the sample repository and store them in the index. If you want to change the repository, you can change the repo_url and ssh_url to specify the repository you want to analyze. Then navigate to the Amazon Q for Business application and ask questions about the repository.

Introduction

Amazon Q for Business is good at using connectors to index data and then allowing you to chat with that data using a managed RAG system. However, as anyone familiar with RAG will know, pulling the most semantically similar data is not always the best way to get the most relevant data, particularly when dealing with chunks of code. This is where Amazon Q for Business without data pre-processing falls short.
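
To make the retrieval problem concrete, here is a toy sketch that uses bag-of-words cosine similarity as a stand-in for semantic similarity. This is not how Amazon Q for Business ranks passages (the source does not describe that), and the sample strings and function names are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


question = bag_of_words("How is the data stored in the database?")
raw_code = bag_of_words("def save(r): db.put(r.id, r)")
generated_doc = bag_of_words(
    "The save function stores the record data in the database keyed by id."
)

# The raw code shares no words with the question, while generated
# documentation about the same code does, so it scores higher.
print(cosine_similarity(question, raw_code))
print(cosine_similarity(question, generated_doc))
```

Real embedding models are far better than word overlap, but the same gap between how a question is phrased and how code is written is what the generated documentation is meant to close.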

To solve this problem, we take advantage of Amazon Q for Business's ability to answer a question about a file attached to the chat request.

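def ask_question_with_attachment(prompt, filename):
    # Read the file as bytes; the context manager closes it afterwards.
    with open(filename, 'rb') as file:
        data = file.read()
    # amazon_q is assumed to be an Amazon Q for Business (boto3 'qbusiness')
    # client defined elsewhere, along with the application and user IDs.
    answer = amazon_q.chat_sync(
        applicationId=amazon_q_app_id,
        userId=amazon_q_user_id,
        userMessage=prompt,
        attachments=[
            {
                'data': data,
                'name': filename
            },
        ],
    )
    return answer['systemMessage']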

This is useful for code analysis as we can send a file to Amazon Q for Business and ask a question about it. This allows us to transform files into high-density data points using natural language.

Code Analysis

To demonstrate the power of this function, we will use it to analyze a code file. We will use the following code file as an example:

file_path = "./assets/sample.py"
prompt = "Come up with a list of questions and answers about the attached file. Keep answers dense with information. A good question for a database related file would be 'What is the data flow?' or for a file that executes devops commands 'How is the code being deployed?' or for a file that contains a list of API endpoints 'What are the API endpoints and what do they do?'"
answer = ask_question_with_attachment(prompt, file_path)

We will then take the response from Amazon Q for Business and ingest it into the Amazon Q application's index along with the filename and prompt. This will allow us to pull more relevant information from the index when we ask questions about the code.

import uuid

def upload_prompt_answer_and_file_name(filename, prompt, answer, repo_url):
    amazon_q.batch_put_document(
        applicationId=amazon_q_app_id,
        indexId=index_id,
        roleArn=role_arn,
        documents=[
            {
                'id': str(uuid.uuid4()),
                'contentType': 'PLAIN_TEXT',
                'title': filename,
                'content':{
                    'blob': f"{filename} | {prompt} | {answer}".encode('utf-8')
                },
                'attributes': [
                    {
                        'name': 'url',
                        'value': {
                            'stringValue': f"{repo_url}{filename}"
                        }
                    }
                ]
            },
        ]
    )

prompt = "Generate comprehensive documentation about the attached file. Make sure you include what dependencies and other files are being referenced as well as function names, class names, and what they do. Keep the answers dense with information."
answer = ask_question_with_attachment(prompt, file_path)
upload_prompt_answer_and_file_name(file_path, prompt, answer, repo_url)
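
The `url` attribute above is built by concatenating `repo_url` and the local path, which for a path such as `./assets/sample.py` may not yield a browsable link. A possible alternative is sketched below, assuming a GitHub-style `/blob/<branch>/<path>` layout and a `main` branch. `build_source_url` is a hypothetical helper, not part of the repository.

```python
import posixpath


def build_source_url(repo_url, file_path, clone_dir=".", branch="main"):
    """Map a local file path inside a clone to a web URL for that file.

    Assumes GitHub-style URLs (<repo>/blob/<branch>/<path>) and that
    `branch` is the branch that was cloned; both are assumptions.
    """
    base = repo_url[:-len(".git")] if repo_url.endswith(".git") else repo_url
    # Express the file path relative to the root of the local clone.
    relative = posixpath.relpath(file_path, clone_dir)
    return f"{base.rstrip('/')}/blob/{branch}/{relative}"
```

The result could be passed as the `stringValue` of the `url` attribute so that citations in the chat link back to the file in the repository.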

Analyze Entire Repository

To analyze an entire repository, we will use the following function to loop through all the files in the repository and upload the documentation to the Amazon Q application's index.

import os
import shutil

import git  # provided by the GitPython package

def generate_documentation_for_repo(repo_url, repo_type, repo_name):
    # Clone the repository into a local working directory.
    repo = git.Repo.clone_from(repo_url, f"./{repo_name}")
    prompt = "Generate comprehensive documentation about the attached file. Make sure you include what dependencies and other files are being referenced as well as function names, class names, and what they do."
    for root, dirs, files in os.walk(f"./{repo_name}"):
        # Skip Git metadata so only repository files are documented.
        dirs[:] = [d for d in dirs if d != ".git"]
        for file in files:
            file_path = os.path.join(root, file)
            answer = ask_question_with_attachment(prompt, file_path)
            upload_prompt_answer_and_file_name(file_path, prompt, answer, repo_url)
    # Remove the local clone once every file has been processed.
    shutil.rmtree(f"./{repo_name}")
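
Walking a clone can also visit dependency folders, large files, and binary assets, which are unlikely to produce useful documentation. One way to narrow the walk is sketched below. The skipped directory names, the size limit, and the null-byte heuristic are illustrative choices, not taken from the repository.

```python
import os


def iter_source_files(root, skip_dirs=(".git", "node_modules"), max_bytes=1_000_000):
    """Yield paths of files under root that look like text and are not too large."""
    for current, dirs, files in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in sorted(files):
            path = os.path.join(current, name)
            if os.path.getsize(path) > max_bytes:
                continue
            with open(path, "rb") as handle:
                # Heuristic: a null byte in the first block suggests a binary file.
                if b"\0" in handle.read(8192):
                    continue
            yield path
```

The loop in `generate_documentation_for_repo` could iterate over `iter_source_files(f"./{repo_name}")` instead of every file returned by `os.walk`.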

We can then ask questions about the repository and get detailed answers:

def ask_question_about_repo(prompt, repo_url):
    answer = amazon_q.chat_sync(
        applicationId=amazon_q_app_id,
        userId=amazon_q_user_id,
        userMessage=prompt
    )
    return answer['systemMessage']
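
The response carries more than the answer text. A minimal sketch for listing the indexed documents an answer drew on follows, assuming the response includes a `sourceAttributions` list whose entries have `title` and `url` keys. Verify the field names against your SDK version; `format_sources` is a hypothetical helper.

```python
def format_sources(response):
    """Return one 'title (url)' line per distinct cited document."""
    lines = []
    seen = set()
    for source in response.get("sourceAttributions", []):
        key = (source.get("title"), source.get("url"))
        if key in seen:
            # The same document can be cited more than once; list it once.
            continue
        seen.add(key)
        title = source.get("title") or "untitled"
        url = source.get("url")
        lines.append(f"{title} ({url})" if url else title)
    return lines
```

To use it, return the whole response from `chat_sync` rather than only `answer['systemMessage']`, then print the answer followed by the lines from `format_sources`.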

You can also navigate directly to your Amazon Q for Business application, click on the live web experience, and start using that chat interface instead.

Conclusion

Amazon Q for Business alone will not be able to answer complex questions like 'What is causing high latency in my application?', but after we process all the files we are able to ask these questions. This in turn allows us to ask Q to generate detailed next steps to improve the codebase. This integrates nicely with Plugins for Amazon Q like JIRA, allowing us to rapidly find opportunities for improvement and immediately create tickets for them.

Demo

[Figure: Amazon Q for Business Web Experience]