codecompanion-ai / code-companion

AI coding assistant app
https://codecompanion.ai/

Optimize token usage when communicating with the OpenAI API #25

Closed PierrunoYT closed 1 week ago

PierrunoYT commented 4 months ago

This pull request aims to optimize token usage when communicating with the OpenAI API. Some of the key changes include:

Adding response caching for API queries to avoid duplicate calls for the same query. This will significantly reduce the number of API calls.
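A minimal sketch of this kind of response cache, keyed by a hash of the query string; the names (`cached_query`, `call_api`) are hypothetical and not taken from the PR's code:

```python
import hashlib

# In-memory cache: query hash -> API response (hypothetical sketch).
_response_cache: dict[str, str] = {}

def cached_query(query: str, call_api) -> str:
    """Return a cached response if this exact query was seen before;
    otherwise call the API once and remember the result."""
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    if key not in _response_cache:
        _response_cache[key] = call_api(query)
    return _response_cache[key]
```

Repeated identical queries then cost zero extra API calls, at the price of potentially stale responses if the underlying data changes.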

Limiting the length of prompts and file contents before passing to the API. Long prompts can use a lot of tokens unnecessarily.
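A length-limiting helper along these lines can double as the shared utility mentioned later; the function name and the 4000-character default are assumptions for illustration, not values from the PR:

```python
MAX_PROMPT_CHARS = 4000  # assumed limit, not taken from the PR

def truncate_text(text: str, max_chars: int = MAX_PROMPT_CHARS) -> str:
    """Trim text to a maximum length, appending a marker so the
    model knows content was omitted."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n...[truncated]"
```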

Reducing the number of search results from APIs like Google to limit the response size.
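One plausible shape for this, capping both the result count and each snippet's length before the results are embedded in a prompt (the field name `snippet` and the defaults are hypothetical):

```python
def limit_results(results: list[dict], max_results: int = 5,
                  snippet_chars: int = 200) -> list[dict]:
    """Keep only the first few search results and trim each snippet,
    bounding how many tokens the results contribute to a prompt."""
    trimmed = []
    for r in results[:max_results]:
        trimmed.append({**r, "snippet": r.get("snippet", "")[:snippet_chars]})
    return trimmed
```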

Implementing more efficient text highlighting and trimming of whitespace to reduce unnecessary tokens in the responses.
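Whitespace trimming of this sort might look like the following sketch, which strips trailing spaces and collapses runs of blank lines while keeping code structure intact (the function name is hypothetical):

```python
import re

def normalize_whitespace(text: str) -> str:
    """Strip trailing spaces from each line and collapse three or
    more consecutive newlines into two, saving tokens without
    changing the visible structure of the text."""
    lines = [line.rstrip() for line in text.splitlines()]
    collapsed = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return collapsed.strip()
```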

Optimizing the code embedding process by reducing the chunk size and overlap when splitting code. This ensures code is broken into smaller chunks to limit token usage.
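A simple character-based splitter illustrates the chunk-size/overlap trade-off; the sizes below are illustrative defaults, not the values chosen in the PR:

```python
def split_code(code: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split code into overlapping fixed-size chunks. Smaller chunks keep
    each embedding request cheap; the overlap preserves context across
    chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [code[start:start + chunk_size]
            for start in range(0, max(len(code), 1), step)]
```

Each chunk's trailing `overlap` characters reappear at the start of the next chunk, so a definition cut at a boundary is still seen whole in at least one chunk.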

Adding common utility functions to enforce max length limits on content to standardize token reduction across different modules.

The goal of these changes is to reduce redundant and unnecessary token usage so the app stays within its allocated quota in a sustainable manner. Several best practices have been applied to optimize the existing codebase for minimal token consumption.

PierrunoYT commented 4 months ago

Only merge the latest version, which comes from Mutable AI.