Feat: Condense Event History

SmartManoj commented 4 months ago

Fix https://github.com/OpenDevin/OpenDevin/issues/1748 Adding https://github.com/OpenDevin/OpenDevin/pull/2937.

SmartManoj commented 4 months ago

@CodiumAI-Agent /review

CodiumAI-Agent commented 4 months ago

PR Reviewer Guide 🔍

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪

🧪 No relevant tests

🔒 No security concerns identified

⚡ Key issues to review

Missing Handling
The `parse` method in `action_parser.py` now directly returns an `AgentSummarizeAction` if the response is of that type, but does not handle other potential actions that could be derived from `AgentSummarizeAction`. This might lead to unhandled cases if new subtypes of `AgentSummarizeAction` are introduced. String Concatenation
In the method `action_to_str`, string concatenation using '+' could lead to `TypeError` if the attributes like `summarized_actions` or `summarized_observations` are not strings. This could be mitigated by ensuring these attributes are always strings or using a different method of concatenation. Error Handling
The `condense` method raises `SummarizeError` if no messages are condensable or if there's only one message to summarize. However, it does not handle what should happen next, potentially leaving the system in an inconsistent state. Token Limit Check
The method `is_over_token_limit` checks if the token count exceeds the limit but includes a padding which might not be clear why it's needed. The use of padding should be justified or documented to avoid confusion.

SmartManoj commented 4 months ago

@CodiumAI-Agent /ask

Give suggestions.

CodiumAI-Agent commented 4 months ago

Ask❓

Give suggestions.

Answer:

Here are some suggestions for the changes introduced in this PR:

Documentation and Comments:
- Ensure that all new methods and classes have comprehensive docstrings explaining their purpose, parameters, and return values. For example, the CondenserMixin class and its methods could benefit from more detailed explanations.
- Add comments to complex logic sections to improve readability and maintainability. For instance, the loop in the condense method of CondenserMixin could use comments explaining the purpose of each step.
Error Handling:
- In the step method of agenthub/codeact_agent/codeact_agent.py, consider logging the exception details when catching ContextWindowLimitExceededError to aid in debugging.
- Ensure that all custom exceptions in opendevin/core/exceptions.py have meaningful messages that provide context about the error.
Code Consistency:
- Ensure consistent use of single and double quotes throughout the codebase. For example, in opendevin/condenser/condenser.py, both single and double quotes are used interchangeably.
- Maintain consistent formatting for method definitions and calls. For instance, in opendevin/llm/llm.py, the method get_token_count has inconsistent spacing around the parameters.
Performance Considerations:
- In the condense method of CondenserMixin, consider optimizing the token counting logic to avoid recalculating token counts for the same messages multiple times.
- Review the retry logic in opendevin/llm/llm.py to ensure it does not lead to excessive retries, which could impact performance.
Testing:
- Ensure that new functionality, especially the summarization and condensing logic, is thoroughly tested. Add unit tests to cover various scenarios, including edge cases.
- Consider adding integration tests to verify that the summarization and condensing work correctly within the broader system.
Logging:
- Enhance logging to provide more context about the operations being performed. For example, in opendevin/condenser/condenser.py, log the number of tokens before and after summarization.
- Ensure that sensitive information is not logged, especially when dealing with user inputs or model outputs.
Code Cleanup:
- Remove commented-out code unless it serves a clear purpose. For example, the commented-out code in opendevin/condenser/condenser.py related to preserving the last N messages should be either implemented or removed.
- Ensure that all TODO comments have corresponding tasks or issues created to track their completion.

By addressing these suggestions, the codebase will be more maintainable, performant, and easier to understand for future developers.

SmartManoj commented 4 months ago

@CodiumAI-Agent /ask

Missing Handling The parse method in action_parser.py now directly returns an AgentSummarizeAction if the response is of that type, but does not handle other potential actions that could be derived from AgentSummarizeAction. This might lead to unhandled cases if new subtypes of AgentSummarizeAction are introduced.

It will be handled if new subtypes are introduced. Remember Simple is better than complex.

String Concatenation In the method action_to_str, string concatenation using '+' could lead to TypeError if the attributes like summarized_actions or summarized_observations are not strings. This could be mitigated by ensuring these attributes are always strings or using a different method of concatenation.

Both types are strings and remember mypy is happy.

Error Handling The condense method raises SummarizeError if no messages are condensable or if there's only one message to summarize. However, it does not handle what should happen next, potentially leaving the system in an inconsistent state.

It may not be raised.

Token Limit Check The method is_over_token_limit checks if the token count exceeds the limit but includes a padding which might not be clear why it's needed. The use of padding should be justified or documented to avoid confusion.

Yep, Quoting from Previous PR estimation of tokens to add to the prompt for the user-configured max token count

CodiumAI-Agent commented 4 months ago

Ask❓

[Missing Handling](https://github.com/SmartManoj/Kevin/pull/21/files > The parse method in action_parser.py now directly returns an AgentSummarizeAction if the response is of that type, but does not handle other potential actions that could be derived from AgentSummarizeAction. This might lead to unhandled cases if new subtypes of AgentSummarizeAction are introduced. It will be handled if new subtypes are introduced. Remember Simple is better than complex. > [String Concatenation](https://github.com/SmartManoj/Kevin/pull/21/files > In the method action_to_str, string concatenation using '+' could lead to TypeError if the attributes like summarized_actions or summarized_observations are not strings. This could be mitigated by ensuring these attributes are always strings or using a different method of concatenation. Both types are strings and remember mypy is happy. > [Error Handling](https://github.com/SmartManoj/Kevin/pull/21/files > The condense method raises SummarizeError if no messages are condensable or if there's only one message to summarize. However, it does not handle what should happen next, potentially leaving the system in an inconsistent state. It may not be raised. > [Token Limit Check](https://github.com/SmartManoj/Kevin/pull/21/files > The method is_over_token_limit checks if the token count exceeds the limit but includes a padding which might not be clear why it's needed. The use of padding should be justified or documented to avoid confusion. Yep, Quoting [from Previous PR](https://github.com/OpenDevin/OpenDevin/pull/2021/files

Answer:

Missing Handling

The parse method in action_parser.py now directly returns an AgentSummarizeAction if the response is of that type. While this is a straightforward approach, it does not account for potential future subtypes of AgentSummarizeAction. If new subtypes are introduced, they might not be handled correctly. To future-proof this, you could use a more flexible approach, such as:

def parse(self, response) -> Action:
    if isinstance(response, AgentSummarizeAction):
        return response
    # Handle other potential subtypes here
    action_str = self.parse_response(response)
    return self.parse_action(action_str)

This ensures that any new subtypes of AgentSummarizeAction will be handled appropriately.

String Concatenation

In the action_to_str method, string concatenation using + could lead to TypeError if summarized_actions or summarized_observations are not strings. Although you mentioned that these attributes are always strings and mypy is happy, it is still good practice to ensure type safety. You can use f-strings or the str() function to ensure that the attributes are treated as strings:

elif isinstance(action, AgentSummarizeAction):
    return (
        f'Summary of all Action and Observations till now. \n'
        f'Action: {str(action.summarized_actions)}\n'
        f'Observation: {str(action.summarized_observations)}'
    )

Error Handling

The condense method raises SummarizeError if no messages are condensable or if there's only one message to summarize. However, it does not specify what should happen next, potentially leaving the system in an inconsistent state. It would be beneficial to handle this exception to ensure the system remains consistent:

def condense(self, messages: list[Message]):
    try:
        # Existing condense logic
    except SummarizeError as e:
        logger.error(f'Condensation failed: {str(e)}')
        # Handle the error, e.g., by returning a default action or re-trying with different parameters
        return None

Token Limit Check

The is_over_token_limit method includes a padding of 512 tokens, which might not be immediately clear. Documenting the reason for this padding would help avoid confusion. For example:

def is_over_token_limit(self, messages: list[Message]) -> bool:
    """
    Estimates the token count of the given events using litellm tokenizer and returns True if over the max_input_tokens value.

    Parameters:
    - messages: List of messages to estimate the token count for.

    Returns:
    - Estimated token count.

    Note:
    - A padding of 512 tokens is added to account for potential discrepancies in token counting.
    """
    MAX_TOKEN_COUNT_PADDING = 512
    if not self.config.max_input_tokens:
        return False
    token_count = self.get_token_count(messages=messages) + MAX_TOKEN_COUNT_PADDING
    return token_count >= self.config.max_input_tokens

This documentation clarifies the purpose of the padding and helps maintainers understand its necessity.

SmartManoj commented 4 months ago

@CodiumAI-Agent /ask

    # Handle other potential subtypes here
    action_str = self.parse_response(response)
    return self.parse_action(action_str)

Other subtypes of Action are already handled after that. What will be the new subtypes of AgentSummarizeAction? Even if it so, it will be inside the if branch.

f'Summary of all Action and Observations till now. \n'
        f'Action: {str(action.summarized_actions)}\n'

f is not needed in first. no need to use str inside f-strings

However, it does not specify what should happen next, potentially leaving the system in an inconsistent state. It would be beneficial to handle this exception to ensure the system remains consistent:

need to check why, where, and when it will fail.

CodiumAI-Agent commented 4 months ago

Ask❓

python > > action_str = self.parse_response(response) > return self.parse_action(action_str) > Other subtypes of Action are already handled after that. What will be the new subtypes of AgentSummarizeAction? Even if it so, it will be inside the if branch. > python > f'Summary of all Action and Observations till now. n' > f'Action: {str(action.summarized_actions)}n' > f is not needed in first. no need to use str inside f-strings > However, it does not specify what should happen next, potentially leaving the system in an inconsistent state. It would be beneficial to handle this exception to ensure the system remains consistent: need to check why, where, and when it will fail.

Answer:

action_str = self.parse_response(response)
return self.parse_action(action_str)
Other subtypes of Action are already handled after that. What will be the new subtypes of AgentSummarizeAction? Even if it so, it will be inside the if branch.

The new subtype introduced in this PR is AgentSummarizeAction. This action is specifically handled in the if branch before the parse_response method is called. If the response is an instance of AgentSummarizeAction, it is returned directly without further parsing. This ensures that the summarization action is processed correctly and not treated as a generic action string.

f'Summary of all Action and Observations till now. n'
f'Action: {str(action.summarized_actions)}n'
f is not needed in first. no need to use str inside f-strings

You are correct. The first f is unnecessary, and using str inside f-strings is redundant. The corrected code should be:

return (
    'Summary of all Action and Observations till now. \n'
    + 'Action: '
    + action.summarized_actions
    + '\nObservation: '
    + action.summarized_observations
)

However, it does not specify what should happen next, potentially leaving the system in an inconsistent state. It would be beneficial to handle this exception to ensure the system remains consistent: need to check why, where, and when it will fail.

To ensure the system remains consistent, it is crucial to handle exceptions that might arise during the summarization process. The SummarizeError and ContextWindowLimitExceededError exceptions are introduced to handle specific failure scenarios. Here's an example of how to handle these exceptions:

SummarizeError: This exception is raised when there are not enough messages to summarize. It should be caught and logged, and the system should proceed with the next available action or observation.
ContextWindowLimitExceededError: This exception is raised when the context window limit is exceeded, and memory cannot be condensed further. This should be caught, logged, and potentially trigger a fallback mechanism, such as notifying the user or attempting a different strategy.

Example of handling these exceptions:

try:
    response = self.llm.completion(
        messages=[message.model_dump() for message in messages],
        stop=[
            '</execute_ipython>',
            '</execute_bash>',
            '</execute_browse>',
        ],
        temperature=0.0,
        condense=True,
    )
except SummarizeError as e:
    logger.error(f'Summarization failed: {str(e)}')
    # Handle the error, e.g., by notifying the user or taking a fallback action
except ContextWindowLimitExceededError as e:
    logger.error(f'Context window limit exceeded: {str(e)}')
    # Handle the error, e.g., by notifying the user or taking a fallback action

By handling these exceptions, the system can maintain consistency and provide a better user experience even in failure scenarios.

SmartManoj commented 4 months ago

Applied the suggestions in https://github.com/SmartManoj/Kevin/commit/7abe952853b6bfd958f2f817376aba3a900bfea3

SmartManoj / Kevin