Closed: MarkRx closed this issue 1 month ago.
The correct way to address this is via prompt adjustments (maybe a dedicated prompt for PRs with small changes), not a threshold for ignoring PRs.
The PR that broke the world was small. PR Agent should review all code changes, whether they are small or large.
I think the main cause of this problem (which I am not debating; I am aware of it too) is that when the model has no suggestions to give, it falls back to the silliest option, which is to "hallucinate" PR content.
Also interesting is that it happens with both GPT-4 and Claude (I guess you saw it with GPT-4; I now have an example where I see it with Claude).
This change should prevent, or at least reduce, those problems.
We've found that the LLM is more likely to hallucinate or provide "noise" suggestions on small changes (such as one-line changes). We suspect this is because the bot has less context to work with and so is more likely to "grasp at straws".
I suggest adding configuration option(s) to set a "minimum diff/context/PR" size.
It could be:
When the bot skips processing, it should provide some kind of feedback to that effect so the end user knows it ran (and didn't crash).
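A minimal sketch of the proposed skip-with-feedback behavior. The names here (`review_pr`, `min_changed_lines`, `post_comment`) are hypothetical for illustration, not part of PR Agent today:

```python
def review_pr(changed_lines: int, min_changed_lines: int, post_comment) -> bool:
    """Return True if the PR was reviewed, False if it was skipped (with feedback).

    `post_comment` is a hypothetical callback that posts a comment on the PR.
    """
    if min_changed_lines and changed_lines < min_changed_lines:
        # Leave a note so the user knows the bot ran and deliberately skipped.
        post_comment(
            f"PR Agent skipped this review: the diff has {changed_lines} "
            f"changed line(s), below the configured minimum of {min_changed_lines}."
        )
        return False
    # ... run the normal review flow here ...
    return True
```

With the default threshold of 0 the check is disabled entirely, so existing installations behave exactly as before.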
I could also see:
The defaults would be set to 0 for backwards compatibility.
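A sketch of what such options might look like in a TOML configuration file, assuming hypothetical keys (`min_changed_lines` and `min_changed_files` are illustrative names, not existing PR Agent settings):

```toml
[config]
# Hypothetical thresholds; 0 disables the check (backwards-compatible default).
min_changed_lines = 0   # skip PRs whose diff has fewer changed lines than this
min_changed_files = 0   # skip PRs that touch fewer files than this
```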