Closed Softdev1 closed 6 months ago
All the LLMs I've tried seem to produce hallucinations, for reasons that remain unclear. I've tried various strategies to mitigate this, such as modifying prompts, simplifying contexts, and employing multi-step processes like providing step-by-step code explanations rather than presenting the code outright. Despite these efforts, there appears to be an inherent bias toward false positives when a specific issue is proposed for evaluation, which hurts the models' ability to accurately interpret code that deviates from conventional implementations.
WIP...
Are we using GPT-4, or is it GPT-3? If it's GPT-4, we could try other LLMs like the new Claude 3 or even llamacode. If it's GPT-3, we can give GPT-4 a try (even if it's way slower).
GPT-3.5 by default; GPT-4 is very slow, and the issue also happens with it. Trying to get access to Claude 3.
Found we can use Claude 3 Opus via https://sdk.vercel.ai, though not as an API. It's probably possible to inspect the network tab and use its API for testing.
I'm starting to see the light... GPT-3.5 is a lost battle for some cases. Got GPT-4 to work by massively minimizing the context and avoiding excessive details about the issue to analyze. It's very bizarre behavior.
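A minimal sketch of what "minimizing the context" could look like in practice. The function name `build_minimal_prompt` and the exact wording of the instruction line are hypothetical, not the project's actual prompt; the only idea taken from the comment above is that the prompt carries one short question plus the code, with no extra description of the issue.

```python
def build_minimal_prompt(source_code: str, question: str) -> str:
    """Build a short prompt: one neutral question plus the code, nothing else.

    Hypothetical helper for illustration; the instruction wording is an assumption.
    """
    return (
        f"{question.strip()}\n"
        "Answer YES or NO first, then justify in one sentence.\n\n"
        f"{source_code.strip()}\n"
    )
```

The design choice here is to leave out any long definition of the vulnerability, since the thread suggests that extra detail about the issue is what pushes the models toward reporting it.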
Sample case: asking about unchecked external calls. The model completely ignores the if that checks the return value of send().
function withdraw() public returns (bool) {
    uint amount = pendingReturns[msg.sender];
    if (amount > 0) {
        pendingReturns[msg.sender] = 0;
        // >>>>>>> flagged line: the return value of send() is checked here
        if (!payable(msg.sender).send(amount)) {
            pendingReturns[msg.sender] = amount;
            return false;
        }
    }
    return true;
}
❌ claude-3-opus (WTF) ✅ claude-3-sonnet ❌ claude-3-haiku ✅ gpt-4 ❌ gpt-3.5
Will try with some coding-specific LLMs...
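For keeping track of runs like this one, a small scoring sketch may help. It assumes that ❌ above means the model reported an unchecked call on the sample even though send() is checked, and ✅ means it did not; the helper name `wrong_verdicts` is hypothetical.

```python
from typing import Dict, List


def wrong_verdicts(flagged_by_model: Dict[str, bool], issue_present: bool) -> List[str]:
    """Return the models whose verdict disagrees with the known ground truth."""
    return sorted(
        model
        for model, flagged in flagged_by_model.items()
        if flagged != issue_present
    )


# Verdicts transcribed from the results above, under the assumption that
# a cross means "flagged the sample" and a check mark means "did not flag it".
verdicts = {
    "claude-3-opus": True,
    "claude-3-sonnet": False,
    "claude-3-haiku": True,
    "gpt-4": False,
    "gpt-3.5": True,
}

# The sample checks the return value of send(), so the issue is not present.
failing = wrong_verdicts(verdicts, issue_present=False)
print(failing)
```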
aweeesome
Wow claude-3-opus failing but sonnet succeeding is wild
When increasing temperature it works well, though.
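One way to check the temperature observation more systematically is to repeat the same question at several temperatures and measure how often a known-safe sample gets flagged. This is only a sketch: `ask_model` is a hypothetical callable standing in for whatever client is used (it takes a prompt and a temperature and returns True if the model flagged the issue), and no real provider API is assumed.

```python
from typing import Callable, Dict, Sequence


def false_positive_rate_by_temperature(
    ask_model: Callable[[str, float], bool],
    prompt: str,
    temperatures: Sequence[float],
    runs_per_temperature: int = 5,
) -> Dict[float, float]:
    """For a sample known to be safe, return the share of runs that flagged it anyway.

    ask_model is a hypothetical stand-in for a real client call.
    """
    rates: Dict[float, float] = {}
    for temperature in temperatures:
        # Count the runs in which the model reported the (absent) issue.
        flagged = sum(
            1 for _ in range(runs_per_temperature) if ask_model(prompt, temperature)
        )
        rates[temperature] = flagged / runs_per_temperature
    return rates
```

Running it over the withdraw() sample with a few temperature values would show whether the improvement is consistent or just a lucky draw.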
Is the demo URL running with that, so we can test it out a bit?
Yes, deployed with Claude 3 Sonnet right now.
Description
Improve reliability of the outputs to avoid excessive false positives.