gnosis / prediction-market-agent


Heuristic proxy for confidence in agent's predictions #477

Open gabrielfior opened 3 days ago

gabrielfior commented 3 days ago

Based on @kongzii's suggestion:

- Divide all of the agent's predictions into probability buckets (deciles), e.g. a prediction of 65% for a market goes in the 7th decile (60-70%); see the sketch after this list.
- For each decile, we roughly expect its accuracy to match its range, i.e. the 7th decile above (60-70%) should have an accuracy of roughly 60-70%.
- Using the correlation between each decile's expected accuracy and its actual accuracy, we can draw a value for the confidence.
- It would also be interesting to use the metrics above to quantify an associated error.
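A minimal sketch of the bucketing step, assuming resolved predictions are available as `(p_yes, resolved_yes)` pairs (function and variable names here are illustrative, not part of PMAT):

```python
from collections import defaultdict

def decile_yes_rates(predictions: list[tuple[float, bool]]) -> dict[int, float]:
    """Bucket (p_yes, resolved_yes) pairs into deciles 1-10 and return the
    empirical YES rate per decile.

    A prediction with p_yes=0.65 lands in decile 7 (60-70%); for a
    well-calibrated agent, roughly 60-70% of the markets in that decile
    should resolve YES.
    """
    buckets: dict[int, list[bool]] = defaultdict(list)
    for p_yes, resolved_yes in predictions:
        decile = min(int(p_yes * 10), 9) + 1  # 1..10; keeps p_yes=1.0 in decile 10
        buckets[decile].append(resolved_yes)
    return {d: sum(outcomes) / len(outcomes) for d, outcomes in sorted(buckets.items())}
```

Correlating each decile's midpoint (0.05, 0.15, ..., 0.95) against these empirical YES rates would then give the single calibration value the third bullet refers to.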

kongzii commented 2 days ago

> we can draw a value for the confidence

How do you mean?

evangriffiths commented 2 days ago

I understood this analysis to be one way of understanding 'how accurate are the p_yes predictions of an agent?', not that it should be used to generate a confidence score for a given prediction. Maybe this info could be given to the agent when asking it to generate a confidence score, but I think it still needs to be decided on a per-prediction basis.

gabrielfior commented 2 days ago

> I understood this analysis to be one way of understanding 'how accurate are the p_yes predictions of an agent?'

That's also my understanding.

The question for this ticket remains open: how should we define confidence for the agent? Should we still ask the agent for it, or define it using hardcoded rules?
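For illustration only (this is not something agreed in the thread), one possible hardcoded rule would reuse the decile calibration above and shrink confidence as the agent's historical miscalibration in the relevant decile grows. All names here are hypothetical:

```python
def rule_based_confidence(p_yes: float, decile_yes_rate: dict[int, float]) -> float:
    """Derive a confidence in [0, 1] from historical calibration in this
    prediction's decile, using the output of decile_yes_rates() above."""
    decile = min(int(p_yes * 10), 9) + 1
    midpoint = (decile - 0.5) / 10  # e.g. decile 7 -> 0.65
    observed = decile_yes_rate.get(decile)
    if observed is None:
        return 0.5  # no resolved history in this decile: fall back to neutral
    return 1.0 - abs(observed - midpoint)  # perfect calibration -> 1.0
```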

gabrielfior commented 2 days ago

Some additional observations:

- From @evangriffiths: "I remember back in the beginning we used the PMAT Benchmark class to generate a bunch of predictions and confidence scores, and we saw that the LLM gave pretty rubbish scores - there was like no correlation between 'confidence' and 'abs difference between estimate_p_yes and manifold/polymarket p_yes'. So we can definitely do better, but it's not obvious how." (A sketch of this correlation check follows below.)
- From @kongzii: "Another LLM doing the confidence based on research and probability from the first LLM?"
- From @gabrielfior: Let's mark this as low priority since we don't have a great idea on how to improve the current status quo.
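For reference, the check @evangriffiths describes (correlation between 'confidence' and 'abs difference between estimate_p_yes and manifold/polymarket p_yes') can be sketched as below. The inputs are made-up placeholders; loading them from the PMAT Benchmark class is not shown:

```python
from scipy.stats import pearsonr

# Hypothetical inputs, one entry per benchmarked market.
confidences = [0.9, 0.7, 0.8, 0.6, 0.95]       # agent's self-reported confidence
agent_p_yes = [0.65, 0.30, 0.80, 0.55, 0.10]   # agent's estimate_p_yes
market_p_yes = [0.70, 0.45, 0.75, 0.20, 0.15]  # manifold/polymarket p_yes

# If confidence is meaningful, it should correlate negatively with the error:
# high confidence should go with a small |estimate_p_yes - market p_yes|.
errors = [abs(a - m) for a, m in zip(agent_p_yes, market_p_yes)]
corr, p_value = pearsonr(confidences, errors)
print(f"correlation={corr:.2f}, p={p_value:.3f}")  # ~0 correlation = rubbish scores
```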