daveshap opened this issue 7 months ago
Given that the Supreme Oversight Board is the most important decision-maker in the swarm, it makes sense to ensure it runs well. Avoiding mistakes and hallucinations is essential here.
Therefore I suggest three parallel Supreme Oversight Boards that run on the same prompts. These boards can have the same archetypes or not. Something like this:
Which archetype entities sit on the SOB is then up for discussion. If historical figures are used, they need to be people with enough text written by or about them to construct a genuine persona.
Running three identical models with the same prompt would just move the goalposts of hallucination to the prompt, since all three would be biased in the same way. It would certainly capture the hallucinations you'd catch by sampling the seed function, but this approach is limited: in the most abstract questions, a gaggle of identically biased board members could still agree on nonsense.
The N prompts should express the same core ideas but be worded completely differently, and be justified from currents of knowledge that are as orthogonal to each other as possible (a very difficult task that should be guided by a professional in the history of philosophy or ontology, who would write the guidelines for evaluating orthogonality between prompts).
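As a toy illustration of where machine-checkable "orthogonality guidelines" might start, the sketch below scores vocabulary overlap between two prompt wordings with a Jaccard measure. This is only a crude stand-in (all names here are hypothetical); real orthogonality is about currents of knowledge, not shared words:

```python
# Crude sketch of an overlap check between two prompt wordings.
# A low score means the wordings share little vocabulary, one weak
# proxy for "worded completely differently" (assumption, not a real
# orthogonality metric).

STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}

def content_words(prompt):
    """Lowercased content words, punctuation stripped, stopwords removed."""
    return {w.strip(".,;:").lower() for w in prompt.split()} - STOPWORDS

def jaccard_overlap(p1, p2):
    """0.0 = no shared vocabulary (more 'orthogonal'), 1.0 = identical."""
    a, b = content_words(p1), content_words(p2)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Anything above some threshold (say 0.3) could be flagged for the expert to reword; the threshold itself would come from the expert-written guidelines.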
Alternatively, run N very differently trained models with the same prompt, and prime them to find errors in their peers.
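A minimal sketch of that second option, with a hypothetical `MODEL_STUBS` registry standing in for N differently trained models: fan the same prompt out to all of them, take a majority vote, and surface dissenters for peer scrutiny.

```python
# Hedged sketch: the lambdas below are placeholders (assumption) for
# real calls to N differently trained models.
MODEL_STUBS = {
    "model_a": lambda prompt: "approve",
    "model_b": lambda prompt: "approve",
    "model_c": lambda prompt: "reject",
}

def poll_board(prompt):
    """Collect one answer per model; return the majority answer and
    the dissenting models, which peers are then primed to scrutinize."""
    answers = {name: call(prompt) for name, call in MODEL_STUBS.items()}
    tally = {}
    for ans in answers.values():
        tally[ans] = tally.get(ans, 0) + 1
    # Note: ties are broken arbitrarily here; a real board would escalate.
    majority = max(tally, key=tally.get)
    dissenters = [name for name, ans in answers.items() if ans != majority]
    return majority, dissenters

majority, dissenters = poll_board("Should the swarm deploy task X?")
```

A dissenting answer is not automatically wrong; it is the trigger for the discussion mechanism described below, not a result to discard.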
The SOB should not hallucinate, but agreement shouldn't be a given either; discussion is a mechanism that:
1. Restrains the expense of resources: a board member focused on resource optimization and utilitarianism would help catch problems like these, but having all members biased the same way could be disastrous.
2. Captures biases: with each agent coming from a different current of reasoning, all affirmations must be justified and revised by the others. This helps capture biases that cannot be dismissed as hallucination, since all reasoning rests on subjective and arbitrary affirmations, even when correct.
3. Mitigates hallucination: if the agents realize they fundamentally disagree in their beliefs, even while agreeing on their ethical goals, they will be more on edge to call out any mistakes made by their peers.
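The discussion mechanism above could be prototyped as a cross-review round in which every member reviews each peer's claim and either endorses it or raises an objection. Both reviewer heuristics below are illustrative assumptions, not real board logic; in practice each would be an LLM call primed with a different current of reasoning.

```python
# Sketch of one discussion round. The keyword heuristics are stand-ins
# (assumption) for persona-primed model calls.

def frugal_reviewer(claim):
    # Resource-optimization / utilitarian member: objects to open-ended cost.
    return "objection: unbounded resource use" if "unlimited" in claim else "endorse"

def skeptic_reviewer(claim):
    # Bias-hunting member: demands a justification for every affirmation.
    return "endorse" if "because" in claim else "objection: no justification given"

MEMBERS = {"frugal": frugal_reviewer, "skeptic": skeptic_reviewer}

def discussion_round(claims):
    """Return, per claim's author, the objections raised by peers."""
    report = {}
    for author, claim in claims.items():
        objections = []
        for name, review in MEMBERS.items():
            if name == author:
                continue  # members do not review their own claim
            verdict = review(claim)
            if verdict != "endorse":
                objections.append((name, verdict))
        report[author] = objections
    return report

report = discussion_round({
    "frugal": "spend unlimited compute",
    "skeptic": "halt the task because cost exceeds budget",
})
```

Claims that accumulate objections would go back for revision rather than straight to a vote, which is exactly the "agreement is not a given" property the list above asks for.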
Orthogonal models together cover more space with less effort: "El que mucho abarca, poco aprieta" ("he who grasps at much, holds little").
When thinking about decisions, it could make sense to ensure the first input is of the highest possible quality: "low-quality input = low-quality output".
When thinking about the initial input, it could be cool to have some sort of prompt input structure/format in which the initial prompt itself has to fulfill specific requirements, ensuring that input of sufficient quality is provided. The SOB's actions could therefore be extended to:
1) review the initial prompt
2) evaluate the initial prompt (what could the assessment criteria be?)
3) improve the initial prompt
4+) proceed with the outlined process > All agree > Yes/No > etc.
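Steps 1-3 might be sketched as a small pre-processing pass. The assessment criteria used here (stated goal, stated constraints, minimum length) are placeholder assumptions, since the question of what the criteria should be is left open above.

```python
# Hedged sketch of the review/evaluate/improve steps.
# CRITERIA is a placeholder rubric (assumption), not a proposed standard.
CRITERIA = {
    "has_goal": lambda p: "goal:" in p.lower(),
    "has_constraints": lambda p: "constraints:" in p.lower(),
    "long_enough": lambda p: len(p.split()) >= 5,
}

def evaluate_prompt(prompt):
    """Step 2: score the prompt against each criterion (True/False)."""
    return {name: check(prompt) for name, check in CRITERIA.items()}

def improve_prompt(prompt, scores):
    """Step 3: append fill-in templates for whatever is missing."""
    if not scores["has_goal"]:
        prompt += "\nGoal: <state the desired outcome>"
    if not scores["has_constraints"]:
        prompt += "\nConstraints: <state budget, time, and ethical limits>"
    return prompt

def preprocess(prompt):
    """Steps 1-3, then hand the (re-scored) prompt to step 4."""
    scores = evaluate_prompt(prompt)
    if not all(scores.values()):
        prompt = improve_prompt(prompt, scores)
        scores = evaluate_prompt(prompt)
    return prompt, scores

refined, scores = preprocess("Plan something")
```

In a real pipeline the `improve_prompt` step would ask the user (or a model) to fill the templates in, rather than passing placeholders downstream.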
In a nutshell: ensure high-quality input first to ensure high-quality decisions; add a prompt evaluation and improvement process to increase and ensure the quality of decisions.
I think introducing a user-request refinement pipeline would be wise, both as a pre-formatting step and perhaps as a post-processing step when the board determines directives to assign to agents. I'll leave some links to interesting possible integrations. I might be able/interested to handle some of them myself, so I'd like to know both how to claim bounties and the approximate $ value of said bounties, so I can prioritize appropriately against my other gigs.

Here are some possibly useful integrations from open source:

- https://github.com/Mavenoid/prompt-hyperopt -- GPT-3-era prompt optimizer, hyperparameters
- https://github.com/EGjoni/DRUGS -- novel noise-injection technique, hyperparameters
- https://github.com/ambroser53/Prompt-Day-Care -- iteration on Promptbreeder. (Note: having implemented and tested this paper, I found the gains in prompt effectiveness highly overstated; usually it's a few percentage points over modern prompt-improvement techniques like CoT and few-shot, but every point counts when you're saving the world :P) The implementation leverages LMQL for a more sophisticated pipeline, which might be significant when combined with baseline Promptbreeder gains.
- https://github.com/promptfoo/promptfoo -- monitor/track/benchmark prompt performance
- https://github.com/truera/trulens -- monitoring, with base cases for things like bias and offensiveness and tools to create custom cases; I'm currently integrating this with another project and it seems useful
- https://twitter.com/Algomancer/status/1741365237327221000 -- a free paper idea for any researchers: train a small model on pre-prompt optimizations to include in a training set, then (not sure if it works) use the same model to select from "generate x variations of this prompt. it will be sent to an orchestration model to assign tasks to agents to accomplish a goal."
There are probably more, but that's what comes to mind off the top of my head. Perhaps it's wise to split this bounty into sub-bounties for integrations, and/or add bounties to whichever integrations the community decides are useful?
Directive: Build, test, and document the first instance of a SOB
General ideas: