-
https://ukgovernmentbeis.github.io/inspect_ai/
via [Hamel post](https://x.com/hamelhusain/status/1788936691576975382?s=46&t=aOEVGBVv9ICQLUYL4fQHlQ)
-
As we begin to evaluate LLM-assisted root cause analysis, we need a way to assess the validity and usefulness of the results.
Historically, our process for evaluating these results has …
-
## Description
Enhance the `TodoDetails` page within the dashboard by introducing a new `Evaluation` component. This component will display data generated from `evaluateIssue.ts`, providing users wit…
-
We should see if there is a good way to evaluate the chess engine.
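
One possible starting point, offered only as a sketch: play the engine against a fixed reference opponent and convert the match record into an estimated Elo difference using the standard logistic Elo model. The function name `elo_difference` and the idea of a reference opponent are assumptions for illustration, not part of the existing code.

```python
import math


def elo_difference(wins: int, draws: int, losses: int) -> float:
    """Estimate the Elo gap between our engine and a reference opponent.

    Hypothetical helper: assumes the counts come from a match played by
    our engine (wins/draws/losses from its point of view).
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")

    # Match score in [0, 1]: a win counts 1, a draw counts 0.5.
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        # A 0% or 100% score maps to an unbounded Elo difference.
        raise ValueError("score must be strictly between 0 and 1")

    # Invert the logistic Elo expectation: score = 1 / (1 + 10^(-diff/400)).
    return -400.0 * math.log10(1.0 / score - 1.0)
```

A positive result suggests the engine is stronger than the reference opponent, a negative one that it is weaker. Small match sizes give noisy estimates, so the number is best read as a rough trend between engine versions.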
-
## Description
The evaluation component within the todo section of the dashboard currently provides the necessary functionality but lacks visual polish. To improve user engagement and make the compon…
-
### Is there an existing issue for the same bug?
- [X] I have checked the troubleshooting document at https://docs.all-hands.dev/modules/usage/troubleshooting
- [X] I have checked the existing issues…
-
### Documentation Issue Description
I encountered an error when trying to use `RetryGuidelineQueryEngine` in my code with the keyword argument `base_query_engine`. The error message is as follows:
__ini…
-
### Scenario
It is important to add **observability** to an AI bot built with the teams-ai SDK, because the AI may behave non-deterministically. Currently it is quite hard to evaluate against an AI bot …
-
### Task 3: Define a falsifiable, measurable hypothesis.
> Our first hypothesis questions the validity of using an AI model for querying a database
> at all, and whether an LLM can effectively retrie…
-
Any time I play against the computer, it gets stuck in the "thinking" state and the page freezes.
This needs a fix so the computer can correctly calculate and make its move.
We also need to correc…