-
I am trying to evaluate llm4decompile-6.7b-v1.5 using the methods you provided. The model weights were downloaded from the Hugging Face repository of the same name. However, I keep encountering an err…
-
- [ ] Shortlist metrics
- [ ] Shortlist datasets - eval scripts available
- [ ] Write Evaluation script
-
### Description
Overall intent is to ensure the Ruby Agent HTTP instrumentation works properly with the HTTP/2 protocol.
As part of this research spike:
- [ ] Catalog all the instrumentation re…
-
Purpose: Define how T3 participants will be evaluated during and after the training.
How to get started:
1. Review FL exam criteria as reference
2. List key competencies T3 participants must demo…
-
We need an evaluation framework in chapter 10 that goes from actions to objectives to outcomes to impacts to high-level changes, and includes how this will be measured (metrics). The high-level chang…
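One way the actions → objectives → outcomes → impacts → high-level changes chain could be represented, together with its metrics, is sketched below. This is only an illustration of the structure described above; the names `ChainLevel`, `EvaluationChain`, and all example descriptions and metrics are hypothetical, not taken from the chapter.

```python
from dataclasses import dataclass, field

# Hypothetical model of the chapter-10 evaluation framework: each level
# in the chain carries the metrics used to measure it.
@dataclass
class ChainLevel:
    name: str                       # e.g. "objective"
    description: str
    metrics: list[str] = field(default_factory=list)

@dataclass
class EvaluationChain:
    levels: list[ChainLevel]

    def metric_plan(self) -> dict[str, list[str]]:
        """Map each level of the chain to its measurement metrics."""
        return {lvl.name: lvl.metrics for lvl in self.levels}

# Example chain (all content invented for illustration).
chain = EvaluationChain(levels=[
    ChainLevel("action", "Deliver training sessions", ["sessions held"]),
    ChainLevel("objective", "Improve participant skills", ["test scores"]),
    ChainLevel("outcome", "Skills applied on the job", ["supervisor ratings"]),
    ChainLevel("impact", "Team performance improves", ["KPI deltas"]),
    ChainLevel("high-level change", "Organizational capability grows", ["annual review"]),
])
```

The point of the structure is that every level, not just the final one, declares how it will be measured, so `metric_plan()` yields the full measurement table for the chapter.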
-
Create an automated process to run an evaluation on the training data and record the results to a *"database"*.
- [ ] Create N partitions from training data.
- [ ] Run N evaluations
- [ ] Write r…
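A minimal sketch of what that pipeline could look like, using a stand-in `evaluate` function and a local SQLite file as the *"database"*. All function names and the round-robin partitioning scheme are assumptions for illustration, not taken from the issue.

```python
import sqlite3

def make_partitions(data, n):
    """Split the training data into n roughly equal partitions (round-robin)."""
    return [data[i::n] for i in range(n)]

def evaluate(partition):
    """Stand-in evaluation: here just the mean of the partition values."""
    return sum(partition) / len(partition)

def run_and_record(data, n, db_path=":memory:"):
    """Run an evaluation per partition and record each result."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (partition_id INTEGER, score REAL)"
    )
    for i, part in enumerate(make_partitions(data, n)):
        conn.execute("INSERT INTO results VALUES (?, ?)", (i, evaluate(part)))
    conn.commit()
    return conn

# Example run over toy "training data".
conn = run_and_record(list(range(100)), n=4)
rows = conn.execute(
    "SELECT partition_id, score FROM results ORDER BY partition_id"
).fetchall()
```

Swapping `:memory:` for a file path makes the results persist between runs, which is all the *"database"* requirement above seems to need.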
-
Hello,
Does R2R support multi-agent collaboration, or do you have any recommendation for integrating an external framework? I'm currently evaluating options to build a system where agents collaborate while leveraging R2R’s ing…
-
I took Eigen 3.4's tensor module for a quick spin to see what it might look like to support batch evaluation in the systems framework. Short story: I'm highly encouraged by it, but I think Eigen alon…
-
I'm opening this issue to discuss what we think the "LLM task" framework should aim to be, and how we could incrementally get there.
## What we have today
Today, what we call the "task framewo…
-
What is the best way to convert a weave dataset to JSON or a native Python sequence type?
At the moment, I use the following snippet as a template:
```python
import json
def export_dataset_as_json(re…