-
Currently, `evaluation.yaml` exists under the `configs/` directory. To start, we just wanted to showcase this recipe as an example, but it is a core part of the finetuning process and therefore shou…
-
It would be nice if we could pre-compute a model's output on a particular dataset, and essentially "cache" this for use in an evaluation. For example, we have a large dataset of long-context documents…
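A minimal sketch of what such a cache could look like, assuming outputs are keyed by a hash of a model identifier plus the prompt and stored in a single JSON file. The names `cache_key`, `cached_generate`, the `generate` callable, and the cache file layout are all hypothetical and do not correspond to any existing API in this project:

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Iterable, List


def cache_key(model_id: str, prompt: str) -> str:
    """Build a stable key from the model identifier and the prompt text."""
    return hashlib.sha256(f"{model_id}\x00{prompt}".encode("utf-8")).hexdigest()


def cached_generate(
    model_id: str,
    prompts: Iterable[str],
    generate: Callable[[str], str],  # hypothetical: wraps the real model call
    cache_path: str,
) -> List[str]:
    """Return one output per prompt, calling `generate` only on cache misses."""
    path = Path(cache_path)
    # Load previously computed outputs, if the cache file already exists.
    cache = json.loads(path.read_text()) if path.exists() else {}
    outputs = []
    for prompt in prompts:
        key = cache_key(model_id, prompt)
        if key not in cache:
            cache[key] = generate(prompt)  # the expensive step, done once
        outputs.append(cache[key])
    # Persist the cache so that later evaluation runs can reuse the outputs.
    path.write_text(json.dumps(cache))
    return outputs
```

With something like this, an evaluation run would read the stored outputs instead of regenerating them, and only prompts that have not been seen for a given `model_id` would reach the model.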
-
Technical:
- [x] List all dimensions being measured
- [Evaluation Dimensions](https://github.com/zhengtxecon/data_tech_hirng/wiki/Evaluation-Dimensions)
- [x] Structure of current query
- [Pro…
-
Hi, following your guidance, I trained my model based on Qwen 1.5-1.8B.
While conducting the evaluation, I noticed that there appear to be some issues with the SQA and MMBench evaluations. The results …
-
### Objectives
Train fine-tuned versions of gLM2:
- [x] Version 0 (v0): Initial fine-tuning and qualitative eval (completed).
- [x] Version 1 (v1): Fine-tuning with augmented training data accoun…
-
Fix the link URLs in the MED section (and a few other sections) so that they pass the `check_links` action.
Requirements:
- All links should be formatted in markdown (if possible)
- All URLs should be format…
-
We got the following issue while running your project for the MSU AI club Campus Vision Challenge:
Custom code for evaluation:
```python
import os
import sys
import matplotlib.pyplot as plt
impor…
-
### Feature Request: MLOps Integration for NLU Section Inspired by MLflow Features
#### Description
To enhance Hexabot's NLU capabilities, we propose integrating MLOps-inspired features to strea…
-
**Problem:**
When I use the local Llama-7b model for the faithfulness calculation, the following error occurs. Is it because this metric does not support models with fewer parameters? It only supports …