-
**My observation**
- With https://github.com/ys-zong/VLGuard/blob/main/VLGuard_eval.py, I am able to reproduce results close to Table 2 for the **VLGuard dataset**.
- However, **I cannot reprodu…
-
I created a subclass of `BaseRagasEmbeddings` because I already have all the embeddings for the context, query, and question. I did this to avoid using the OpenAI API key, because it is costly and also I want …
-
Hello, is there a way to evaluate an LLM reranker after I fine-tune it on my own training dataset? Also, how should the test data be structured? The same as the training data (e.g., toy_finetune_data.jsonl)? Th…
-
Using the OpenAI API might not be feasible in some environments. Could someone provide a reference or example link on running evaluation with a local LLM?
-
[ ] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug.
**Describe the bug**
I cannot evaluate the open-source gemma:2b model using ragas. I…
-
### Dataset name
Schibsted text tasks
### Dataset link
https://huggingface.co/collections/Schibsted/schibsted-text-tasks-66655bce94d0f40432519347
### Dataset languages
- [ ] Danish
- [X] Swedish
…
-
### Bug Description
The guideline evaluation throws an error about a missing model.
### Version
latest
### Steps to Reproduce
Run the guideline evaluation example.
### Relevant Logs/Tracebacks
```sh…
-
### paper `AI for low-code for AI`
**Research Contribution**
- This paper argues that the visual programming component compensates for the ambiguity caused by the natural language programming compo…
-
For example, in the following case, \*\*Score: 9\*\* is incorrectly parsed as 5.0:
```python
{
"question": "To cook perfectly golden pancakes,",
"obj": {
"generation_a": "Mix the ingredients together in a bowl and pour it onto…
-
### Search before asking
- [X] I searched the [issues](https://github.com/IBM/data-prep-lab/issues) and found no similar issues.
### Component
Other
### Feature
LLM-based agentic workflows are e…