-
I've been working on evaluating how well LLMs can handle bioimaging tasks relative to the complexity of the task.
First, we can see that different tasks have different probabilities of being easily…
-
## Summary
This template captures a few base requirements that must be met before filing a PR that contains a new blog post submission.
Please fill out this form in its…
-
Hello, is there a way to evaluate an LLM reranker after I finetune it on my own training dataset? Also, how should the test be structured? Same as the training data (e.g. toy_finetune_data.jsonl)? Th…
-
## Suggestion for LLM Evaluation Tutorials with `Evalverse`
- **Tutorials** (Notebook examples): https://github.com/UpstageAI/evalverse/tree/main/examples
- [01_basic_usage.ipynb](https://github.com…
-
Current candidate
Mistral
https://mistral.ai/
RAG - LlamaIndex
https://github.com/mistralai/cookbook/blob/main/llamaindex_agentic_rag.ipynb
Can be tested out locally and seems to be a good ch…
-
When I run `evaluate` with any VertexAI model, I get several warnings that say
> Gapic client context issue detected.This can occur due to parallelization.
And sometimes the execution of eva…
-
**Describe the bug**
I am trying to use an Azure API key to run an LLM evaluation using UpTrain. I received a 404 error message saying that the deployment is not found. However, there is no deployment name…
-
I need to use locally deployed LLMs for evaluation within my current setup. While setting up LLM monitoring using Phoenix, I require evaluations with the traces, but I am only able to find [evaluation llm…
-
**Is your feature request related to a problem? Please describe.**
As of now, Haystack's evaluators that extend LLMEvaluator only support OpenAI. I would like for support through llama.cpp to be add…
-
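While running the evaluation script with max_num_samples=-1, an error occurred after (or just before?) the evaluation on the wikicorpus-e-to-j task finished, and the run was aborted.
I ran the data processing and other steps as described in the README.
I have also confirmed that the run completed without errors with max_num_samples=100.
Judging from the error message, BLEU's corpus-level…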