-
**Describe the bug**
I encounter 'ValueError: Evaluation LLM outputted an invalid JSON. Please use a better evaluation model.' while using most popular open source chat models in DeepEval framework. …
-
Hello!
This is very nice work!
I kindly want to know how to do the automatic safety evaluations. According to the paper, you use a safety critique llm for the evaluations. Will you release the…
-
按照教程走没有发现不一样地方,运行训练代码报错、;
`(langgpt) root@intern-studio-50152060:~/InternLM/XTuner# xtuner train ./internlm2_chat_1_8b_qlora_alpaca_e3_copy.py
/root/.conda/envs/langgpt/lib/python3.10/site-packages/…
-
Issue is to track evaluation of RAG implementations.
Frameworks:
- F
Papers:
- F
- F
One-Offs:
- https://github.com/microsoft/promptflow/tree/main/examples/flows/evaluation/eval-qna-rag…
-
Please note our paper on evaluation, which could be an important building block for multilingual evaluation and cultural understanding.
[SeaEval for Multilingual Foundation Models: From Cross-Lingu…
-
- [ ] [evidently/README.md at main · evidentlyai/evidently](https://github.com/evidentlyai/evidently/blob/main/README.md?plain=1)
# evidently/README.md at main · evidentlyai/evidently
## Evidently
…
-
As a user, I would like to be informed about the summarization effectiveness of my chosen LLM endpoint.
I would like to be able to evaluate an endpoint against a known, tested framework, to evaluat…
-
Thank you for your contributions!
I was wondering whether it is possible or how can I use the HHEM model to evaluate a LLM after finetuning using our specific dataset?
-
Hey guys,
Thanks for making your work public. I'm wondering if you have or will be exploring LLMs other than GPT4 for your evaluations. For instance, you've used Llama 3 in your benchmarks, would y…
-
Hi,
I have followed the command `bash scripts/adaptpruning/llama_2_7b_alpaca_gpt4.sh` with the alpaca_gpt data downloaded from `https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/da…