evaluate-llm Search Results

1000+ results
for evaluate-llm

Best match

Best match Most commented Newest Recently updated Least commented Oldest Least recently updated

confident-ai/deepeval #872

Error while using the most popular opensource chat models

**Describe the bug** I encounter 'ValueError: Evaluation LLM outputted an invalid JSON. Please use a better evaluation model.' while using most popular open source chat models in DeepEval framework. …

adkakne updated 1 month ago
2
IS2Lab/S-Eval #3

How to do the automatic safety evaluations

Hello! This is very nice work! I kindly want to know how to do the automatic safety evaluations. According to the paper, you use a safety critique llm for the evaluations. Will you release the…

TianuaXu updated 2 months ago
1
InternLM/xtuner #902

训练营3 XTuner运行xtuner train ./internlm2_chat_1_8b_qlora_alpaca…

按照教程走没有发现不一样地方，运行训练代码报错、； `(langgpt) root@intern-studio-50152060:~/InternLM/XTuner# xtuner train ./internlm2_chat_1_8b_qlora_alpaca_e3_copy.py /root/.conda/envs/langgpt/lib/python3.10/site-packages/…

Viki-researcher updated 4 days ago
2
rmusser01/tldw #195

Evaluation: RAG implementation

Issue is to track evaluation of RAG implementations. Frameworks: - F Papers: - F - F One-Offs: - https://github.com/microsoft/promptflow/tree/main/examples/flows/evaluation/eval-qna-rag…

rmusser01 updated 5 days ago
1
tjunlp-lab/Awesome-LLMs-Evaluation-Papers #7

SeaEval: Multilingual LLM Evaluation

Please note our paper on evaluation, which could be an important building block for multilingual evaluation and cultural understanding. [SeaEval for Multilingual Foundation Models: From Cross-Lingu…

BinWang28 updated 9 months ago
7
irthomasthomas/undecidability #877

evidently - An open-source framework to evaluate, test and m…

- [ ] [evidently/README.md at main · evidentlyai/evidently](https://github.com/evidentlyai/evidently/blob/main/README.md?plain=1) # evidently/README.md at main · evidentlyai/evidently ## Evidently …

ShellLM updated 3 weeks ago
1
rmusser01/tldw #29

Improvement: Add eval test for summarization across differen…

As a user, I would like to be informed about the summarization effectiveness of my chosen LLM endpoint. I would like to be able to evaluate an endpoint against a known, tested framework, to evaluat…

rmusser01 updated 1 month ago
3
vectara/hallucination-leaderboard #50

How can I use the HHEM model to evaluate my LLM after finetu…

Thank you for your contributions! I was wondering whether it is possible or how can I use the HHEM model to evaluate a LLM after finetuning using our specific dataset?

zjq0455 updated 1 month ago
1
MedicineToken/Medical-Graph-RAG #5

Adding support for Ollama/local LLMs

Hey guys, Thanks for making your work public. I'm wondering if you have or will be exploring LLMs other than GPT4 for your evaluations. For instance, you've used Llama 3 in your benchmarks, would y…

NumberChiffre updated 3 days ago
2
ROIM1998/APT #2

How to get the results shown in Table 3?

Hi, I have followed the command `bash scripts/adaptpruning/llama_2_7b_alpaca_gpt4.sh` with the alpaca_gpt data downloaded from `https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/da…

au-revoir updated 3 weeks ago
2

上一页 1...7 8 9 10 11 12 13...100 下一页

1000+ results for evaluate-llm

1000+ results
for evaluate-llm