llm-evaluation Search Results

1000+ results for llm-evaluation

zhao-zilong/ssc-cot #3

Purpose of TriMath dataset?

There's mention of scored steps for the solutions to each of the 100 questions, which also accompany the corresponding questions in the JSON files. Is their purpose only to serve as a better evalua…

ViperVille007 updated 1 month ago
1
zhsh9/SentinelGuard #1

Why integrate LLM, ML&DL and Rule-based Methods Together?

## Methodology Discussion SentinelGuard is supposed to integrate Large Language Model Services (LLMs), Machine Learning & Deep Learning (ML&DL) methods, and Rule-based filters to identify intrusion…

zhsh9 updated 4 weeks ago
1
mbzuai-oryx/VideoGPT-plus #12

In what order should I reproduce the paper?

step1 pretrain_projector_image_encoder.sh
step2 pretrain_projector_video_encoder.sh
step3 finetune_dual_encoder.sh
step4 eval/vcgbench/inference/run_ddp_inference.sh
step5 eval/vcgbench/gpt_e…

rixejzvdl649 updated 2 weeks ago
5
lm-sys/arena-hard-auto #13

[Feature] support arena-hard in opencompass

Hi, thanks for such robust work! We now support the ArenaHard dataset in OpenCompass. OpenCompass is an evaluation platform that can partition tasks and support different model inference backend…

bittersweet1999 updated 2 months ago
2
qianxiao1111/evaluation #12

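Could the test-task prompts be aligned with the code-generation direction?

The SFT data all consists of code generation based on questions and tables, so the resulting specialized model does not generalize well and struggles to follow a simple prompt that fixes the output format. In testing, we found that the SFT model can refuse many questions but cannot output yes or no in the required format. Could the prompts be changed to resemble the SFT data? For example, in the rejection test, generating code would count as accepting and not generating code would count as refusing.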

xhkxhk updated 4 weeks ago
2
microsoft/autogen #2898

[Issue]: Would be Nice to Have a Powerful WebSurfer Team Sub…

### Describe the issue ### LMSys.org Large Model Systems Organization (LMSYS Org) is an open research organization founded by students and faculty from UC Berkeley in collaboration with UCSD and C…

Josephrp updated 4 weeks ago
2
run-llama/llama_index #13797

[Question]: How can I get the source_node from the multi-doc…

### Question Validation - [ ] I have searched both the documentation and discord for an answer. ### Question I followed this [article](https://docs.llamaindex.ai/en/stable/examples/agent/multi_docu…

ina5411ina updated 1 month ago
5
Lightning-AI/torchmetrics #2598

Need higher level RAG metrics

### Problem & Motivation There is a huge wave of interest around high accuracy Q&A, such as via Retrieval Augmented Generation (RAG). RAG accuracy is largely driven by how well vector search is abl…

devinbost updated 2 days ago
2
heeju-kim2/workspace_w-intern #1

BF16 PEFT training references

### Reference code - Llama-recipes code [https://github.com/meta-llama/llama-recipes/tree/b7fd81c71239c67345d897c0eb6529eba076e8b8](https://github.com/meta-llama/llama-recipes/tree/b7fd81c71239c…

heeju-kim2 updated 2 weeks ago
2
run-llama/llama_index #13063

[Question]: Evaluating correctness of my RAG solution

### Question Validation - [X] I have searched both the documentation and discord for an answer. ### Question I am trying to use the built-in capabilities of llamaindex to evaluate the correctness o…

nshern updated 2 months ago
1
