evaluate-llm Search Results

1000+ results for evaluate-llm

haotian-liu/LLaVA #423

[Question] How can I do Multi-turn conversation evaluation?

### Question I notice that the eval code for ScienceQA only supports single-turn QA, but I want to evaluate on a multi-turn conversation task. How can I get multi-turn responses at the evaluation stage?

Remwlp updated 8 months ago
2
NCATSTranslator/Feedback #796

Prioritize use of LLMs to improve ordering of pubs, snippet …

Filing this ticket in support of the LLM-based 'publication filtering' tool presented by @Rosinaweber at the June 2024 Relay. In the example below, four pubs are shown as providing direct support …

mbrush updated 4 months ago
3
AkihikoWatanabe/paper_notes #1028

A Survey on Large Language Model based Autonomous Agents, Le…

# URL - https://arxiv.org/abs/2308.11432 # Affiliations - Lei Wang, N/A - Chen Ma, N/A - Xueyang Feng, N/A - Zeyu Zhang, N/A - Hao Yang, N/A - Jingsen Zhang, N/A - Zhiyuan Chen, N/A - …

AkihikoWatanabe updated 7 months ago
2
manisnesan/fastchai #76

Meeting Summarization Use Case

[From rasbt post](https://x.com/rasbt/status/1754516687896887449?s=46&t=aOEVGBVv9ICQLUYL4fQHlQ) - Flan T5 is a great go to model for text classification. Tiny titans - Can smaller LL…

manisnesan updated 2 days ago
12
shure-dev/Awesome-LLM-Papers-Comprehensive-Topics #2

Papers which will be added in the future

This issue is for the notification of papers which will be added to this repo in the future

shure-dev updated 7 months ago
1
unslothai/unsloth #771

Model inference - performance drop when using unsloth

Hi, I fine-tuned a model (yam-peleg/Experiment26-7B) using unsloth. Then, during inference, model correctness drops when using unsloth's FastLanguageModel. I see some modules are replaced. It looks a li…

TomekPro updated 3 months ago
4
acharkq/MolCA #14

Lora weight issue when conducting fine-tune stage

Hi! I am trying to reimplement the fine-tune stage of MolCA and running the code: `python stage2.py --root 'data/PubChem324kV2/' --devices '0,1' --filename "ft_pubchem324k" --stage2_path "all_checkp…

DingYX0731 updated 3 months ago
2
Teddy-XiongGZ/MedRAG #23

about paper issues

Hello, I was reading this paper recently and have questions about it. May I ask whether MedRAG selected the corpus + search + LLM method to evaluate mirrag’s data set?

chenboju updated 1 week ago
2
chen700564/RGB #3

rejection rate of chatGPT

In the article it says that gpt-3.5-turbo is used to measure the rejection rate. What explains this difference in results for ChatGPT, given that it is used as a reference? ![image](https://github.co…

valdesguefa updated 1 year ago
8
EleutherAI/lm-evaluation-harness #1575

When using `parallelize=True`, raise Runtime Error: expected…

I think there are a few issues being conflated here and it would be helpful to disentangle them: We support: - launching with `accelerate launch`, which is only meant to support Data-parallel …

feiba54 updated 6 months ago
13
