gpt-evaluation Search Results

1000+ results
for gpt-evaluation

MrChromebox/firmware #310

Any Linux distro fails to install on Lenovo Thinkpad 11e: mm…

Hello! Hopefully this is the right place for this, if not please let me know and I'll continue my search elsewhere! I've been trying for the past few months to get some form of Linux installed on m…

Minater247 updated 1 month ago
35
All-Hands-AI/OpenHands #2140

[Bug]: SWE-bench reset_swe_env.py timeout

### Is there an existing issue for the same bug? - [X] I have checked the troubleshooting document at https://opendevin.github.io/OpenDevin/modules/usage/troubleshooting - [X] I have checked the exis…

JessChud updated 4 months ago
86
huggingface/lighteval #318

[FT] LLM-as-judge example that doesn't require OPENAI_KEY or…

## Issue encountered While setting up the framework to evaluate using LLM-as-judge, it would be helpful to test end-to-end without special permissions like setting up openai_key or HF pro subscriptio…

chuandudx updated 2 weeks ago
3
sotopia-lab/sotopia #21

[BUG]: Reward prompt log is wrong due to the use of shared i…

### Description of the bug This line stores the reward prompt from the instance member -- `evaluator.prompt` which is updated in each `__acall__`. This is a dangerous operation since the prompt is lo…

ProKil updated 1 month ago
4
uchicago-computation-workshop/Fall2024 #1

Questions for James Evans on his 10/3 talk on "Simulating Su…

Pose your questions as Issue Comments (below) for [James Evans](https://sociology.uchicago.edu/directory/James-A-Evans) regarding his 10/3 talk on *Simulating Subjects: The Promise and Peril of AI Sta…

jamesallenevans updated 3 weeks ago
127
OpenLMLab/LEval #16

Question about the leaderboard

In the leaderboard, for the GPT-4 evaluated section, why is the sum of n_wins and n_draws not equal for each row? What evaluation method is used in the leaderboard? Is it 181 questions?

cizhenshi updated 4 months ago
3
Watts-Lab/team_comm_tools #89

🤖 Map-GPT

This week, @amaatouq reached out with an interesting idea, which is that we can potentially train a pipeline for using GPT to rate tasks (and even test to see if GPT can replicate our raters' mapping …

xehu updated 2 months ago
5
reapbenefit/virtualmentorship_c4gt #1

[DMP 2024]: Virtual Mentorship using AI for Solve Ninja Ment…

### Ticket Contents ## Description Overview: This feature aims to enhance the existing Reap Benefit Solve Ninja Mentor WhatsApp chatbot on Glific by integrating a Virtual Mentorship system. The g…

gauthamraje updated 1 month ago
35
OpenDriveLab/DriveLM #112

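Submission result shows an error

![image](https://github.com/user-attachments/assets/175db829-6823-4db3-8359-28f778bce061) As shown in the image, a task I submitted earlier displayed its results normally when I checked last week, but it now shows ERROR even though the score is still there. Is this a bug?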

scl-01 updated 2 months ago
6
truera/trulens #1290

[BUG] Metric calculation and corresponding interpretation

**Bug Description** What happened? 1) I tried to use other frameworks such as ragas and trulens to calculate context_relevance for my data sets, but the two frameworks gave different results. Is it be…

ahukmr updated 3 months ago
3
