Closed stevie1023 closed 1 year ago
```python
def stop_response(res):
    # Truncate the response at the first role marker so trailing
    # dialogue turns are not included in the scored text.
    stops = ['\n\nHuman:', '\n\nAssistant:', '\n\nhuman:', '\n\nassistant:']
    for stop in stops:
        if res.find(stop) >= 0:
            res = res[:res.find(stop)].strip()
    return res
```
```python
import json

def calculate_with_stop(file):
    # Load a JSON list of [query, response] pairs, truncate each
    # response at the first role marker, and score with the reward model.
    with open(file, 'r') as f:
        df = json.load(f)
    q = [x[0] for x in df]
    r = [stop_response(x[1]) for x in df]
    scores = [x.item() for x in reward_fn(q, r)]
    print(sum(scores) / len(scores))
    return scores
```
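For context, `calculate_with_stop` above expects a JSON file containing a list of `[query, response]` pairs. A minimal sketch of producing and reading such a file (the data here is hypothetical, only the layout matters):

```python
import json
import tempfile

# Hypothetical [query, response] pairs in the layout the script reads.
pairs = [
    ["\n\nHuman: What is 2+2?\n\nAssistant:", " 4\n\nHuman: thanks"],
    ["\n\nHuman: Name a color.\n\nAssistant:", " Blue."],
]

# Write the pairs to a temporary JSON file.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    json.dump(pairs, f)
    path = f.name

# Read it back the same way calculate_with_stop does.
with open(path) as f:
    df = json.load(f)

queries = [x[0] for x in df]
responses = [x[1] for x in df]
```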
Sorry for my late reply, and thanks for the information! One remaining question: the reward_fn() in 'https://github.com/GanjinZero/RRHF/blob/main/data_generation/scoring_responses.py' takes one argument (sample) as input, but here in the script it takes (q, r). So does one sample consist of both the query and the response, or did I misunderstand something? (Sorry, I'm really new to LLMs and need to learn everything from scratch :)
Sorry, my code versions are a bit chaotic. You can simply concatenate each q and r and call reward_fn(q + r) to obtain the score.
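A minimal sketch of that concatenation, assuming reward_fn accepts a list of full query+response texts and returns one scalar per text; the `dummy_reward_fn` below is only a stand-in for the actual reward model, which is not shown in this thread:

```python
def score_combined(queries, responses, reward_fn):
    # Concatenate each query with its (truncated) response and score
    # the full texts in a single call to the reward model.
    texts = [q + r for q, r in zip(queries, responses)]
    scores = reward_fn(texts)
    # Convert tensor scalars to plain floats when needed.
    return [s.item() if hasattr(s, 'item') else float(s) for s in scores]

# Stand-in reward model for illustration only: scores by text length.
dummy_reward_fn = lambda texts: [float(len(t)) for t in texts]
```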
Thanks for your reply, and my question has been resolved.
Hi, could you please provide the evaluation script using the reward model illustrated in your paper? Many thanks~