callummcdougall / ARENA_2.0

Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
190 stars 78 forks

Fix test_calc_clipped_surrogate_objective: logprobs should all be negative #17

Closed. Mihonarium closed this 1 year ago

Mihonarium commented 1 year ago

The `mb_logprobs` passed to the loss function should contain only nonpositive numbers: they are log-probabilities, and the log of any probability in [0, 1] is at most 0.
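A minimal sketch of one way to build valid test inputs, assuming the log-probabilities come from a categorical policy over a small discrete action set. It is written in plain Python so it runs without dependencies; the actual test presumably works on PyTorch tensors. The helper names `log_softmax` and `sample_logprobs`, and the batch and action sizes, are hypothetical and not taken from the repository.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sample_logprobs(batch_size, num_actions, seed=0):
    """Hypothetical helper: one valid log-probability per batch element.

    Random logits are normalised with log-softmax, then the entry for a
    randomly chosen action is kept, so every value is <= 0 by construction.
    """
    rng = random.Random(seed)
    logprobs = []
    for _ in range(batch_size):
        logits = [rng.gauss(0.0, 1.0) for _ in range(num_actions)]
        action = rng.randrange(num_actions)
        logprobs.append(log_softmax(logits)[action])
    return logprobs


mb_logprobs = sample_logprobs(batch_size=5, num_actions=4)
assert all(lp <= 0.0 for lp in mb_logprobs)
```

By contrast, values drawn directly from a standard normal would be positive about half the time, which corresponds to probabilities greater than 1 and so is not a meaningful input for the loss.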