callummcdougall / ARENA_2.0

Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
190 stars, 78 forks

Update 02_[1.2]_Intro_to_Mech_Interp.py #26

Closed: Mihonarium closed this 1 week ago

Mihonarium commented 2 weeks ago

Specify layer and head_index explicitly, so that the test only fails when pos_by_pos_pattern is actually wrong, and not merely because those two values weren't updated.

callummcdougall commented 1 week ago

Hey - this is actually an older version of the material (I think it should all redirect to the ARENA 3.0 repo), so I'm not actively maintaining it with new PRs. But if there's anywhere it fails to link to the new material then lmk, so I can fix that!