adamkarvonen / SAEBench


Implement MDL eval #6

Closed (koayon closed this 3 weeks ago)

adamkarvonen commented 1 month ago

I think this looks pretty reasonable. This line:

total_entropy_F = bool_entropy_F.cuda() + bool_prob_F.cuda() * float_entropy

should be this, right?

total_entropy_F = bool_entropy_F.cuda() + bool_prob_F.cuda() * float_entropy_F
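To spell out why the `_F` suffix matters, here is a minimal pure-Python sketch of the per-feature sum as I understand it: the entropy of the on/off indicator plus the probability of being active times the entropy of the value given that it is active. The helper names and the toy numbers are made up for illustration, and I'm assuming the entropies are in bits; this is not the code from the PR. With a single scalar `float_entropy`, the same value would be applied to every feature instead of one value per feature.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def total_entropy_per_feature(bool_prob_F, float_entropy_F):
    """Per-feature total: H(active) + P(active) * H(value | active).

    Both arguments are hypothetical stand-ins for the tensors in the PR and
    must hold exactly one entry per feature.
    """
    if len(bool_prob_F) != len(float_entropy_F):
        raise ValueError("expected one float entropy per feature")
    return [
        binary_entropy(p) + p * h
        for p, h in zip(bool_prob_F, float_entropy_F)
    ]


# Toy values for three features (illustrative only).
bool_prob_F = [0.5, 0.25, 0.0]      # fraction of tokens where each feature fires
float_entropy_F = [2.0, 4.0, 3.0]   # bits needed for the value when it fires
total_entropy_F = total_entropy_per_feature(bool_prob_F, float_entropy_F)
```

A feature that never fires (the third one here) contributes zero bits regardless of its `float_entropy_F` entry, which is the behaviour you would lose track of if a scalar were broadcast across all features.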

A runtime estimate and memory usage per SAE would also be nice. And maybe we want an option to aggregate the MDL over multiple batches of activations?
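For the multi-batch option, one possible shape is to accumulate counts across batches and only compute the entropy at the end, rather than averaging per-batch entropies. The sketch below does this for a single feature in pure Python. The class name, the fixed-width binning, and the "active means > 0" rule are all assumptions for illustration; the actual discretization in the eval may differ.

```python
import math
from collections import Counter


class FeatureEntropyAccumulator:
    """Accumulates activation statistics for one feature across batches.

    Hypothetical helper: assumes a feature is active when its activation
    is > 0 and that active values are discretized into fixed-width bins.
    """

    def __init__(self, bin_width: float):
        self.bin_width = bin_width
        self.n_samples = 0
        self.n_active = 0
        self.bin_counts = Counter()

    def update(self, activations):
        """Add one batch of activations (an iterable of floats)."""
        for a in activations:
            self.n_samples += 1
            if a > 0.0:
                self.n_active += 1
                self.bin_counts[int(a // self.bin_width)] += 1

    def active_prob(self) -> float:
        """Fraction of all samples seen so far where the feature fired."""
        return self.n_active / self.n_samples if self.n_samples else 0.0

    def float_entropy(self) -> float:
        """Entropy in bits of the binned value, given the feature fired."""
        if self.n_active == 0:
            return 0.0
        probs = (c / self.n_active for c in self.bin_counts.values())
        return -sum(p * math.log2(p) for p in probs)


acc = FeatureEntropyAccumulator(bin_width=0.5)
for batch in ([0.0, 0.1, 0.0, 0.9], [0.1, 0.0, 0.9, 0.0]):
    acc.update(batch)
```

Only the counts are kept between batches, so memory stays proportional to the number of occupied bins rather than the number of activations. For the runtime and memory numbers, wrapping the eval in `time.perf_counter()` and reading `torch.cuda.max_memory_allocated()` afterwards would probably be enough for a rough per-SAE estimate.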