@Adamliu1: Llama 2, OLMo 7B
@Willmish: Llama 3 8B, Gemma 7B, Aya 23 8B
Try on lab machines in half precision
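For reference, a minimal sketch of loading one of these checkpoints in half precision with HF transformers (the model id, prompt, and generation settings are placeholders, not the exact eval setup):

```python
# Minimal sketch: load a checkpoint in fp16 on a lab GPU and generate once.
# Model id and prompt are illustrative; device_map="auto" requires `accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # placeholder HF hub id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision so the 7-8B models fit on one GPU
    device_map="auto",
)

prompt = "How do I make a cup of tea?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```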
Model Name | Flagged/All (lower is safer) |
---|---|
Llama 3 8B | 0.334286 |
Llama 3 8B repeat (same params) | 0.334286 |
Llama 2 7B | 0.311428 |
OLMo 7B | 0.287142 |
OLMo 7B IT | 0.372857 |
Gemma 7B | 0.340000 |
Gemma 7B IT | 0.080000 |
Aya 23 8B | 0.227143 |
OPT 1.3B | 0.295714 |
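For context, flagged/all is just the fraction of generated completions that the harmfulness classifier flags; the repeating decimals look consistent with a 700-prompt eval set (e.g. 234/700 ≈ 0.334286), though that is only an inference. A rough sketch with hypothetical helper names (`generate_completion`, `is_flagged`) standing in for the actual generation and moderation steps:

```python
# Rough sketch of the flagged/all metric: fraction of completions flagged as harmful.
# `generate_completion` and `is_flagged` are hypothetical stand-ins for the actual
# generation call and harmfulness classifier used by the eval script.
from typing import Callable, Iterable

def flagged_over_all(
    prompts: Iterable[str],
    generate_completion: Callable[[str], str],
    is_flagged: Callable[[str], bool],
) -> float:
    completions = [generate_completion(p) for p in prompts]
    flagged = sum(is_flagged(c) for c in completions)
    return flagged / len(completions)
```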
NOTE: we should either not evaluate the IT (instruction-tuned) versions at all, or do it consistently across all models.
Not all models have instruction-tuned versions (Aya 23 is only available as an IT model), but I was mostly just curious xd
Update as of 2024-07-01:
(Fairly certain this is LR 2e-6, unlearn set: PKU-harmful, retain set: SQuAD.)

Unlearned Llama 3 8B (full precision, batch 1024) | Flagged/All (at idx_20) |
---|---|
sequential 64 | 0.165714 |
sequential 16 | 0.190000 |
sequential 4 | 0.207143 |
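For reference, a rough sketch of what "sequential N" refers to here: the forget set is split into N chunks that are unlearned one after another, illustrated below with gradient ascent on the forget batches and gradient descent on the retain batches. This is an assumption-laden illustration (optimizer, loss, and loop structure may differ from the actual training script):

```python
# Sketch of sequential unlearning: split the forget set into several chunks and
# unlearn them one after another, while regularising on the retain set.
# Assumes HF-style batches (dicts with input_ids/attention_mask/labels) whose
# forward pass returns an output with a `.loss`; the real recipe may differ.
import torch

def sequential_unlearn(model, forget_splits, retain_loader, lr=2e-6):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for forget_loader in forget_splits:          # e.g. 4, 16, or 64 sequential chunks
        for forget_batch, retain_batch in zip(forget_loader, retain_loader):
            forget_loss = -model(**forget_batch).loss  # gradient ascent on forget data
            retain_loss = model(**retain_batch).loss   # gradient descent on retain data
            (forget_loss + retain_loss).backward()
            optimizer.step()
            optimizer.zero_grad()
    return model
```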
done
Consider the following models: