Adamliu1 / SNLP_GCW


[UNLEARNING+EVALS] Run continuous unlearning for a longer time to see what's gonna happen compared to batch & seq. #75

Open TheRootOf3 opened 4 months ago

TheRootOf3 commented 4 months ago

So far, we've run continuous unlearning for up to 1200 optimiser steps (equivalent to 2400 samples), while sequential and batch ran for longer, leading to more optimiser steps. A corresponding ablation study should be performed, where continuous unlearning runs for more optimiser steps.

Now, the following are worth noting:

Willmish commented 4 months ago

The argument we are trying to make, based on our prior results, if we match the number of continuous-unlearning steps to sequential unlearning on the 1024-sample split (5120 steps): "For sequential unlearning with the same number of unlearning steps (5120) as continuous unlearning, and ~10 times less data (1024 samples for sequential vs. 5120*2 = 10240 for continuous), we can achieve the same/better/(hopefully not worse) unlearning results (based on the eval benchmarks)."
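A rough back-of-the-envelope check of this comparison, as a minimal sketch. It assumes a per-step batch size of 2 samples, which is implied by "1200 steps (equivalent to 2400 samples)" above; the variable names are placeholders, not anything from the repo:

```python
# Hypothetical sanity check of the step/data comparison above.
# Assumes 2 samples per optimiser step (1200 steps ~ 2400 samples).
PER_STEP_BATCH_SIZE = 2

continuous_steps = 5120
continuous_samples = continuous_steps * PER_STEP_BATCH_SIZE  # 10240 samples seen

sequential_samples = 1024  # size of the unlearning split (reused each epoch)
sequential_steps = 5120    # matched number of optimiser steps

ratio = continuous_samples / sequential_samples
print(f"continuous samples seen: {continuous_samples}")  # 10240
print(f"sequential samples used: {sequential_samples}")  # 1024
print(f"continuous uses ~{ratio:.0f}x more data")        # ~10x
```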

Willmish commented 4 months ago

What we should be doing here however (AND WHEN VISUALISING IN #82) is logging against the AMOUNT OF DATA PROCESSED (unlearned sample count, i.e. literally how many samples of text went through the algorithm), because that axis is better aligned across methods.
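A minimal sketch of what this could look like in the training loop, assuming a wandb-style logger and a hypothetical `unlearn_batch()` helper (neither is the repo's actual API; the point is only that the counter tracks samples, not steps):

```python
# Log evals against the number of unlearned samples processed,
# rather than against optimiser steps.
import wandb

samples_processed = 0
for step, batch in enumerate(unlearn_dataloader):
    loss = unlearn_batch(model, batch)             # one gradient-ascent step (hypothetical helper)
    samples_processed += len(batch["input_ids"])   # count raw samples, not steps

    # samples_processed becomes the x-axis, so continuous, sequential and
    # batch runs are directly comparable regardless of their step counts.
    wandb.log({
        "unlearn/loss": loss,
        "unlearn/samples_processed": samples_processed,
    })
```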

Willmish commented 4 months ago

How about a logical-batch axis (how many actual gradient-ascent steps are taken)? This shows which method is quicker, but the x-axis is heavily unaligned across methods (batch is 20 steps; sequential will differ across splits, but be the same for a given number of splits).
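One way to compare the two candidate axes side by side when visualising (see #82). The CSV file names and column names below are hypothetical placeholders for whatever the runs actually export:

```python
# Sketch: plot eval score against both candidate x-axes for each method.
import matplotlib.pyplot as plt
import pandas as pd

runs = {
    "continuous": pd.read_csv("continuous.csv"),  # hypothetical log exports
    "sequential": pd.read_csv("sequential.csv"),
    "batch": pd.read_csv("batch.csv"),
}

fig, (ax_steps, ax_samples) = plt.subplots(1, 2, figsize=(10, 4))
for name, df in runs.items():
    # Left: "logical batch" axis - gradient-ascent steps taken (unaligned across methods).
    ax_steps.plot(df["step"], df["eval_score"], label=name)
    # Right: samples-processed axis - how much data went through the algorithm.
    ax_samples.plot(df["samples_processed"], df["eval_score"], label=name)

ax_steps.set_xlabel("gradient-ascent steps")
ax_samples.set_xlabel("unlearned samples processed")
ax_steps.set_ylabel("eval benchmark score")
ax_steps.legend()
plt.tight_layout()
plt.show()
```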