EleutherAI / radioactive-lab

Adapting the "Radioactive Data" paper to work for text models

Replicate the paper's experiments. #5

Open StellaAthena opened 3 years ago

StellaAthena commented 3 years ago

To ensure that our refactored code is working properly, we need to replicate the experiments from the paper. We want to replicate the following experiments:

StellaAthena commented 3 years ago

@researcher2 and I have made significant progress on this, but we haven't ironed out all the issues yet. Here's Table 1 from the paper followed by our current best results (images attached).

zzzucf commented 2 years ago

Is this the result from CIFAR10 or ImageNet with reduced classes?

researcher2 commented 2 years ago

I believe the results shown here were from CIFAR10. The network-marking step only produced good results for us on CIFAR10 and Imagenette when doing a finetune (training the linear classification layer only). We never moved on to a full ImageNet run.
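For anyone trying to reproduce this setup: a minimal sketch of what "finetune (linear classification layer only)" means in practice, i.e. freezing the feature extractor and training only a fresh classification head. The `linear_probe` helper and `TinyNet` stand-in are hypothetical, not the repo's actual code.

```python
import torch
import torch.nn as nn

def linear_probe(model: nn.Module, head_name: str, num_classes: int) -> nn.Module:
    """Freeze every parameter, then swap in a fresh trainable linear head."""
    for p in model.parameters():
        p.requires_grad = False
    old_head = getattr(model, head_name)
    setattr(model, head_name, nn.Linear(old_head.in_features, num_classes))
    return model

# Toy stand-in for a CNN backbone + linear classifier (assumption for the demo).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 64), nn.ReLU())
        self.fc = nn.Linear(64, 10)

    def forward(self, x):
        return self.fc(self.features(x))

model = linear_probe(TinyNet(), "fc", 10)  # 10 classes for CIFAR10
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
# Only the new head's parameters remain trainable, so the optimizer would
# be built from model.fc.parameters() alone.
```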

zzzucf commented 2 years ago

The results show that in going from no radioactive data to 10% radioactive data, log10(p) only changes by about 4. Is such a small change just randomness, compared to the results at the other ratios?

zzzucf commented 2 years ago

Do you also have the average cosine similarity from the test?

researcher2 commented 2 years ago

CIFAR10 Table 1: this ended up working (table attached).

I couldn't find Table 2 lying around; you would have to re-run. It didn't work for CIFAR10 or Imagenette though: log10(p) was basically random with respect to the percentage marked.
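For context, the log10(p) statistic comes from a hypothesis test on the cosine similarity between each class's carrier direction and the corresponding classifier weight: under the null (no marking), a random unit direction in R^d has (cos+1)/2 distributed Beta((d-1)/2, (d-1)/2). A minimal sketch, with the caveats that combining the per-class p-values via Fisher's method is my assumption and the helper names are made up:

```python
import numpy as np
from scipy.stats import beta, combine_pvalues

def cosine_pvalue(c: float, d: int) -> float:
    # Null hypothesis: the classifier weight is a random direction in R^d,
    # so (cos + 1) / 2 ~ Beta((d-1)/2, (d-1)/2). One-sided p-value for
    # observing cosine similarity >= c.
    a = (d - 1) / 2.0
    return beta.sf((c + 1) / 2.0, a, a)

def detection_log10p(carriers: np.ndarray, weights: np.ndarray) -> float:
    # carriers: (num_classes, d) marking directions; weights: (num_classes, d).
    cosines = [
        np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w))
        for u, w in zip(carriers, weights)
    ]
    d = carriers.shape[1]
    pvals = [cosine_pvalue(c, d) for c in cosines]
    # Combine per-class p-values (Fisher's method here; the paper's exact
    # combination may differ).
    _, p_combined = combine_pvalues(pvals, method="fisher")
    return float(np.log10(p_combined))

rng = np.random.default_rng(0)
carriers = rng.normal(size=(10, 512))
lp_null = detection_log10p(carriers, rng.normal(size=(10, 512)))      # near 0
lp_marked = detection_log10p(carriers, carriers + 0.5 * rng.normal(size=(10, 512)))  # strongly negative
```

A "basically random" log10(p) with respect to marking percentage, as above, would suggest the classifier weights never picked up any alignment with the carriers.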

We don't have a log of the cosine similarities, so please feel free to play around. The main thing I notice is that we have odd marking percentages of 0.1, 0.2, etc. in the committed code; we must have been experimenting at the time.
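On the marking percentages: a hypothetical sketch of how a marking fraction (0.1, 0.2, etc., i.e. fractions rather than percentages) would typically select which training images receive the radioactive mark. The `marked_indices` helper is illustrative, not the repo's actual code.

```python
import numpy as np

def marked_indices(n_samples: int, fraction: float, seed: int = 0) -> np.ndarray:
    """Sample a fixed fraction of training indices, without replacement,
    to receive the radioactive mark."""
    rng = np.random.default_rng(seed)
    n_marked = int(round(fraction * n_samples))
    return rng.choice(n_samples, size=n_marked, replace=False)

# 10% of CIFAR10's 50,000 training images.
idx = marked_indices(50_000, 0.1)
```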