RAIVNLab / supsup

Code for "Supermasks in Superposition"

Different output size in scenario GG #19

Closed · pzSuen closed this 2 years ago

pzSuen commented 2 years ago

Hello, in the paper you say, "In practice, the tasks do not all need to be L-way — output layers can be padded until all have the same size". I also found the "--output-size" and "--real-neurons" configs in args.py, but it seems these two configs are not used. So I need your advice on modifying the code to train and test on classification tasks with different numbers of classes.

Thank you very much!

mitchellnw commented 2 years ago

As long as --output-size is larger than or equal to the number of labels for all of your tasks, things should be okay.
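
For concreteness, here is what that padding can look like in code. This is a minimal standalone sketch, not code from this repo: the `head` and `task_loss` names are hypothetical, and it assumes each task's labels are indexed 0..num_classes-1.

```python
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of a padded, shared classification head.
# output_size (--output-size) must be >= the class count of every task.
output_size = 25
head = nn.Linear(512, output_size)

def task_loss(features, labels, num_classes):
    """Cross-entropy for a task that owns only its first num_classes logits."""
    logits = head(features)
    # Slice off the padded logits so the softmax ranges over real classes only.
    return F.cross_entropy(logits[:, :num_classes], labels)
```

Note that the same slicing has to happen at evaluation time; an argmax over all output_size logits can otherwise land on a padded neuron.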

pzSuen commented 2 years ago

Thank you! So what's the difference between --output-size and --real-neurons?

mitchellnw commented 2 years ago

I think --real-neurons is not used, though perhaps @vkramanuj knows better?

pzSuen commented 2 years ago

Hello, I have found uses of --real-neurons in the functions se_oneshot_g_minimization and se_binary_g_minimization in adaptors.py, but it is not used in the gt adaptor for the GG scenario. I have also tested the performance of GG (rn18-supsup.yaml) with different class numbers, and the accuracy is very low when the class number is smaller than output_size.

```python
# output_size = 25, dataset is cifar100
class_numbers = [15, 5, 15, 10, 5, 15, 25, 15]
test_results = [0.0240, 0.0820, 0.0440, 0.0960, 0.0380, 0.0407, 0.0456, 0.8090]
```

So do you have any advice on adapting supsup to GG scenarios with different class numbers?

By the way, what is the meaning of the config argument --data-to-repeat?

Thank you very much!

mitchellnw commented 2 years ago

If you are in the GG scenario, then there is no need to specify --real-neurons or --data-to-repeat.

To clarify: you're observing that accuracy is low when --output-size is bigger than the number of classes, but not when --output-size is equal to the number of classes?

pzSuen commented 2 years ago

Yes, the accuracy is very low when --output-size is bigger than the actual number of classes, but not when they are equal.

So I think there should be a way to set a task-specific --real-neurons while the output size stays the same across tasks in the GG scenario, just like you said in the paper: "In practice, the tasks do not all need to be L-way — output layers can be padded until all have the same size".

I have never seen this kind of padding operation, so I sincerely need your help to understand how to achieve it. Thank you!

vkramanuj commented 2 years ago

In GG experiments, output_size is controlled by the --output-size flag (see models/resnet.py or models/gemresnet.py), as you found. In the paper, output_size > num_classes is only relevant in cases where the task ID is not provided, as this improves mask finding. Notice that in our GG experiment configs (experiments/GG/splitcifar100/configs/rn18-supsup.yaml), output_size = num_classes.
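
For anyone who does want output_size > num_classes in the GG setting, one workaround is to make the padded neurons unreachable at prediction time. The sketch below assumes the task ID (and therefore the task's class count) is known at test time; mask_padded_logits is a hypothetical helper, not part of this repo.

```python
import torch

def mask_padded_logits(logits, num_classes):
    """Hypothetical helper: padded output neurons can never be predicted.

    logits: (batch, output_size) raw scores from the padded head.
    num_classes: number of real classes for the current task.
    """
    masked = logits.clone()
    masked[:, num_classes:] = float("-inf")  # padding loses every argmax
    return masked

# Usage sketch for a 15-class task with output_size = 25:
# logits = model(x)                              # shape (batch, 25)
# preds = mask_padded_logits(logits, 15).argmax(dim=1)
```

With this masking (or, equivalently, slicing the logits to the task's class count), predictions are restricted to real classes, which should remove the accuracy drop observed above when the class number is smaller than output_size.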