RobertCsordas / linear_layer_as_attention

The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention".

The generation of Table 1 in your paper #3

Open · TInaWangxue opened this issue 11 months ago

TInaWangxue commented 11 months ago

I ran "ff_as_attention_cifar10_10samples.yaml" via run.py, and I want to generate Table 1 of your paper by running /paper/ff_as_attention/print_predictive_power.py, but I have some questions about the table. 1. What is the meaning of "Prediction accuracy (%) of the true target label (Target) and model output (Output) from argmax of the per-class attention scores"? The true target label comes from a set of labels, like 0, 1, 2, 3, 4, ..., 9 in CIFAR-10, so how is the prediction accuracy (%) of the true target label (Target) computed?

TInaWangxue commented 11 months ago

Sorry to interrupt you again. I have run "ff_as_attention_cifar10_10samples.yaml" via run.py, but when I want to run /paper/ff_as_attention/print_predictive_power.py to generate Table 1 of your paper, I can't find the config file "paper/config.json" mentioned in the README. I also wonder how to edit the following code: `api = wandb.Api(); sweep = api.sweep("username/ff_as_attention/l92zzffq")`. This is my first time using wandb, so is the username my wandb account name, and what does "ff_as_attention/l92zzffq" mean?

TInaWangxue commented 11 months ago

I ran the experiment on a remote Linux server, but when I log into wandb locally, it doesn't contain the project I ran on the Linux server, so I can't fill in the config for `sweep = api.sweep("username/ff_as_attention/l92zzffq")`. Did I forget to configure something?

RobertCsordas commented 11 months ago

The goal of Table 1 is to check how predictive the class of the training examples the model attended to the most is. To do so, we sum the attention scores for each training example. We do this on a per-class basis (so we sum independently for each class). Then we obtain a histogram, like the ones reported in Fig. 5. Let the class with the most attention be denoted by $y'$, the true class by $y$, and the model's output by $\hat{y}$, with $y, y', \hat{y} \in \{1, \dots, 10\}$. If the model's prediction is correct (so $\hat{y} = y$), we report the proportion of predictions where $y' = y$ (right column). If the model's prediction is incorrect (so $\hat{y} \ne y$), we report the proportion of predictions where $y' = y$ (Target on the left) and the proportion of predictions where $y' = \hat{y}$ (Output on the left).
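For reference, here is a minimal NumPy sketch of that computation, assuming you already have the summed per-class attention scores for each test example (the array names and the helper function are hypothetical, not the actual API of print_predictive_power.py):

```python
import numpy as np

def table1_stats(attn_per_class: np.ndarray, y_true: np.ndarray, y_pred: np.ndarray):
    """attn_per_class: (N, 10) attention scores summed per class,
    y_true / y_pred: (N,) true labels and model predictions."""
    y_attn = attn_per_class.argmax(axis=1)   # most-attended class, y'
    correct = y_pred == y_true

    # Correct predictions: how often the most-attended class matches the true label
    acc_correct = (y_attn[correct] == y_true[correct]).mean()

    # Incorrect predictions: compare the most-attended class to the true label
    # (Target column) and to the model's output (Output column) separately
    acc_wrong_target = (y_attn[~correct] == y_true[~correct]).mean()
    acc_wrong_output = (y_attn[~correct] == y_pred[~correct]).mean()
    return 100 * acc_correct, 100 * acc_wrong_target, 100 * acc_wrong_output
```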

As for running the experiments, I added the missing config and updated the readme file. Sorry about this. The username and project name are the first two parts of the URL of your W&B workspace. If you open W&B, you will see a URL in the format https://wandb.ai/username/projectname/something. You need to put the "username/projectname" part in the config. Please let me know if this works.
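For illustration, a minimal sketch of how that sweep path is put together (the `myusername/ff_as_attention` prefix below is a placeholder; use the "username/projectname" part from your own workspace URL):

```python
import wandb

# The first two parts come from your W&B workspace URL:
# https://wandb.ai/<username>/<projectname>/... -> "<username>/<projectname>"
WANDB_PREFIX = "myusername/ff_as_attention"      # placeholder, put your own prefix here
SWEEP_ID = "l92zzffq"                            # the sweep ID used in the script

api = wandb.Api()
sweep = api.sweep(f"{WANDB_PREFIX}/{SWEEP_ID}")  # path format: <entity>/<project>/<sweep_id>
print(f"Found {len(sweep.runs)} runs in sweep {SWEEP_ID}")
```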

TInaWangxue commented 11 months ago

I ran into an error: even though the parameter "log" is set to "wandb" (and I also set the default of "log" to "wandb" in training_helper), it still didn't work; "log" is still "tb". I can't figure it out.

```python
def initialize(restore: Optional[str] = None):
    helper = framework.helpers.TrainingHelper(wandb_project_name="ff_as_attention_cifar10_10samples_wandb",
                                              register_args=register_args,
                                              extra_dirs=["export", "model_weights", "tmp"],
                                              log_async=False, restore=restore)
    task = tasks.get_task(helper.args.task)
    task = task(helper)
    return helper, task


def main():
    helper, task = initialize()
    print("args.task:", helper.args.task)
    print("args.log:", helper.args.log)
```

In training_helper.py, I also added a check line, but the value of the parameter "log" is still wrong (screenshots attached).

RobertCsordas commented 11 months ago

Oh, I see. It is because you ran it with run.py, which is meant for local debugging. If you remove "log" from https://github.com/robertcsordas/linear_layer_as_attention/blob/c0a20bf4a5f7a9c076b4934959910e522ee5951b/run.py#L18, or run the command that run.py prints directly with "--log wandb" appended to the end, it should work. Let me know how it goes. The run should be visible on your W&B dashboard.
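To illustrate the first option, here is a hypothetical sketch of the kind of entry meant at run.py#L18 (the actual file may look different); the point is to delete the hard-coded "log" override so the value from the config or command line is not overwritten:

```python
# Hypothetical sketch, not the literal contents of run.py.
# run.py forces TensorBoard logging for local debugging; removing the "log"
# entry lets the value from the YAML / "--log wandb" flag take effect.
debug_overrides = {
    "log": "tb",   # <- delete this entry to allow wandb logging
    # ... other local-debugging defaults
}
```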

TInaWangxue commented 11 months ago

> Oh, I see. It is because you ran it with run.py, which is meant for local debugging. If you remove "log" from https://github.com/robertcsordas/linear_layer_as_attention/blob/c0a20bf4a5f7a9c076b4934959910e522ee5951b/run.py#L18, or run the command that run.py prints directly with "--log wandb" appended to the end, it should work. Let me know how it goes. The run should be visible on your W&B dashboard.

Thank you so much!

TInaWangxue commented 11 months ago

Sorry to interrupt you again, but I have a question about the paper. You prove that "Linear layers in neural networks (NNs) trained by gradient descent can be expressed as a key-value memory system which stores all training datapoints and the initial weights, and produces outputs using unnormalised dot attention over the entire training experience". When applying this claim to single-task training for image classification with feedforward NNs with two hidden layers, the total storage of training patterns is roughly 3×800×T units, where T is the "number of training datapoints", counting all examples across all training mini-batches. Inspired by this conclusion, I want to use the ATTENTION SCORES SUM in more complicated image classification, where there is a classifier (also a feedforward NN) whose output is then processed by a softmax. I wonder whether the softmaxed result is consistent with the ATTENTION SCORES SUM result, or whether they differ. If they are consistent, does that solve the storage overhead of the ATTENTION SCORES SUM?