harisethuram closed this issue 1 month ago
Yes, of course! To create the `top_heads` list, we aggregate the indirect effects across all the abstractive datasets on which the model outperforms baseline performance (see Appendix G in the paper, and E.2 for an example of baseline ICL performance; we use the majority label as our baseline).
For GPT-J, for example, 18 tasks were used to compute the AIE of each head, based on GPT-J's ICL performance:
```python
gptj_tasks = ['antonym', 'capitalize', 'capitalize_first_letter', 'country-capital',
              'country-currency', 'english-french', 'english-german', 'english-spanish',
              'landmark-country', 'lowercase_first_letter', 'national_parks', 'park-country',
              'person-sport', 'present-past', 'product-company', 'sentiment', 'singular-plural', 'synonym']
```
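The aggregation described above can be sketched roughly as follows. This is an illustrative example, not the repo's actual code: `top_heads_from_aie` and the per-task AIE dictionary are hypothetical names, and the array shapes assume each task contributes an `(n_layers, n_heads)` AIE matrix.

```python
import numpy as np

def top_heads_from_aie(aie_per_task, k=10):
    """Average AIE over tasks and return the k strongest (layer, head) pairs.

    aie_per_task: dict mapping task name -> (n_layers, n_heads) array of
    average indirect effects for that task (hypothetical format).
    """
    # Mean AIE per head, averaged across all qualifying tasks
    mean_aie = np.mean(np.stack(list(aie_per_task.values())), axis=0)
    # Flat indices of the k largest entries, sorted descending by mean AIE
    flat = np.argsort(mean_aie, axis=None)[::-1][:k]
    layers, heads = np.unravel_index(flat, mean_aie.shape)
    return [(int(l), int(h), float(mean_aie[l, h])) for l, h in zip(layers, heads)]

# Toy usage: random data standing in for real per-task AIE tensors
rng = np.random.default_rng(0)
aie = {task: rng.standard_normal((28, 16)) for task in ['antonym', 'capitalize']}
print(top_heads_from_aie(aie, k=3))
```

Only tasks where the model beats the majority-label baseline would be included in `aie_per_task`, per the filtering criterion above.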
Let me know if you have more questions, thanks!
I would like to construct the `top_heads` list in `compute_universal_function_vector` for new models (such as Llama 3). In 1 and 10, you mention the script that computes the activations for one task, and the prompt settings. However, I'm not sure which datasets specifically you aggregate over to compute the average activations. Could you clarify this? Thanks!