min-xu-et opened this issue 3 months ago (status: Open)
hi, have you found out where the block-attention code is?
Apologies for the delay! The code can be found here, and the mask is slightly modified again here in experiment_manager.py.
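For intuition about what "strangely shaped" means here, below is a minimal sketch of a 2D block attention mask: demonstration tokens attend causally only within their own block, and the final query tokens attend to everything. The function name, the shared-query layout, and all variable names are my assumptions for illustration, not the repo's actual implementation.

```python
import torch

def build_block_attention_mask(block_lens, query_len):
    """Sketch of a 2D block attention mask (assumed layout, not the repo's exact code).

    Demonstration tokens attend causally only within their own block; the final
    query tokens attend causally to all blocks plus themselves.
    True = may attend, False = masked.
    """
    total = sum(block_lens) + query_len
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Causal attention inside each demonstration block only.
    start = 0
    for length in block_lens:
        end = start + length
        mask[start:end, start:end] = torch.tril(
            torch.ones(length, length, dtype=torch.bool)
        )
        start = end

    # Query tokens see every earlier token, and attend causally among themselves.
    q_start = total - query_len
    mask[q_start:, :q_start] = True
    mask[q_start:, q_start:] = torch.tril(
        torch.ones(query_len, query_len, dtype=torch.bool)
    )
    return mask

# e.g., three blocks of 4 demo tokens each, followed by a 3-token query
print(build_block_attention_mask([4, 4, 4], 3).int())
```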
I've also now pushed up a modified version of the transformers code that accepts this strangely shaped attention mask. The way to run this is:
1. Replace the text of modeling_llama.py with the text of replacement_modeling_llama.py (this allows passing attention masks in this format; the default code doesn't support 2D attention masks).
2. Run with an examples-stride (i.e., a number of examples per block) smaller than the n-shots-per-window (i.e., the total number of examples). For instance, to run 500-shot ICL on banking77 with stride size 10:
python3 run_evaluation.py --datasets banking77 --model "togethercomputer/LLaMA-2-7B-32K" --n-windows 1 --n-runs 10 --output-dir $OUTPUT_DIR --n-shots-per-window 500 --examples-stride 10 --cache-dir $CACHE --subsample-test-set 250 --fp16 --random-seed 43
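To make the two flags concrete: with --n-shots-per-window 500 and --examples-stride 10, the 500 demonstrations are split into 50 blocks of 10, and (per the mask above) each block only attends within itself. A rough sketch of that grouping, with variable names that are mine rather than the repo's:

```python
# Hypothetical illustration of how the stride partitions the shots into blocks;
# the actual grouping logic lives in the repo's experiment_manager.py.
n_shots_per_window = 500
examples_stride = 10

blocks = [
    list(range(i, min(i + examples_stride, n_shots_per_window)))
    for i in range(0, n_shots_per_window, examples_stride)
]

print(len(blocks))   # 50 blocks of 10 examples each
print(blocks[0])     # [0, 1, ..., 9]
```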
There are definitely more streamlined ways to implement this. I'm planning a more intuitive rewrite over the next few weeks, but I didn't want to delay this reply any longer. Thanks!
nice work and nice talk