min-xu-et opened this issue 3 months ago (status: Open)
hi, have you found out where the block-attention code is?
Apologies for the delay! The code can be found here, and the mask is slightly modified again here in experiment_manager.py.
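For intuition about what "strangely shaped" means here, below is a minimal sketch of a 2D block attention mask: demonstration tokens attend causally only within their own block, and the final query tokens attend to everything. The function name, the shared-query layout, and all variable names are my assumptions for illustration, not the repo's actual implementation.

```python
import torch

def build_block_attention_mask(block_lens, query_len):
    """Sketch of a 2D block attention mask (assumed layout, not the repo's exact code).

    Demonstration tokens attend causally only within their own block; the final
    query tokens attend causally to all blocks plus themselves.
    True = may attend, False = masked.
    """
    total = sum(block_lens) + query_len
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Causal attention inside each demonstration block only.
    start = 0
    for length in block_lens:
        end = start + length
        mask[start:end, start:end] = torch.tril(
            torch.ones(length, length, dtype=torch.bool)
        )
        start = end

    # Query tokens see every earlier token, and attend causally among themselves.
    q_start = total - query_len
    mask[q_start:, :q_start] = True
    mask[q_start:, q_start:] = torch.tril(
        torch.ones(query_len, query_len, dtype=torch.bool)
    )
    return mask

# e.g., three blocks of 4 demo tokens each, followed by a 3-token query
print(build_block_attention_mask([4, 4, 4], 3).int())
```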
I've also now pushed up a modified version of the transformers code that accepts this strangely shaped attention mask. The way to run this is:
1. Replace the text of modeling_llama.py with the text of replacement_modeling_llama.py (this allows passing attention masks in this format; the default code doesn't support 2D attention masks).
2. Run with an examples-stride (i.e., a number of examples per block) smaller than the n-shots-per-window (i.e., the total number of examples). For instance, to run 500-shot ICL on banking77 with stride size 10:
python3 run_evaluation.py --datasets banking77 --model "togethercomputer/LLaMA-2-7B-32K" --n-windows 1 --n-runs 10 --output-dir $OUTPUT_DIR --n-shots-per-window 500 --examples-stride 10 --cache-dir $CACHE --subsample-test-set 250 --fp16 --random-seed 43
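To make the two flags concrete: with --n-shots-per-window 500 and --examples-stride 10, the 500 demonstrations are split into 50 blocks of 10, and (per the mask above) each block only attends within itself. A rough sketch of that grouping, with variable names that are mine rather than the repo's:

```python
# Hypothetical illustration of how the stride partitions the shots into blocks;
# the actual grouping logic lives in the repo's experiment_manager.py.
n_shots_per_window = 500
examples_stride = 10

blocks = [
    list(range(i, min(i + examples_stride, n_shots_per_window)))
    for i in range(0, n_shots_per_window, examples_stride)
]

print(len(blocks))   # 50 blocks of 10 examples each
print(blocks[0])     # [0, 1, ..., 9]
```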
There are definitely more streamlined ways to implement this. I'm planning a more intuitive rewrite over the next few weeks, but I didn't want to delay this reply any longer. Thanks!
nice work and nice talk