FMInference / DejaVu


Questions about Attention Sparsity Implementation #26

Closed czq693497091 closed 5 months ago

czq693497091 commented 5 months ago

I really enjoy your work on sparsity and have successfully run DejaVu, but I still have some questions.

In hf_opt_sparse_mlp_attention.py (https://github.com/FMInference/DejaVu/blob/master/Decentralized_FM_alpha/modules/hf_opt_sparse_mlp_attention.py), in the function prepare_head_mask: in the paper, DejaVu seems to select only the active heads for the QKV computation. But in prepare_head_mask, it looks like QKV is computed for all heads and the mask is then applied to simulate the sparse computation result, which does not match the description in the paper. How should I understand this part? Thanks!
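To make my question concrete, here is a rough sketch of how I read the two strategies. This is my own illustration, not the repo's code; the tensor names, shapes, and the `head_mask` variable are assumptions based on my reading of prepare_head_mask.

```python
# Minimal sketch (my understanding, not the actual DejaVu code) contrasting
# masked "simulated" sparsity with truly sparse head selection.
import torch
import torch.nn as nn

hidden, num_heads = 768, 12
head_dim = hidden // num_heads
x = torch.randn(1, 16, hidden)                       # [batch, seq, hidden]
qkv = nn.Linear(hidden, 3 * hidden, bias=False)      # fused QKV projection
head_mask = (torch.rand(num_heads) > 0.5).float()    # 0/1 per head from the predictor (hypothetical)

# (a) What prepare_head_mask seems to do: compute QKV for *all* heads,
#     then zero out the inactive heads with the mask.
q, k, v = qkv(x).chunk(3, dim=-1)
q = q.view(1, 16, num_heads, head_dim) * head_mask.view(1, 1, num_heads, 1)

# (b) What I expected from the paper: slice the weights so that QKV is only
#     ever computed for the active heads.
active = head_mask.nonzero().squeeze(-1)              # indices of active heads
w_q = qkv.weight[:hidden].view(num_heads, head_dim, hidden)[active]  # [n_active, head_dim, hidden]
q_sparse = torch.einsum("bsh,ndh->bsnd", x, w_q)      # projections for active heads only
```

If I understand correctly, (a) gives the same numerical result as (b) for the kept heads but still pays the full dense QKV cost, so the mask only simulates sparsity rather than saving computation.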