Closed: Jummmmmp closed this 6 months ago
Hi, to increase inference speed and reduce GPU memory usage, we've implemented Flash Attention in our inference code. Flash Attention does not currently support attention masking. Furthermore, our ablation study in the paper (see the attached screenshot) indicates that excluding the attention mask results in only a small drop in performance.
As a result, attention masking has been disabled by default to prioritize faster inference. If you wish to enable mask-attention, you can do so by setting `efficient_attention` to `false` in the configuration file located at `configs/test_box.yaml`.
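For reference, the change would look something like the fragment below. Only the key name `efficient_attention` and the file path come from the comment above; the surrounding layout of the YAML file is an assumption on my part:

```yaml
# configs/test_box.yaml (surrounding keys omitted; layout assumed)
# Setting efficient_attention to false disables Flash Attention,
# which allows attention masks to be applied during inference.
efficient_attention: false
```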
Hope this helps.
I tried changing `test_box.yaml` to `test_mask.yaml` and the problem was solved! I can test mask-attention now. It's so kind of you. Thanks!
I set `return_att_masks = True` in `inference.py` and used the following command to run your model:

```shell
python inference.py --num_images 8 --output OUTPUT/ --input_json demos/demo_corgi_kitchen.json --ckpt pretrained/instancediffusion_sd15.pth --test_config configs/test_box.yaml --guidance_scale 7.5 --alpha 0.8 --seed 0 --mis 0.36 --cascade_strength 0.4
```

In contrast, I deleted all the "mask" values in `demo_corgi_kitchen.json` and tried again, but the results remained the same. So I wonder: does mask-attention really work? Or is there a proper way to use it? Looking forward to your reply, thanks!
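For context, here is my understanding of what an attention mask is supposed to do. This is a minimal NumPy sketch of plain scaled dot-product attention with a boolean mask, not the repo's actual implementation; all names in it are mine:

```python
import numpy as np

def attention(q, k, v, mask=None):
    """Plain scaled dot-product attention.

    mask is a boolean (num_queries, num_keys) array; False marks
    key positions a query must ignore (their scores are pushed to
    a large negative value before the softmax)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    # numerically stable softmax over the key axis
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v, w

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))

mask = np.ones((4, 4), dtype=bool)
mask[:, -1] = False  # block every query from attending to the last key

out, w = attention(q, k, v, mask)
assert np.allclose(w[:, -1], 0.0)        # masked key gets zero attention weight
assert np.allclose(w.sum(axis=-1), 1.0)  # each row is still a distribution
```

If enabling mask-attention really took effect, I would expect the per-instance masks in the JSON to restrict which regions attend to each other in a similar way, so removing them should change the output.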