Mismatched batch size between query_layer and key_layer

THUDM / Inf-DiT

Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

Apache License 2.0

378 stars, 19 forks

Mismatched batch size between query_layer and key_layer #20

Closed: Luciennnnnnn closed this issue 4 months ago

Luciennnnnnn commented 4 months ago

When flash_attn_func is called, I see that query_layer has a different batch size from key_layer and value_layer. Since the cat operation pads key_layer and value_layer with one extra row/column, key_layer and value_layer should end up with a larger batch size than query_layer after the view operation. Is this intended, and does flash_attn_func support this kind of usage, or is it a bug?
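To make the concern concrete, here is a minimal sketch of the shape arithmetic. It does not reproduce the Inf-DiT code: the block layout, the sizes, and the helper names `blocked_batch` and `check_qkv_batch` are all assumptions for illustration. It relies only on the documented shapes of flash_attn_func, where q is (batch_size, seqlen, nheads, headdim) and k/v are (batch_size, seqlen, nheads_k, headdim), so all three share the same leading dimension.

```python
def blocked_batch(batch, height, width, block):
    """Leading dimension after viewing a (batch, height, width, ...) grid
    as (batch * n_blocks, block * block, ...).

    Hypothetical layout for illustration, not taken from Inf-DiT.
    """
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    return batch * (height // block) * (width // block)


def check_qkv_batch(q_shape, k_shape, v_shape):
    """Raise ValueError if q, k and v do not share the same leading dimension."""
    if not (q_shape[0] == k_shape[0] == v_shape[0]):
        raise ValueError(
            f"batch mismatch: q={q_shape[0]}, k={k_shape[0]}, v={v_shape[0]}"
        )


# Assumed sizes: an 8x8 grid split into 4x4 blocks.
q_batch = blocked_batch(batch=1, height=8, width=8, block=4)

# Same grid padded with one extra block row and one extra block column
# before the view (the scenario described in the question above).
kv_batch = blocked_batch(batch=1, height=12, width=12, block=4)

print(q_batch, kv_batch)
```

A check such as `check_qkv_batch(query_layer.shape, key_layer.shape, value_layer.shape)` placed right before the attention call would show whether the leading dimensions really differ at that point, or whether an intermediate reshape has already brought them back in line.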

@yzy-thu cc.

Luciennnnnnn commented 4 months ago

Sorry, I misunderstood the code. I'll close this issue now that I understand it.