Closed guangzlu closed 1 year ago
When the API is used for inference, dropout is always 0. But there is no significant difference between qloop and kloop forward performance with dropout=0.
Yes, that's because in the fwd pass, when dropout is 0, we have a switch that lets us skip the dropout function entirely.
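As a rough illustration of that switch (the actual kernel code differs; `maybe_dropout` and its arguments are hypothetical names used only for this sketch):

```python
import random

def maybe_dropout(scores, dropout_p):
    # Hypothetical sketch of the fwd-path switch: when dropout_p == 0
    # we return the scores untouched and never enter the dropout path,
    # so fwd timings for qloop vs kloop are unaffected by dropout code.
    if dropout_p == 0.0:
        return scores  # fast path: no RNG, no masking
    # Standard inverted dropout: zero with probability p, scale survivors.
    scale = 1.0 / (1.0 - dropout_p)
    return [0.0 if random.random() < dropout_p else s * scale for s in scores]

print(maybe_dropout([1.0, 2.0], 0.0))  # → [1.0, 2.0]
```

With dropout_p=0 the function is a no-op, which is why inference (where dropout is always 0) sees no cost from the dropout code on either branch.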
In this PR, both Qloop and Kloop are enabled. You can choose the branch with the environment variable USE_QLOOP: if USE_QLOOP=1, qloop is used; if USE_QLOOP=0, kloop is used. In the setup.py file, we use USE_QLOOP=1 by default. Here is a table of the performance comparison between Qloop and Kloop.
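For illustration, a build-time switch like this is typically read from the environment in setup.py roughly as follows (a hedged sketch; the exact logic in this PR's setup.py may differ):

```python
import os

# Hypothetical sketch: select which kernel loop variant to compile.
# USE_QLOOP defaults to "1", so qloop kernels are built unless the
# user exports USE_QLOOP=0 before running the install.
use_qloop = os.environ.get("USE_QLOOP", "1") == "1"
loop_variant = "qloop" if use_qloop else "kloop"
print(f"building {loop_variant} kernels")
```

So to build the kloop branch you would export USE_QLOOP=0 before installing, e.g. `USE_QLOOP=0 pip install .`.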
kloop.vs.qloop.xlsx
(In this table, RTZ is used, and we chose the function `flash_attn_unpadded_func` for the test.) From the table, we can see that when comparing total performance (fwd + bwd), qloop is better in most cases, but when comparing fwd only, kloop is better. So we recommend using kloop for inference and qloop for training.