It's very sharp of you to discover the potential of sparse attention mechanisms and propose the DAB in your paper~ I'd like to know whether you have tried putting the 'DAB' and 'SAB' in parallel when constructing the architecture of the model, instead of placing them in sequence?
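To make the question concrete, here is a minimal PyTorch sketch of the two compositions I have in mind. The `dab` and `sab` modules are hypothetical stand-ins for your DAB and SAB (I'm not assuming anything about their internals), and the sum-based fusion in the parallel variant is just one option; concatenation followed by a projection would also work.

```python
import torch
import torch.nn as nn


class SequentialDABSAB(nn.Module):
    """Sequential composition (my reading of the paper): DAB feeds into SAB."""

    def __init__(self, dab: nn.Module, sab: nn.Module):
        super().__init__()
        self.dab = dab
        self.sab = sab

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SAB consumes the output of DAB.
        return self.sab(self.dab(x))


class ParallelDABSAB(nn.Module):
    """Parallel alternative: DAB and SAB see the same input; outputs are fused."""

    def __init__(self, dab: nn.Module, sab: nn.Module):
        super().__init__()
        self.dab = dab
        self.sab = sab

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Both branches run on x independently; summation is one possible fusion.
        return self.dab(x) + self.sab(x)
```

Have you benchmarked anything like the second variant, or is there a reason the sequential ordering is preferred?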