Thank you for sharing!
In your STAR architecture, I noticed that the Spatial and Temporal Transformers are arranged in parallel in the first encoder but in series in the second encoder. Why is the first encoder parallel and the second serial?
Could you please explain the theory behind this design?
Thank you!