Hi, thanks for your great work.
I have a question regarding the transformer body. The paper states that each transformer block T1-T5 receives the mask as input. In the code, however, only blocks T1 and T2 receive the mask: x, x_size, mask = block(x, x_size, mask), whereas blocks T3, T4 and T5 receive None: x, x_size, mask = block(x, x_size, None). Could you please explain why the mask is not passed to blocks T3-T5?
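To make the question concrete, here is a minimal sketch of the pattern I mean. Only the call form block(x, x_size, mask) is taken from the code; the Block class, the run_body function and the num_masked parameter are hypothetical stand-ins I made up for illustration, not the repository's implementation.

```python
class Block:
    """Hypothetical stand-in for a transformer block (not the repository's class).

    It only records whether a mask was passed in.
    """

    def __init__(self, name):
        self.name = name
        self.got_mask = None

    def __call__(self, x, x_size, mask):
        self.got_mask = mask is not None
        # A real block would transform x (and possibly the mask);
        # here everything passes through unchanged.
        return x, x_size, mask


def run_body(blocks, x, x_size, mask, num_masked=2):
    """Pass the mask to the first `num_masked` blocks and None to the rest.

    `num_masked` is an assumption used to mimic the behaviour described above.
    """
    for i, block in enumerate(blocks):
        if i < num_masked:
            x, x_size, mask = block(x, x_size, mask)
        else:
            x, x_size, mask = block(x, x_size, None)
    return x


blocks = [Block(f"T{i}") for i in range(1, 6)]
run_body(blocks, x=[0.0], x_size=(1, 1), mask=[1])
print({b.name: b.got_mask for b in blocks})
# {'T1': True, 'T2': True, 'T3': False, 'T4': False, 'T5': False}
```

From the paper I would have expected all five blocks to report True here.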
Thanks :)