Jaakik closed this issue 2 years ago
Hi. ASFormer is slower due to the additional self-attention operations. However, when trained on 50Salads, it takes approximately 3x longer. What's the average video length of your dataset? Do you set if_warp=True in `def next_batch(self, batch_size, if_warp=False)`?
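If it helps to answer the video-length question, here is a minimal sketch for summarizing lengths. It assumes you already have one frame count per video; the function name `summarize_video_lengths` and the example numbers are hypothetical and not part of the ASFormer code.

```python
from statistics import mean, median


def summarize_video_lengths(frame_counts):
    """Return basic length statistics (in frames) for a list of videos."""
    if not frame_counts:
        raise ValueError("frame_counts must not be empty")
    return {
        "num_videos": len(frame_counts),
        "mean": mean(frame_counts),
        "median": median(frame_counts),
        "max": max(frame_counts),
    }


# Hypothetical frame counts; replace with the lengths of your own feature files.
example_counts = [1200, 3400, 560, 9800]
stats = summarize_video_lengths(example_counts)
print(stats)
```

Comparing these numbers against the benchmark datasets should show whether unusually long videos could explain part of the slowdown.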
Yes, thank you.
Hello,
I am adapting your code for my own dataset, which usually trains relatively fast when using only ASRF, but when using your model with the transformer it takes approximately 10x longer. Do you see similar behaviour with the 50Salads/Breakfast/GTEA datasets?
Thank you :)