Hi,
I tried to train CoAtNet_0 on Tiny ImageNet from cs231n (200 classes), and the model does not seem to converge.
Could it be that the implementation is not 100% correct, for example in the positional embedding indexing part? I went through the code, and I think the other components are correct.
The positional embedding indexing is the one part I could not follow well enough to verify. Do you have a reference for how it is implemented?
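
To make the question concrete, here is a small pure-Python sketch of what I would expect 2D relative position indexing to compute, based on the scheme commonly used for relative position bias tables (for example in Swin Transformer-style attention). This is not taken from this repo; the function name `relative_position_index` and its arguments are hypothetical, and I am assuming a bias table with `(2H-1) * (2W-1)` rows for an `H x W` feature map.

```python
def relative_position_index(height, width):
    """Return an (H*W) x (H*W) table of indices into a relative bias table.

    Assumption: the bias table has (2*height - 1) * (2*width - 1) rows,
    one per possible (row offset, column offset) between two positions.
    """
    # Flatten the 2D grid in row-major order, as attention over tokens does.
    coords = [(r, c) for r in range(height) for c in range(width)]
    index = []
    for q_row, q_col in coords:
        row = []
        for k_row, k_col in coords:
            # Offsets lie in [-(H-1), H-1] and [-(W-1), W-1];
            # shift them so they start at 0.
            d_row = q_row - k_row + (height - 1)
            d_col = q_col - k_col + (width - 1)
            # Combine the two shifted offsets into one flat table index.
            row.append(d_row * (2 * width - 1) + d_col)
        index.append(row)
    return index


idx = relative_position_index(2, 2)
for line in idx:
    print(line)
```

If the indexing in the repo is equivalent to this, every query/key pair with the same spatial offset should read the same row of the bias table, and the diagonal (offset zero) should all point to a single entry. If that property does not hold, it might explain the convergence problem.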