Open zetaodu opened 1 year ago
I find that thop does not count the parameters in BertEmbedding, and if I define two self_attention blocks in one layer, it only counts one of them.
The second self_attention block also has to be called in the forward method; a block that is defined but never called there will not be counted.
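A minimal sketch of why this happens, assuming the profiler collects its counts through forward hooks (which is how thop appears to work). It uses plain PyTorch only. The class `TwoAttentionLayer`, the `use_second` flag, and both helper functions are hypothetical names made up for illustration, and `nn.Linear` stands in for a real attention block.

```python
import torch
import torch.nn as nn


class TwoAttentionLayer(nn.Module):
    """Hypothetical layer that defines two attention-like blocks."""

    def __init__(self, dim=8, use_second=True):
        super().__init__()
        # nn.Linear is only a stand-in for a real self-attention block.
        self.self_attention_1 = nn.Linear(dim, dim)
        self.self_attention_2 = nn.Linear(dim, dim)
        self.use_second = use_second

    def forward(self, x):
        x = self.self_attention_1(x)
        if self.use_second:
            # Without this call the second block never runs.
            x = self.self_attention_2(x)
        return x


def modules_called_in_forward(model, x):
    """Return the names of submodules whose forward hook fired."""
    seen = []
    handles = [
        m.register_forward_hook(lambda mod, inp, out, name=name: seen.append(name))
        for name, m in model.named_modules()
        if name  # skip the root module, whose name is ""
    ]
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    return seen


def count_all_parameters(model):
    """Count every registered parameter, whether or not forward uses it."""
    return sum(p.numel() for p in model.parameters())


x = torch.randn(2, 8)
print(modules_called_in_forward(TwoAttentionLayer(use_second=False), x))
print(modules_called_in_forward(TwoAttentionLayer(use_second=True), x))
print(count_all_parameters(TwoAttentionLayer(use_second=False)))
```

With `use_second=False`, the hook on `self_attention_2` never fires, so a hook-based counter would miss that block, while `count_all_parameters` still includes it. Comparing the profiler's total against `sum(p.numel() for p in model.parameters())` is a quick way to spot modules that were skipped.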