Closed itsmag11 closed 2 years ago
Yes, using max() is optimal. The solution is right :)
Yes, using max() is optimal. The solution is right :)
Thank you for the clarification. I will close the issue then.
Sorry to bother you again. I just realized is it that the commented line below means we only use the feature of the [CLS] token since it should be the case in sequence classification problems? https://github.com/cooelf/SemBERT/blob/f849452f864b5dd47f94e2911cffc15e9f6a5a2a/pytorch_pretrained_bert/modeling.py#L1023
If so, I think using max()
may not be a correct solution. Please advise. Thank you in advance.
Using max() for pooling is more reasonable. It was used in our follow-up studies and achieved better results.
Dear author,
I encountered the same error as #25, which is from this line of code: https://github.com/cooelf/SemBERT/blob/f849452f864b5dd47f94e2911cffc15e9f6a5a2a/pytorch_pretrained_bert/modeling.py#L1026.
The solution in #25 indeed solves the error but I am not sure whether it is correct. I found that changing dim=-1 to dim=1 in https://github.com/cooelf/SemBERT/blob/f849452f864b5dd47f94e2911cffc15e9f6a5a2a/pytorch_pretrained_bert/modeling.py#L1024 also solve the issue. And based on my understanding it makes more sense to fuse the features with max() than by simply taking the first feature.
So just want to confirm with you whether it is a correct solution? Please advise. Thank you.