cooelf / SemBERT

Semantics-aware BERT for Language Understanding (AAAI 2020)
https://arxiv.org/abs/1909.02209
MIT License
286 stars 55 forks source link

Pool function dimension not match #26

Closed itsmag11 closed 2 years ago

itsmag11 commented 2 years ago

Dear author,

I encountered the same error as #25, which is from this line of code: https://github.com/cooelf/SemBERT/blob/f849452f864b5dd47f94e2911cffc15e9f6a5a2a/pytorch_pretrained_bert/modeling.py#L1026.

The solution in #25 indeed solves the error but I am not sure whether it is correct. I found that changing dim=-1 to dim=1 in https://github.com/cooelf/SemBERT/blob/f849452f864b5dd47f94e2911cffc15e9f6a5a2a/pytorch_pretrained_bert/modeling.py#L1024 also solve the issue. And based on my understanding it makes more sense to fuse the features with max() than by simply taking the first feature.

So just want to confirm with you whether it is a correct solution? Please advise. Thank you.

cooelf commented 2 years ago

Yes, using max() is optimal. The solution is right :)

itsmag11 commented 2 years ago

Yes, using max() is optimal. The solution is right :)

Thank you for the clarification. I will close the issue then.

itsmag11 commented 2 years ago

Sorry to bother you again. I just realized is it that the commented line below means we only use the feature of the [CLS] token since it should be the case in sequence classification problems? https://github.com/cooelf/SemBERT/blob/f849452f864b5dd47f94e2911cffc15e9f6a5a2a/pytorch_pretrained_bert/modeling.py#L1023

If so, I think using max() may not be a correct solution. Please advise. Thank you in advance.

cooelf commented 2 years ago

Using max() for pooling is more reasonable. It was used in our follow-up studies and achieved better results.