Open YiboZhao624 opened 6 months ago
I have added this:

```python
# Encode the token IDs with the pretrained language model, passing the padding mask
# so that padded positions are ignored by BERT's self-attention.
V = self.plm(input_ids=input_val, attention_mask=attention_mask).last_hidden_state  # [batch_size, seq_len] -> [batch_size, seq_len, hidden_size]

# Self-attention over the token representations (query = key = value = V).
multihead_attn_output, _ = self.multihead_attention(V, V, V)
```
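Incidentally, if the goal is to keep padded tokens out of the extra attention layer as well, one option might be to forward a `key_padding_mask` derived from the same `attention_mask`. This is only a sketch on my side, assuming `self.multihead_attention` is a `torch.nn.MultiheadAttention` created with `batch_first=True`:

```python
# Hypothetical follow-up (not from the repository): also mask padded tokens in the
# extra multi-head attention layer. attention_mask follows the HuggingFace
# convention of 1 for real tokens and 0 for padding.
key_padding_mask = attention_mask == 0  # bool tensor, True where the token is padding

multihead_attn_output, _ = self.multihead_attention(
    V, V, V, key_padding_mask=key_padding_mask
)
```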
Dear YadaYuki,
I hope this message finds you well, and I apologize for reaching out once more. Upon further examination, an aspect of the code in `src/recommendation/PLMBasedNewsEncoder.py` caught my attention. The code utilizes the pretrained BERT-base-uncased model, but the input passed to it appears to consist solely of token IDs from the texts, without the padding mask. The relevant snippet, as I remember it, is roughly as follows:
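```python
# Quoted from memory, so the exact repository code may differ: only the token IDs
# are passed to the pretrained model, so BERT's self-attention also attends over
# the padded positions.
V = self.plm(input_val).last_hidden_state  # [batch_size, seq_len] -> [batch_size, seq_len, hidden_size]
```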
Upon reviewing the data processing section, particularly around line 198 in `src/mind/MINDDataset.py`, I noticed the use of the `transform` function, and it appears that the padding mask may have been omitted intentionally. I am intrigued by this choice and would appreciate insight into the rationale behind it. To make sure I am describing the same thing, I have sketched below the kind of transform I had in mind, one that returns the attention mask alongside the token IDs.
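This is only a hypothetical illustration using the HuggingFace tokenizer convention, not the repository's actual `transform`:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def transform(text: str, max_length: int = 30):
    # Tokenize a news text, padding/truncating to a fixed length.
    encoded = tokenizer(
        text,
        padding="max_length",
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    # input_ids are the token IDs; attention_mask is 1 for real tokens, 0 for padding.
    return encoded["input_ids"].squeeze(0), encoded["attention_mask"].squeeze(0)
```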
Thank you for your time and consideration.

Sincerely,
Yibo