I have a question about class Word2vecDataset(Dataset).
In __getitem__(self, idx), is the window size handled correctly?
return [(u, v, self.data.getNegatives(v, 5)) for i, u in enumerate(word_ids) for j, v in
enumerate(word_ids[max(i - boundary, 0):i + boundary]) if u != v]
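I think this code returns the wrong window: the slice word_ids[max(i - boundary, 0):i + boundary] excludes its end index, so it covers boundary words to the left of position i but only boundary - 1 words to the right. The following version, which slices word_ids[max(i - boundary, 0):i + boundary + 1], may be correct: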
return [(u, v, self.data.getNegatives(v, 5)) for i, u in enumerate(word_ids) for j, v in
enumerate(word_ids[max(i - boundary, 0):i + boundary+1]) if u != v]
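To make the off-by-one concrete, here is a minimal sketch that compares the two slices on a toy list. The helper name window and the sample IDs are hypothetical and not part of the repository; only the slice expressions are taken from the code above.

```python
def window(word_ids, i, boundary, inclusive_end):
    """Return the context slice around position i (center word included)."""
    # Python slices exclude the end index, so covering position
    # i + boundary requires an end of i + boundary + 1.
    end = i + boundary + 1 if inclusive_end else i + boundary
    return word_ids[max(i - boundary, 0):end]


word_ids = [10, 11, 12, 13, 14, 15, 16]  # toy IDs, center word is 13
i, boundary = 3, 2

original = window(word_ids, i, boundary, inclusive_end=False)
proposed = window(word_ids, i, boundary, inclusive_end=True)

print(original)  # [11, 12, 13, 14]      -> two words left, one word right
print(proposed)  # [11, 12, 13, 14, 15]  -> two words left, two words right
```

With the original slice the window is asymmetric; with the + 1 it covers the same number of words on both sides of the center.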
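If the original code is actually correct, I apologize for the noise.
One more point, which may be minor and which I am not confident about:
the condition if u != v may need to be changed to if i != j, so that only the center position is skipped rather than every context word with the same ID as the center word.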
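The difference between filtering by value and filtering by position can be sketched as follows. The function names and toy input are hypothetical. Note one assumption that goes beyond the question: in the comprehension above, j comes from enumerate over the slice, so it counts from the start of the slice rather than from the start of word_ids. This sketch therefore passes start= to enumerate so that i and j refer to the same index space.

```python
def pairs_by_value(word_ids, boundary):
    """Skip any context word whose ID equals the center word's ID."""
    return [(u, v)
            for i, u in enumerate(word_ids)
            for v in word_ids[max(i - boundary, 0):i + boundary + 1]
            if u != v]


def pairs_by_position(word_ids, boundary):
    """Skip only the center position itself."""
    out = []
    for i, u in enumerate(word_ids):
        start = max(i - boundary, 0)
        # start=start makes j an absolute index into word_ids,
        # so comparing it with i is meaningful.
        for j, v in enumerate(word_ids[start:i + boundary + 1], start=start):
            if i != j:
                out.append((u, v))
    return out


word_ids = [7, 7, 8]  # the same word appears twice in a row

print(pairs_by_value(word_ids, 1))     # [(7, 8), (8, 7)]
print(pairs_by_position(word_ids, 1))  # [(7, 7), (7, 7), (7, 8), (8, 7)]
```

Whether pairs such as (7, 7) should be kept as training examples is a design choice; the sketch only shows that the two conditions produce different pairs when a word repeats inside the window.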
Thank you for your contribution, @Andras7.