codertimo / BERT-pytorch

Google AI 2018 BERT pytorch implementation
Apache License 2.0
6.09k stars 1.29k forks

In the Next Sentence Prediction task, the original code may pick the current line itself when drawing a negative sample #86

Open Emir-Liu opened 3 years ago

Emir-Liu commented 3 years ago
    def get_random_line(self):   
        ...
        return self.lines[random.randrange(len(self.lines))][1]
        ...

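Because the random index is drawn from all lines, it can equal the index of the current line, in which case the "negative" sample is actually the true next sentence. It should be changed to the following: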

    def get_random_line(self, index):
        ...
        # Redraw until the sampled index differs from the current line.
        tmp = random.randrange(len(self.lines))
        while tmp == index:
            tmp = random.randrange(len(self.lines))
        # Return the line at the checked index, not a fresh random draw.
        return self.lines[tmp][1]
        ...
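
A loop-free variant of the same idea, as a minimal sketch rather than the repository's code: draw from `len(self.lines) - 1` slots and shift past `index`, so every other line is equally likely and the current line is never returned. `PairDataset` is a hypothetical stand-in class, and it assumes `lines` is an in-memory list of (first sentence, second sentence) pairs.

```python
import random


class PairDataset:
    """Hypothetical stand-in for the in-memory dataset (not the repository's class)."""

    def __init__(self, lines):
        # Assumption: each entry is a (first_sentence, second_sentence) pair.
        self.lines = lines

    def get_random_line(self, index):
        n = len(self.lines)
        if n < 2:
            # With a single line there is no other line to use as a negative.
            raise ValueError("need at least two lines to draw a negative sample")
        # Draw from n - 1 slots, then skip over `index` so it is never chosen.
        tmp = random.randrange(n - 1)
        if tmp >= index:
            tmp += 1
        return self.lines[tmp][1]
```

Compared with the `while` loop, this makes exactly one random draw and fails loudly on a one-line corpus, where the loop would never terminate.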