segment_ids padding why 0

Hi xujian, nice question! In segment_ids, 0 marks query tokens and 1 marks doc tokens. Everything after the doc tokens in input_ids is padding (id 0), and a segment id has no meaning for those positions, so I think it makes no difference whether segment_ids holds 0 or 1 there. I ran one experiment to check this. With segment_ids.append(0) in run_squad.py, line 310, the logits output at modeling.py, line 458, is as follows:

tensor([[[-0.0041,  0.0068],
         [-0.0047,  0.0048],
         [-0.0057,  0.0045],
         ...,
         [-0.0048,  0.0031],
         [-0.0036,  0.0056],
         [-0.0039,  0.0028]],

        [[-0.0042,  0.0058],
         [-0.0056,  0.0084],
         [-0.0066,  0.0066],
         ...,
         [-0.0061,  0.0036],
         [-0.0064,  0.0031],
         [-0.0075,  0.0061]],

        [[-0.0040,  0.0050],
         [-0.0052,  0.0052],
         [-0.0049,  0.0027],
         ...,
         [-0.0049,  0.0010],
         [-0.0003,  0.0049],
         [-0.0062,  0.0016]]], grad_fn=<ThAddBackward>)

And if you change the code to segment_ids.append(1), we get the same output:

tensor([[[-0.0041,  0.0068],
         [-0.0047,  0.0048],
         [-0.0057,  0.0045],
         ...,
         [-0.0048,  0.0031],
         [-0.0036,  0.0056],
         [-0.0039,  0.0028]],

        [[-0.0042,  0.0058],
         [-0.0056,  0.0084],
         [-0.0066,  0.0066],
         ...,
         [-0.0061,  0.0036],
         [-0.0064,  0.0031],
         [-0.0075,  0.0061]],

        [[-0.0040,  0.0050],
         [-0.0052,  0.0052],
         [-0.0049,  0.0027],
         ...,
         [-0.0049,  0.0010],
         [-0.0003,  0.0049],
         [-0.0062,  0.0016]]], grad_fn=<ThAddBackward>)

So it makes no difference whether segment_ids stores 0 or 1 for those positions, because the tokens they correspond to are padding.
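
If it helps, here is a minimal sketch of why this can work out, using a toy single-head self-attention in plain Python rather than the real model. Everything in it is an assumption for illustration: toy_self_attention, the random embeddings, and the example ids are made up and are not code from run_squad.py or modeling.py. It also assumes the padding positions are hidden from attention through input_mask by adding a large negative value to their scores, which is how BERT-style code usually does it.

```python
import math
import random


def toy_self_attention(input_ids, segment_ids, input_mask, dim=4, seed=0):
    """Toy single-head self-attention over token + segment embeddings.

    Hypothetical stand-in for a BERT layer; weights are random, not trained.
    """
    rng = random.Random(seed)  # fixed seed so both runs share the same weights
    tok_emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(10)]
    seg_emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]

    # Input embedding = token embedding + segment embedding.
    x = [[t + s for t, s in zip(tok_emb[i], seg_emb[g])]
         for i, g in zip(input_ids, segment_ids)]

    outputs = []
    for q in x:
        # Scaled dot-product scores; masked (padding) keys get a large
        # negative offset so their softmax weight underflows to zero.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  + (1 - m) * -10000.0
                  for k, m in zip(x, input_mask)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[d] for w, v in zip(weights, x))
                        for d in range(dim)])
    return outputs


# Made-up example: 3 query tokens, 3 doc tokens, 2 padding tokens (id 0).
input_ids = [2, 5, 3, 7, 8, 3, 0, 0]
input_mask = [1, 1, 1, 1, 1, 1, 0, 0]
seg_pad_0 = [0, 0, 0, 1, 1, 1, 0, 0]  # padding gets segment id 0
seg_pad_1 = [0, 0, 0, 1, 1, 1, 1, 1]  # padding gets segment id 1

out_0 = toy_self_attention(input_ids, seg_pad_0, input_mask)
out_1 = toy_self_attention(input_ids, seg_pad_1, input_mask)

# Compare only the rows that belong to real (unmasked) tokens.
num_real = sum(input_mask)
max_diff = max(abs(a - b)
               for row_0, row_1 in zip(out_0[:num_real], out_1[:num_real])
               for a, b in zip(row_0, row_1))
print("max difference on real tokens:", max_diff)
```

The comparison only looks at the rows for real tokens, since those are the positions an answer span can come from. In this toy version, the segment id on a padding position changes only that padding position's own embedding, and the mask keeps real tokens from attending to it.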

Please check and let me know if there are any questions. Thanks.

eva-n27 / BERT-for-Chinese-Question-Answering
