eva-n27 / BERT-for-Chinese-Question-Answering

Apache License 2.0

Why is segment_ids padded with 0? #1

Closed (hitxujian closed this issue 5 years ago)

hitxujian commented 5 years ago

Hi, thanks for sharing! I have one question about the segment_ids array, which is padded in this loop:

    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)

In segment_ids, 1 indicates a token from the passage and 0 indicates a token from the query. When padding, why is segment_ids filled with 0, which represents the query?

eva-n27 commented 5 years ago

Hi xujian, nice question! In segment_ids, 0 stands for the query and 1 stands for the doc. After the doc tokens, however, input_ids is padded with 0, and for those padding positions the segment id carries no meaning (input_mask is also 0 there). So I think using 0 or 1 in segment_ids for the padding positions is equivalent. I ran one experiment to check this. With segment_ids.append(0) in run_squad.py, line 310, the logits output in modeling.py, line 458, is as follows:

tensor([[[-0.0041,  0.0068],
         [-0.0047,  0.0048],
         [-0.0057,  0.0045],
         ...,
         [-0.0048,  0.0031],
         [-0.0036,  0.0056],
         [-0.0039,  0.0028]],

        [[-0.0042,  0.0058],
         [-0.0056,  0.0084],
         [-0.0066,  0.0066],
         ...,
         [-0.0061,  0.0036],
         [-0.0064,  0.0031],
         [-0.0075,  0.0061]],

        [[-0.0040,  0.0050],
         [-0.0052,  0.0052],
         [-0.0049,  0.0027],
         ...,
         [-0.0049,  0.0010],
         [-0.0003,  0.0049],
         [-0.0062,  0.0016]]], grad_fn=<ThAddBackward>)

And if you change the code to segment_ids.append(1), you get the same output:

tensor([[[-0.0041,  0.0068],
         [-0.0047,  0.0048],
         [-0.0057,  0.0045],
         ...,
         [-0.0048,  0.0031],
         [-0.0036,  0.0056],
         [-0.0039,  0.0028]],

        [[-0.0042,  0.0058],
         [-0.0056,  0.0084],
         [-0.0066,  0.0066],
         ...,
         [-0.0061,  0.0036],
         [-0.0064,  0.0031],
         [-0.0075,  0.0061]],

        [[-0.0040,  0.0050],
         [-0.0052,  0.0052],
         [-0.0049,  0.0027],
         ...,
         [-0.0049,  0.0010],
         [-0.0003,  0.0049],
         [-0.0062,  0.0016]]], grad_fn=<ThAddBackward>)

So it makes no difference whether segment_ids stores 0 or 1 at those positions, because they correspond to padding ids.
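The reasoning above can be illustrated with a small toy sketch in plain Python. This is not the repo's code: `pad_features`, `masked_self_attention`, `encode`, and the random embedding tables are hypothetical names made up for this example, and the model is a single attention layer rather than BERT. It assumes padded positions are masked by adding -10000 to their attention scores, as BERT implementations commonly do, and compares the outputs at the real (query and doc) positions for both padding values.

```python
import math
import random


def pad_features(input_ids, segment_ids, max_seq_length, pad_segment_id=0):
    """Zero-pad the features; pad_segment_id is the value being tested."""
    input_ids = list(input_ids)
    segment_ids = list(segment_ids)
    input_mask = [1] * len(input_ids)
    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(pad_segment_id)
    return input_ids, input_mask, segment_ids


def masked_self_attention(embeddings, input_mask):
    """Toy single-head self-attention with an additive -10000 padding mask."""
    dim = len(embeddings[0])
    outputs = []
    for query in embeddings:
        scores = []
        for key, keep in zip(embeddings, input_mask):
            score = sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            # Padded keys (keep == 0) get a large negative score.
            scores.append(score + (1.0 - keep) * -10000.0)
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # padded keys underflow to 0.0
        total = sum(weights)
        outputs.append([
            sum(w / total * value[d] for w, value in zip(weights, embeddings))
            for d in range(dim)
        ])
    return outputs


def encode(input_ids, input_mask, segment_ids, token_table, segment_table):
    """Sum token and segment embeddings, then apply the toy attention layer."""
    embeddings = [
        [t + s for t, s in zip(token_table[tok], segment_table[seg])]
        for tok, seg in zip(input_ids, segment_ids)
    ]
    return masked_self_attention(embeddings, input_mask)


# Hypothetical random embedding tables (illustration only, not trained weights).
random.seed(0)
DIM = 4
token_table = {tok: [random.uniform(-1, 1) for _ in range(DIM)] for tok in range(6)}
segment_table = {seg: [random.uniform(-1, 1) for _ in range(DIM)] for seg in (0, 1)}

ids = [1, 2, 3, 4, 5]    # two query tokens followed by three doc tokens
segs = [0, 0, 1, 1, 1]   # 0 = query, 1 = doc

out0 = encode(*pad_features(ids, segs, 8, pad_segment_id=0), token_table, segment_table)
out1 = encode(*pad_features(ids, segs, 8, pad_segment_id=1), token_table, segment_table)

# Compare only the rows that belong to real (non-padding) tokens.
n_real = len(ids)
real_positions_match = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row0, row1 in zip(out0[:n_real], out1[:n_real])
    for a, b in zip(row0, row1)
)
print(real_positions_match)
```

Note that this sketch only compares rows for the real tokens. The rows at the padded positions themselves can differ in this toy, since their own input embeddings change with the segment id.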

Please check and let me know if there are any questions. Thanks.