chaojiang06 / wiki-auto

Neural CRF Model for Sentence Alignment in Text Simplification

Predicted sequence contains all 0s #2

Open schan27 opened 4 years ago

schan27 commented 4 years ago

Hi there,

Thank you for making this code available! I am wondering if you might have some insight into why the following example produces a predicted sequence of all 0s. This is using the pretrained BERT_wiki checkpoint:

from transformers import BertForSequenceClassification, BertTokenizer
from model import NeuralWordAligner
import torch

my_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
BERT_folder = "/path/to/BERT_wiki"
tokenizer = BertTokenizer.from_pretrained(BERT_folder,
                                            do_lower_case=True)
bert_for_sent_seq_model = BertForSequenceClassification.from_pretrained(BERT_folder,
                                                                        output_hidden_states=True)
model = NeuralWordAligner(bert_for_sent_seq_model=bert_for_sent_seq_model,
                            tokenizer=tokenizer)
bert_for_sent_seq_model = bert_for_sent_seq_model.to(my_device)
model = model.to(my_device)
bert_for_sent_seq_model.eval()
model.eval()
sents1 = ["The Local Government Act 1985 was an Act of Parliament in the United Kingdom.", "All of the authorities were controlled by, or came under the control of the opposition Labour Party during Thatcher's first term.", "Its proposals formed the basis of the Local Government Bill."]
sents2 = ["The main provision, section 1 stated that \"the Greater London Council; and the metropolitan county councils\" shall not exist anymore.", "The Local Government Act 1985 was an Act of Parliament in the United Kingdom.", "It came into effect on 1 April 1986."]
_, _, alignment = model(sents1, sents2, None)

The sentences in sents1 and sents2 are from wiki-manual/train.tsv. I would expect sents1[0] to be aligned to sents2[1], since that pair is labeled aligned in the data, but here is the output at this point:

output_both: tensor([[ 0.0720, -0.2818,  0.0411,  0.0066],
        [ 0.1266,  0.1336,  0.1619,  0.0066],
        [ 0.1359,  0.1281,  0.1375,  0.0066]], device='cuda:0',
       grad_fn=<SqueezeBackward1>)
transition_matrix: tensor([[-0.0223, -0.4793, -0.4793, -0.4793],
        [-0.3270, -0.7045, -1.0150, -1.3255],
        [-0.3270, -0.3940, -0.7045, -1.0150],
        [-0.3270, -0.7045, -0.3940, -0.7045]], device='cuda:0',
       grad_fn=<ViewBackward>)
len_A: 3
extended_length_B: 3
return_sequence: [0, 0, 0]
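For reference, decoding the printed tensors with a standalone Viterbi sketch does reproduce [0, 0, 0]. This is only a sketch of generic Viterbi decoding, not the repo's actual decode routine, and it assumes transitions are indexed as transition_matrix[prev_label][next_label]; under those assumptions, the all-0 path wins because the diagonal entry for label 0 (-0.0223) makes it much cheaper to stay at label 0 than to move to any other label:

```python
import torch

# Values copied from the debug output above.
emissions = torch.tensor([
    [ 0.0720, -0.2818,  0.0411,  0.0066],
    [ 0.1266,  0.1336,  0.1619,  0.0066],
    [ 0.1359,  0.1281,  0.1375,  0.0066],
])
transitions = torch.tensor([
    [-0.0223, -0.4793, -0.4793, -0.4793],
    [-0.3270, -0.7045, -1.0150, -1.3255],
    [-0.3270, -0.3940, -0.7045, -1.0150],
    [-0.3270, -0.7045, -0.3940, -0.7045],
])

def viterbi(emissions, transitions):
    """Highest-scoring label sequence under score = sum of emission and transition terms."""
    n_steps, n_labels = emissions.shape
    scores = emissions[0].clone()
    backpointers = []
    for t in range(1, n_steps):
        # cand[i, j] = best score so far ending in label i, plus the cost of moving to j
        cand = scores.unsqueeze(1) + transitions
        best, idx = cand.max(dim=0)
        scores = best + emissions[t]
        backpointers.append(idx)
    # Follow backpointers from the best final label to recover the full path.
    path = [int(scores.argmax())]
    for idx in reversed(backpointers):
        path.append(int(idx[path[-1]]))
    return path[::-1]

print(viterbi(emissions, transitions))  # → [0, 0, 0]
```

So if label 0 is the "no alignment" state, the decode is behaving consistently with these scores, and the question becomes why the emission scores for the truly aligned pair (row 0, column 1 here) are so low.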

Any help would be greatly appreciated! Thanks again.

imurs34 commented 1 year ago

Having the same issue! Have you resolved it?