HHousen / TransformerSum

Models to perform neural summarization (extractive and abstractive) using machine learning transformers and a tool to convert abstractive summarization datasets to the extractive task.
https://transformersum.rtfd.io
GNU General Public License v3.0

Training fails for batch with 1 sentence only #34

Closed salman1993 closed 4 years ago

salman1993 commented 4 years ago

If a batch contains source text with only one sentence, training fails (specifically, when loading examples after data processing). The issue seems to be that the output size should be (2, 1), but instead it is (2, 2). The labels size is fine.

Steps to reproduce:

Command to run:

python src/main.py --data_path ./data/test_one_sent/ --weights_save_path ./trained_models --do_train --max_steps 10

Error output:

2020-10-28 16:43:12,731|extractive|INFO> TRAIN_STEP 0
2020-10-28 16:43:12,731|extractive|INFO> batch
{'sent_lengths': [[4, 1], [5, 0]], 'sent_lengths_mask': tensor([[ True, False],
        [ True, False]]), 'input_ids': tensor([[    0, 13368,   479,     2,     0],
        [    0, 30086,  2636,   479,     2]]), 'attention_mask': tensor([[1, 1, 1, 1, 0],
        [1, 1, 1, 1, 1]]), 'token_type_ids': tensor([[0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1]]), 'labels': tensor([[0],
        [0]]), 'sent_rep_token_ids': tensor([[0],
        [0]]), 'sent_rep_mask': tensor([[True],
        [True]])}
2020-10-28 16:43:12,758|extractive|ERROR> Target size (torch.Size([2, 1])) must be the same as input size (torch.Size([2, 2]))
2020-10-28 16:43:12,759|extractive|ERROR> Details about above error:
1. outputs=tensor([[-0.1405, -0.1505],
        [-0.1405, -0.1505]])
labels.float()=tensor([[0.],
        [0.]])
Traceback (most recent call last):
  File "src/main.py", line 403, in <module>
    main(main_args)
  File "src/main.py", line 97, in main
    trainer.fit(model)
.......
  File "/Users/salmanmohammed/dev/TransformerSum/src/extractive.py", line 326, in compute_loss
    loss = loss * mask.float()
UnboundLocalError: local variable 'loss' referenced before assignment
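The `UnboundLocalError` looks like a downstream symptom: the log shows the size-mismatch error being reported first and execution continuing, which suggests `compute_loss` computes `loss` inside a `try`/`except` that logs the exception without re-raising, so the later `loss = loss * mask.float()` runs with `loss` never assigned. A minimal sketch of that pattern (hypothetical names and a stand-in loss function, not the actual TransformerSum code):

```python
import logging

def compute_loss(outputs, labels, mask):
    """Hypothetical sketch of the pattern behind the UnboundLocalError."""
    try:
        if len(outputs[0]) != len(labels[0]):
            # Mirrors: "Target size ... must be the same as input size ..."
            raise ValueError("target/input size mismatch")
        loss = sum(abs(o - l) for o, l in zip(outputs[0], labels[0]))
    except ValueError as err:
        logging.error("Details about above error: %s", err)
        # The exception is logged but not re-raised, and `loss` is never
        # given a fallback value...
    return loss * mask  # ...so this line raises UnboundLocalError
```

With mismatched shapes, e.g. `compute_loss([[-0.14, -0.15]], [[0.0]], 1.0)`, the `except` branch runs and the final line raises `UnboundLocalError` exactly as in the traceback above; with matching shapes the function returns normally.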
salman1993 commented 4 years ago

If you edit the train and val data files so that at least one example includes two sentences, then data processing and training succeed:

[
    {"src": [["Hello", "world", "."], ["sent", "two"]], "labels": [0, 0]},
    {"src": [["Hi", "."]], "labels": [0]}
]
salman1993 commented 4 years ago

@HHousen do you know why this happens?

HHousen commented 4 years ago

@salman1993 I've figured this out. The problem was caused by the call to squeeze() on line 59 here: https://github.com/HHousen/TransformerSum/blob/3921b229c1025dad1759a8bd52a8080a9659d696/src/pooling.py#L56-L62 I've removed this function call because I do not see why it needs to be there anymore.
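For context, `squeeze()` with no arguments removes every dimension of size 1, so a batch whose sentence dimension happens to be 1 loses that dimension entirely, which would produce the (2, 2) vs. (2, 1) mismatch seen above. A minimal sketch of the failure mode using NumPy, whose `squeeze` behaves like PyTorch's here; the shapes are illustrative, not taken from the actual pooling code:

```python
import numpy as np

# Pooled sentence scores/embeddings: (batch_size, num_sentences, hidden_dim)
two_sents = np.zeros((2, 2, 4))
one_sent = np.zeros((2, 1, 4))

# squeeze() removes *all* size-1 axes, so the sentence axis survives
# only when num_sentences > 1.
print(two_sents.squeeze().shape)  # (2, 2, 4) -- unchanged
print(one_sent.squeeze().shape)   # (2, 4) -- sentence axis silently dropped
```

A dimension-safe alternative is to squeeze a named axis (e.g. `squeeze(axis=-1)`) so only the intended dimension can ever be removed, or, as in the fix here, to drop the call entirely.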