zongshenmu opened 3 years ago
Which task are you running on? Also please share your training command.
I pulled the code from the repository and only changed the training batch size in vilbert_tasks.yml. I ran multi-task training as described in the README.md, but it fails on the NLVR dataset.
Debugging the dataset, I found the failure happens during evaluation, and always on the last iteration. When that batch is fed to the model, the batch sizes of the target and vil_binary_prediction tensors do not match, so the task loss (binary_cross_entropy_with_logits) cannot be computed:
```
vil_binary_prediction torch.Size([12, 2]) target torch.Size([6, 2])
Traceback (most recent call last):
  File "train_tasks.py", line 682, in <module>
    main()
  File "train_tasks.py", line 605, in main
    tbLogger,
  File "train_tasks.py", line 665, in evaluate
    args, task_cfg, device, task_id, batch, model, task_losses
  File "/data1/mzs/Code/vilbert-multi-task/vilbert/task_utils.py", line 157, in ForwardModelsVal
    loss = task_losses[task_id](vil_binary_prediction, target)
  File "/data0/mzs/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data0/mzs/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 601, in forward
    reduction=self.reduction)
  File "/data0/mzs/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/functional.py", line 2124, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([6, 2])) must be the same as input size (torch.Size([12, 2]))
```
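For context, the error itself is easy to reproduce in isolation: `binary_cross_entropy_with_logits` requires the input and target to have identical shapes, so a `(12, 2)` prediction against a `(6, 2)` target always raises this `ValueError`. A minimal sketch (tensor contents are dummy values, only the shapes from the traceback are real):

```python
import torch
import torch.nn.functional as F

pred = torch.randn(12, 2)   # stands in for vil_binary_prediction
target = torch.rand(6, 2)   # stands in for target

try:
    F.binary_cross_entropy_with_logits(pred, target)
    raised = False
except ValueError as e:
    raised = True
    print(e)  # Target size (...[6, 2]) must be the same as input size (...[12, 2])
```

So the loss function is behaving correctly; the bug is upstream, wherever the prediction batch ends up twice the size of the target batch.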
I also suspect that lines 1688-1691 of vilbert.py do not handle this case: when the batch size is odd, the pairing reshape is skipped entirely.
```python
if pooled_output.size(0) % 2 == 0:
    vil_binary_prediction = self.vil_binary_prediction(
        pooled_output.view(-1, pooled_output.size(1) * 2)
    )
```
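One hedged workaround, not the repo's actual fix: since NLVR2 doubles the batch (two images per example), you can force every evaluation batch to a fixed, even size by constructing the loader with `drop_last=True`, so the guard above always fires. A sketch with illustrative sizes:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset of 100 examples; 100 is not divisible by 16, so without
# drop_last the final batch would have 4 examples.
dataset = TensorDataset(torch.randn(100, 8))
loader = DataLoader(dataset, batch_size=16, drop_last=True)

# drop_last=True discards the trailing partial batch, so every batch the
# model sees has exactly batch_size examples.
sizes = [batch[0].size(0) for batch in loader]
```

The cost is that the trailing examples are never evaluated, so this only masks the crash; the real fix would be to make the pairing logic handle a partial final batch.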
Your code in eval_tasks.py for retrieval_datasets.py also looks wrong: the dataloader cannot infer the right batch size and number of options here:
```python
elif task_cfg[task_id]["process"] in ["retrieval"]:
    max_num_bbox = features.size(1)
    num_options = question.size(1)
    features = features.view(-1, features.size(2), features.size(3))
    spatials = spatials.view(-1, spatials.size(2), spatials.size(3))
    image_mask = image_mask.view(-1, image_mask.size(2))
    question = question.view(-1, question.size(2))
    input_mask = input_mask.view(-1, input_mask.size(2))
    segment_ids = segment_ids.view(-1, segment_ids.size(2))
    co_attention_mask = co_attention_mask.view(
        -1, co_attention_mask.size(2), co_attention_mask.size(3)
    )
```
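To make the assumption explicit: this branch only works if the loader yields per-option tensors, e.g. `question` of shape `(batch, num_options, seq_len)`, which the `view` then flattens into the batch dimension. A sketch with illustrative sizes (2 examples, 100 candidate options, sequence length 20 are assumptions, not the repo's actual values):

```python
import torch

batch_size, num_options, seq_len = 2, 100, 20
question = torch.zeros(batch_size, num_options, seq_len, dtype=torch.long)

# Flatten options into the batch dimension, as the retrieval branch does.
# If the loader instead returned a 2-D (batch, seq_len) tensor,
# question.size(2) would raise IndexError before any view happened.
flat = question.view(-1, question.size(2))
print(flat.shape)  # torch.Size([200, 20])
```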
Can you provide your retrieval evaluation metric?
The code quality of this repo is surprisingly low; I can hardly believe this is Facebook engineering. It would take less effort to rewrite it than to debug it.
Dear all,
I ran into the same problem when I tried to run multi-task learning with the command from the README.md.
Training had already run for hours, and I could even see a validation pass on refcocog complete at iter 513, but the code raised the same ValueError at iter 661.
Did you find a solution? Or does anyone have ideas for fixing this?
Thank you!
I have been trying to run this code for three weeks; it is unbelievable how full of bugs it is. I am starting to wonder whether the reported results are actually true.
@zongshenmu @vedanuj @chen398936790 @ZhiyuanChen
I wrote a step-by-step tutorial on how to set up the environment, train and test this model. I also added a section on extracting the visiolinguistic embeddings from the image-text data. https://naserian-elahe.medium.com/vilbert-a-model-for-learning-joint-representations-of-image-content-and-natural-language-47f56a313a79 I very much appreciate any comments or suggestions.
Why does it happen that the input and target sizes of the cross-entropy loss do not match?