KhalilMrini / LAL-Parser

Neural Adobe-UCSD Parser, the current State of the Art in Constituency and Dependency Parsing.

Inference on input batches #16

Closed Sandipan99 closed 4 years ago

Sandipan99 commented 4 years ago

Hi,

I used parse.sh with your pre-trained model to make inferences. I noticed that if you pass a single sentence through example_sentences.txt, it works fine. However, if you send multiple sentences in one batch it throws an error unless the sentences are of equal length.

Traceback (most recent call last):
  File "src_joint/main.py", line 794, in <module>
    main()
  File "src_joint/main.py", line 790, in main
    args.callback(args)
  File "src_joint/main.py", line 717, in run_parse
    syntree, _ = parser.parse_batch(tagged_sentences)
  File "/Users/cssh/LAL-Parser/src_joint/KM_parser.py", line 1803, in parse_batch
    annotations, self.current_attns = self.encoder(emb_idxs, pre_words_idxs, batch_idxs, extra_content_annotations=extra_content_annotations)
  File "/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/cssh/LAL-Parser/src_joint/KM_parser.py", line 1191, in forward
    res, current_attns = attn(res, batch_idxs)
  File "/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/cssh/LAL-Parser/src_joint/KM_parser.py", line 425, in forward
    return self.layer_norm(outputs + residual), attns_padded
RuntimeError: The size of tensor a (24) must match the size of tensor b (19) at non-singleton dimension 0

In this example I used two sentences, one of length 12 and the other of length 7. My guess is that the shorter input is being padded to match the longer sentence somewhere, while the input sequences themselves are not padded, which causes the dimension mismatch. I think I am missing some trick to make it work. I would appreciate your help.
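
For readers following along, the two sizes in the error message line up with this guess: 19 is 12 + 7 (one row per real token) and 24 is 2 × 12 (both sentences stretched to the longest one). The snippet below is only a sketch of that arithmetic, not code from LAL-Parser. The variable names are hypothetical, and which of the two tensors in the traceback is the padded one is not confirmed by the thread.

```python
# Illustrative only: this is not code from LAL-Parser.
lengths = [12, 7]  # sentence lengths from the report above

# Packed layout: the tokens of all sentences are concatenated, one row per token.
packed_rows = sum(lengths)

# Padded layout: every sentence is stretched to the length of the longest one.
padded_rows = len(lengths) * max(lengths)

# A batch_idxs-style index (hypothetical here) mapping each packed row to its sentence.
batch_idxs = [i for i, n in enumerate(lengths) for _ in range(n)]

print(packed_rows, padded_rows)  # prints: 19 24
```

Adding a 19-row tensor to a 24-row tensor along dimension 0 cannot broadcast, which would produce the kind of RuntimeError shown in the traceback.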

pietroDB commented 4 years ago

Hi, I get exactly the same error as Sandipan99. It is ok for one sentence, but not for more... Thanks in advance

KhalilMrini commented 4 years ago

Thanks @Sandipan99 and @pietroDB for highlighting this error. I used the parse.sh script to parse multiple sentences in the past -- I will investigate this mistake and get back to you.

Sorry for this inconvenience, and thank you both for your interest in our work!

KhalilMrini commented 4 years ago

Hi, sorry for the late reply. I am working on resolving the problem. For now, a quick workaround is to set --eval-batch-size 1 when parsing. This lets you parse multiple sentences, one sentence per batch.
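
Since the original report says batches work when all sentences have the same length, another possible stopgap is to group the input by length before parsing. The sketch below is not part of LAL-Parser: `group_by_length` is a hypothetical helper, and it counts whitespace-separated tokens, which is an assumption that may not match the parser's own tokenization.

```python
from collections import defaultdict


def group_by_length(sentences):
    """Bucket sentences by whitespace token count.

    Assumption: whitespace splitting approximates the parser's token count.
    """
    buckets = defaultdict(list)
    for sentence in sentences:
        buckets[len(sentence.split())].append(sentence)
    return dict(buckets)


# Hypothetical input; each bucket would be written out and parsed separately.
sentences = [
    "The cat sat .",
    "A dog barked .",
    "This one is a little longer .",
]
batches = group_by_length(sentences)
for length, batch in sorted(batches.items()):
    print(length, batch)
```

Each bucket then contains only equal-length sentences, at the cost of running the parser once per bucket and restoring the original order afterwards.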

KhalilMrini commented 4 years ago

I tried with 2 sentences of different lengths, using the parse_quick.sh script as is (with --eval-batch-size 50). It works, and I am unable to reproduce the issue. I am using Python 3.8. I also notice that the line numbers in your output do not match what we currently have in the repository. Could you try with the latest version of the code? Thank you!

FYI, here are the sentences I used:

This is the first sentence.
This is a much longer sentence, where there is a comma.

Sandipan99 commented 4 years ago

Thanks for getting back to me. I had not tried parse_quick.sh; the issue was with parse.sh. The line numbers do not match because I had tried to debug it myself and inserted a few intermediate lines. Anyway, thanks for your help. I will definitely try out parse_quick.sh as you suggested.

KhalilMrini commented 4 years ago

Thanks @Sandipan99 for highlighting this problem. Please reopen this issue if you encounter any other problems!