facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Simultaneous (MMA/waitk) inference bug in p_choose #3892

Open · George0828Zhang opened this issue 3 years ago

George0828Zhang commented 3 years ago

🐛 Bug

In the function p_choose, another function is called with the self as first argument:

def p_choose(self, query, key, key_padding_mask, incremental_states=None):
    return self.p_choose_from_qk(self, query, key, key_padding_mask)

This is incorrect: since p_choose_from_qk is called as a bound method, the explicit self is bound to the query parameter and every remaining argument shifts one position, resulting in the error shown below (AttributeError: 'MonotonicAttention' object has no attribute 'size'). It should be replaced by

def p_choose(self, query, key, key_padding_mask, incremental_states=None):
    return self.p_choose_from_qk(query, key, key_padding_mask)
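
To see why, here is a standalone toy example (not fairseq code) of the same Python pitfall: calling a bound method with an explicit self shifts every argument one parameter to the right. Because the target method has a trailing optional parameter (key here, like incremental_state in fairseq) that absorbs the extra argument, the call does not raise a TypeError; it silently mis-binds the instance to query.

    class Toy:
        def from_qk(self, query, key=None):
            # Report what actually arrived in `query`.
            return type(query).__name__

        def buggy(self, query):
            # Bound call + explicit self: query <- self, key <- query
            return self.from_qk(self, query)

        def fixed(self, query):
            return self.from_qk(query)

    t = Toy()
    print(t.buggy("q"))  # 'Toy' -- the instance itself was bound to `query`
    print(t.fixed("q"))  # 'str'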

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run SimulEval with the provided agent and any simultaneous model (I tried waitk and MMA-hard)
  2. See error
Traceback (most recent call last):
  File "/home/george/envs/apex/bin/simuleval", line 33, in <module>
    sys.exit(load_entry_point('simuleval', 'console_scripts', 'simuleval')())
  File "/home/george/utility/SimulEval/simuleval/cli.py", line 165, in main
    _main(args.client_only)
  File "/home/george/utility/SimulEval/simuleval/cli.py", line 192, in _main
    evaluate(args, client, server_process)
  File "/home/george/utility/SimulEval/simuleval/cli.py", line 145, in evaluate
    decode(args, client, result_queue, indices)
  File "/home/george/utility/SimulEval/simuleval/cli.py", line 108, in decode
    action = agent.policy(states)
  File "/home/george/simulst/eval/agents/fairseq_simul_st_agent.py", line 455, in policy
    x, outputs = self.model.decoder.forward(
  File "/home/george/utility/fairseq/fairseq/models/transformer/transformer_decoder.py", line 216, in forward
    x, extra = self.extract_features(
  File "/home/george/utility/fairseq/examples/simultaneous_translation/models/transformer_monotonic_attention.py", line 215, in extract_features
    x, attn, _ = layer(
  File "/home/george/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/george/utility/fairseq/examples/simultaneous_translation/modules/monotonic_transformer_layer.py", line 144, in forward
    x, attn = self.encoder_attn(
  File "/home/george/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/george/utility/fairseq/examples/simultaneous_translation/modules/monotonic_multihead_attention.py", line 365, in forward
    ) = self.monotonic_attention_process_infer(
  File "/home/george/utility/fairseq/examples/simultaneous_translation/modules/monotonic_multihead_attention.py", line 172, in monotonic_attention_process_infer
    p_choose = self.p_choose(
  File "/home/george/utility/fairseq/examples/simultaneous_translation/modules/monotonic_multihead_attention.py", line 152, in p_choose
    return self.p_choose_from_qk(self, query, key, key_padding_mask)
  File "/home/george/utility/fairseq/examples/simultaneous_translation/modules/monotonic_multihead_attention.py", line 135, in p_choose_from_qk
    monotonic_energy = self.energy_from_qk(
  File "/home/george/utility/fairseq/examples/simultaneous_translation/modules/monotonic_multihead_attention.py", line 108, in energy_from_qk
    length, bsz, _ = query.size()
  File "/home/george/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'MonotonicAttention' object has no attribute 'size'

Expected behavior

The decoder forward pass during inference should run without raising an AttributeError.


kurtisxx commented 3 years ago

@George0828Zhang thanks for pointing this out. I was having the same issue and trying to figure out what the problem is. Two quick questions:

  1. When you change that line as shown above, were you able to evaluate the MMA-hard model? Even after changing the code as you suggested, I still get an error, now "RuntimeError: Expected 3-dimensional tensor, but got 2-dimensional tensor for argument #1 'batch1' (while checking arguments for bmm)". Did you also get it?
  2. You mentioned that you tested MMA-hard with the provided agent. As far as I can see, there is no agent for the MMA-hard model in the repository; am I wrong?

George0828Zhang commented 3 years ago

  1. Yes, I also get this additional error, but I added the following line (not 100% sure, this needs confirmation from the authors @xutaima):
    beta = beta.view(bsz * self.num_heads, tgt_len, src_len)
  2. AFAIK, all simultaneous models (s2t and t2t) can use the same agent, since it reads the decoder output to execute the policy.

I understand they are migrating the simultaneous code due to an API change, so more bugs may come up.

EDIT: the error occurred at this line; the fix can be added before the torch.bmm line.
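
For context, a sketch of where that reshape would sit, assuming the variable names used in the thread and in monotonic_multihead_attention.py (beta, value, bsz, tgt_len, src_len); treat it as illustrative rather than a verified patch:

    # Collapse (bsz, num_heads, tgt_len, src_len) into the 3-D shape
    # that torch.bmm expects for a batch of attention weights.
    beta = beta.view(bsz * self.num_heads, tgt_len, src_len)
    attn = torch.bmm(beta, value)  # value: (bsz * num_heads, src_len, head_dim)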

kurtisxx commented 3 years ago

@George0828Zhang Thanks for your reply! I can also confirm that adding beta = beta.view(bsz * self.num_heads, tgt_len, src_len) solved the aforementioned error. One more question: how did you handle the predict function for the MMA model (MMA-hard)? Right now, I'm using the predict function from the en-ja example. One thing I did was add the line if len(states.target) > self.max_len: return self.dict['tgt'].eos_word to the predict function, to make sure decoding stops at some point. Other than this, I'm using exactly the same predict function. I think MMA-hard shouldn't require any specific changes? What do you think?

George0828Zhang commented 3 years ago

Yeah I don't think more changes are needed aside from the max len one you did 👍
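
For readers following along, a minimal sketch of that max-length guard, using the names from the snippet quoted above (states.target, self.max_len, self.dict['tgt']); the rest of predict is elided:

    def predict(self, states):
        # Force EOS once the hypothesis exceeds max_len, so that
        # decoding is guaranteed to terminate.
        if len(states.target) > self.max_len:
            return self.dict['tgt'].eos_word
        # ... otherwise run the decoder and return the next token ...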

kurtisxx commented 3 years ago

@George0828Zhang Happy to hear that you also think the same way! Last question: which files did you provide as --source and --target to the simuleval command? I'm trying to replicate the de-en experiments reported in the paper, which says:

For each dataset, we apply tokenization with the Moses (Koehn et al., 2007) tokenizer and preserve casing. We apply byte pair encoding (BPE) (Sennrich et al., 2016) jointly on the source and target to construct a shared vocabulary with 32K symbols

As a result, I tokenized the raw text with Moses, then learned and applied BPE after tokenization. So there are three options for the --source and --target files:

  1. Raw text files.
  2. Tokenized text files (the output after the Moses scripts but before the BPE scripts).
  3. Tokenized and BPE-applied files.

I think the correct one is the tokenized text files. But then the question is how to apply BPE to sentences inside the agent, and also how to de-BPE and detokenize the translation inside the agent. It would be really useful if you could also explain how you handled this part.

Edit: If tokenized but NOT BPE-applied files are given as --source and --target, I think the source file can be BPE-encoded in segment_to_units, similar to what was done in the En-Ja file? Instead of using the sentence encoder, there must be a way of using BPE. But I couldn't figure out how to detokenize after the translation completes.

George0828Zhang commented 3 years ago

Kinda off topic, but you should input detokenized files (those from before applying BPE) to SimulEval. If you want to detokenize, you should implement it in units_to_segment on your own.
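
A rough sketch of that division of labor between the two hooks, assuming subword-nmt for BPE and leaving detokenization as a stub (the library choice and the mixin structure are assumptions, not from the thread):

    from subword_nmt.apply_bpe import BPE

    class BPEAgentMixin:
        def __init__(self, bpe_codes_path):
            # Assumption: BPE codes learned offline on the tokenized training data.
            with open(bpe_codes_path, encoding="utf-8") as f:
                self.bpe = BPE(f)

        def segment_to_units(self, segment, states):
            # One incoming (Moses-tokenized) source word -> its BPE subwords.
            return self.bpe.process_line(segment).split()

        def units_to_segment(self, unit_queue, states):
            # Emit a full target word only once its last subword is complete.
            if len(unit_queue) == 0 or unit_queue[-1].endswith("@@"):
                return None
            word = "".join(u.replace("@@", "") for u in unit_queue)
            states.unit_queue.target.clear()
            return word  # detokenization (e.g. Moses) would go here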

George0828Zhang commented 3 years ago

Just found out that you also need to pass in the incremental_state for the policies to work properly during inference. Updated fix below:

def p_choose(self, query, key, key_padding_mask, incremental_state=None):
    return self.p_choose_from_qk(query, key, key_padding_mask, incremental_state=incremental_state)

Already updated in the linked PR.

kurtisxx commented 3 years ago

Hey @George0828Zhang, thanks for letting me know about the update in the PR!

If you want to detokenize, you should implement it in units_to_segment on your own.

I've also realized that I need to implement the units_to_segment function. For the target side, there are two functions that need to be implemented, I guess: units_to_segment and update_states_write. I implemented units_to_segment as follows:

    def units_to_segment(self, unit_queue, states):
        if len(unit_queue) == 0:
            return None
        if unit_queue[-1][-2:] == '@@':
            return None
        unit = "".join([item.replace("@@", "") for item in unit_queue])
        states.unit_queue.target.clear()
        return unit

However, I am not sure if this is correct. What do you think?

Also, I don't know how update_states_write should be implemented; have you gone through these?

EricLina commented 2 years ago

Could you please share the agent code? Thank you very much.