Open · George0828Zhang opened this issue 3 years ago
@George0828Zhang thanks for pointing this out. I was having the same issue and trying to figure out what the problem is. Two quick questions:
1. When you change that line as shown above, were you able to evaluate the MMA-hard model? Even after changing the code as you suggested, I am still getting an error: "RuntimeError: Expected 3-dimensional tensor, but got 2-dimensional tensor for argument #1 'batch1' (while checking arguments for bmm)". Did you also get this?
2. You mentioned that you tested MMA-hard with the provided agent. As far as I can see, there is no agent for the MMA-hard model in the repository; am I wrong?
beta = beta.view(bsz * self.num_heads, tgt_len, src_len)
I understand they are migrating the simultaneous code due to an API change, so there might be more bugs coming up.
EDIT: the error occurred at this line; the fix can be added right before the torch.bmm line.
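In context, a minimal sketch of where the reshape would sit (the surrounding variable names and the bmm operands are assumed from the monotonic attention code, not quoted from it):

```python
# Assumption: torch.bmm requires 3-D inputs, hence collapsing the batch and
# head dimensions of beta right before the batched matmul that raised the error.
beta = beta.view(bsz * self.num_heads, tgt_len, src_len)
attn = torch.bmm(beta, value)  # value assumed to be (bsz * num_heads, src_len, head_dim)
```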
@George0828Zhang Thanks for your reply! I can also confirm that adding
beta = beta.view(bsz * self.num_heads, tgt_len, src_len)
solved the aforementioned error. One more question: how did you handle the predict function for the MMA model (MMA-hard)? Right now, I'm using the predict function from the en-ja example. The one thing I did was add the line
if len(states.target) > self.max_len: return self.dict['tgt'].eos_word
to the predict function to make sure decoding stops at some point. Other than this, I'm using exactly the same predict function. I think MMA-hard shouldn't require any specific changes? What do you think?
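For reference, a rough sketch of where that guard sits in the predict function (following the en-ja example agent; self.dict, self.max_len and states.target are assumed from that agent and may differ in your setup):

```python
def predict(self, states):
    # Force an end-of-sentence token once the hypothesis gets too long,
    # so decoding is guaranteed to terminate.
    if len(states.target) > self.max_len:
        return self.dict['tgt'].eos_word
    # ... otherwise run the decoder step as in the en-ja example
    # and return the predicted target token ...
```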
Yeah I don't think more changes are needed aside from the max len one you did 👍
@George0828Zhang Happy to hear that you also think the same way! Last question: which files did you provide as --source and --target to the simuleval command? I'm trying to replicate the de-en experiments reported in the paper, and it says:
For each dataset, we apply tokenization with the Moses (Koehn et al., 2007) tokenizer and preserve casing. We apply byte pair encoding (BPE) (Sennrich et al., 2016) jointly on the source and target to construct a shared vocabulary with 32K symbols
As a result, I tokenized the raw text with Moses and learned and applied BPE after tokenization. So there are 3 options for the --source and --target files:
Option 1: raw text files
Option 2: tokenized text files (the output after the Moses scripts but before the BPE scripts)
Option 3: tokenized and BPE-applied files
I think the correct one is the tokenized text files. But then the question is how to apply BPE to sentences inside the agent, and how to de-BPE and de-tokenize the translation inside the agent. It would be really useful if you could also explain how you handled this part.
Edit: if tokenized but NOT BPE-applied files are given as --source and --target, I think the source file can be BPE-ed in segment_to_units, similar to what has been done in the En-Ja file? Instead of using the sentence encoder, there must be a way of using BPE. But I couldn't figure out how to detokenize after the translation completes.
Kinda off topic, but you should input detokenized files (i.e. the ones before applying BPE) to simuleval. If you want to detokenize, you should implement it in units_to_segment on your own.
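Not the agent from the repository, but a minimal sketch of how this could look, assuming detokenized files are fed to simuleval. It uses sacremoses and subword-nmt for illustration; codes_path, the language pair, and the function names are placeholders rather than the SimulEval agent API:

```python
# Illustrative only: codes_path and the language codes are placeholders,
# and these are plain functions rather than SimulEval agent methods.
from sacremoses import MosesTokenizer, MosesDetokenizer
from subword_nmt.apply_bpe import BPE

mt = MosesTokenizer(lang="de")
md = MosesDetokenizer(lang="en")
with open("codes_path") as f:      # the joint BPE codes learned offline
    bpe = BPE(f)

def source_segment_to_units(segment):
    # Tokenize an incoming (detokenized) source segment with Moses,
    # then split it into the BPE units the model was trained on.
    tokenized = mt.tokenize(segment, return_str=True)
    return bpe.process_line(tokenized).split()

def target_units_to_segment(units):
    # Undo BPE by removing the "@@ " continuation markers, then detokenize
    # with Moses so the output can be scored as plain text.
    words = " ".join(units).replace("@@ ", "").split()
    return md.detokenize(words)
```

The source-side part would plug into segment_to_units and the target-side merge into units_to_segment, along the lines discussed above.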
Just found out that you also need to input the incremental_state in order for the policies to work properly during inference.
Update on the fix below:
def p_choose(self, query, key, key_padding_mask, incremental_state=None):
    return self.p_choose_from_qk(query, key, key_padding_mask, incremental_state=incremental_state)
Already updated in the linked PR.
Hey @George0828Zhang, thanks for letting me know about the update on the PR!
If you want to detokenize, you should implement it in units_to_segment on your own.
I've also realized that I need to implement the units_to_segment function. For the target side, there are two functions that need to be implemented, I guess: units_to_segment and update_states_write. I implemented units_to_segment as follows:
def units_to_segment(self, unit_queue, states):
    # Nothing decoded yet.
    if len(unit_queue) == 0:
        return None
    # The last unit is still a word fragment (ends with the BPE marker),
    # so wait for the rest of the word before emitting a segment.
    if unit_queue[-1][-2:] == '@@':
        return None
    # Merge the queued BPE units into a full word and flush the queue.
    unit = "".join([item.replace("@@", "") for item in unit_queue])
    states.unit_queue.target.clear()
    return unit
However, I am not sure if this is correct. What do you think?
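For a quick sanity check, the same @@-merging logic can be tried as a standalone function on a plain list of units (no SimulEval state involved; merge_bpe_units is just an illustrative name):

```python
def merge_bpe_units(unit_queue):
    # Mirror of the logic above: emit nothing until a full word is available.
    if not unit_queue:
        return None
    if unit_queue[-1].endswith('@@'):
        return None  # last unit is still a fragment, keep waiting
    return "".join(u.replace("@@", "") for u in unit_queue)

assert merge_bpe_units(["Hel@@", "lo"]) == "Hello"
assert merge_bpe_units(["Hel@@"]) is None
```

This reproduces the behaviour of the snippet above, apart from the queue clearing, which is specific to the SimulEval states object.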
Also, I don't know how update_states_write should be implemented; have you gone through these?
Could you please share the agent code? Thank you very much.
🐛 Bug
In the function p_choose, another function, p_choose_from_qk, is called with self passed explicitly as its first argument. This is incorrect, as self will then be recognized as query, resulting in the error shown below (MonotonicAttention object has no attribute 'size'). This should be replaced by the same call without the extra self, as sketched below.
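Presumably the call in question looked roughly like the following (reconstructed from the description above and the corrected p_choose posted earlier in the thread, so treat it as an approximation rather than the exact upstream code):

```python
# Buggy form: self is passed explicitly, so p_choose_from_qk receives the
# MonotonicAttention module as `query` and fails when it calls query.size().
return self.p_choose_from_qk(self, query, key, key_padding_mask)

# Fixed form: drop the extra self.
return self.p_choose_from_qk(query, key, key_padding_mask)
```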
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
Code sample
Expected behavior
Decoder forward in inference should be error-free and not produce attribute errors.
Environment
How you installed fairseq (pip, source): source (python setup.py build_ext --inplace, pip install .)
Additional context