Open SMRN opened 7 years ago
Hi, SMRN,
This code is a little complicated and focuses on character representation; you could check this file for generating the word vector. If you are interested in Ct or the sentence vector, it is better to check the original code, which is much simpler. This pull request is also helpful. Thanks.
Hi SwordYork, Thank you. According to your paper, the encoder produces a sentence vector... I ran your trained NMT (en-fr) model. Please tell me the name of that variable in your source code. Thanks!
It will generate a context vector (C_t in the paper) that corresponds to this variable in the code. Note that I don't think C_t should be called a sentence vector; it is different from what this paper describes. Again, you could refer to this pull request for how to extract these vectors. Thanks.
Thanks for your response. Would you please tell me which variable is Ct (the context vector)? I think this variable is created in sampling.py. Do you agree?
Ct is not explicitly used in sampling.py; it is handled by the beam search. You can find it in Blocks.
According to your paper, the output of the bidirectional RNN sentence encoder is Ct. I want to know its name in your source code.
@SMRN I have told you that it is related to this line (https://github.com/SwordYork/DCNMT/blob/master/model.py#L638), that is, next_glimpses['weighted_averages']. Is it clear?
If it is still unclear, I will write the code when I have access to a computer.
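For readers following along, here is a minimal NumPy sketch of what an attention "weighted average" such as next_glimpses['weighted_averages'] typically holds: the context vector C_t, computed as a softmax-weighted sum of the encoder states. This is only an illustration under that assumption, not the DCNMT or Blocks implementation; the function name context_vector and the toy arrays are hypothetical.

```python
import numpy as np

def context_vector(encoder_states, scores):
    """Return (C_t, weights) for a single decoding step.

    encoder_states: (source_length, state_dim) array of encoder annotations.
    scores: (source_length,) array of unnormalised attention energies.
    Both names are hypothetical and used only for this illustration.
    """
    # Softmax over source positions, shifted by the max for numerical stability.
    exp_scores = np.exp(scores - np.max(scores))
    weights = exp_scores / exp_scores.sum()
    # C_t is the attention-weighted average of the encoder states.
    return weights @ encoder_states, weights

# Toy example: three source positions, two-dimensional encoder states.
states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
c_t, alpha = context_vector(states, np.array([0.0, 0.0, 0.0]))
```

Because a new C_t is produced at every decoding step, there is one such vector per generated symbol rather than a single vector per sentence, which is consistent with the remark above that C_t should not be called a sentence vector.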
Thank you. Good news! :) When can you write it?
Hi SwordYork, would you please write it?
@SMRN Sorry for the late reply; it took me 2 days to travel from the university to my home. You could add the following code after this line.
_, input_dict = self.build_input_dict(numpy.asarray(seq), self.vocab)
sampling_fn = self.model.get_theano_function()
sfn = sampling_fn(**input_dict)
outputs = list(sfn[1].flatten())
# convert idx to words
try:
    sample_length = outputs.index(self.trg_vocab['</S>'])
except ValueError:
    sample_length = len(seq)
context_vt = sfn[2]
print("Input : ", self._idx_to_word(line[0], self.src_ivocab))
print("Sample: ", self._idx_to_word(outputs[:sample_length], self.trg_ivocab))
print("Context vector: \n")
reshaped_context_vt = [list(vt) for vt in list(context_vt.reshape(context_vt.shape[0], context_vt.shape[2]))]
print("===================== \n")
for i, vt in enumerate(reshaped_context_vt):
    print(i, vt)
print("===================== \n")
Note that it will output a great many lines, and running testing.py will become very slow.
Thank you. I put it there, but unfortunately it raises some errors:
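If the printing is what slows things down, one option is to collect the vectors and write them to disk once at the end instead. The sketch below uses NumPy's savez_compressed for that; the helper name save_context_vectors, the key format, and the toy arrays are assumptions for illustration and are not part of DCNMT.

```python
import os
import tempfile

import numpy as np

def save_context_vectors(path, vectors_per_sentence):
    """Save one array per sentence into a single compressed .npz archive.

    vectors_per_sentence: list of 2-D arrays, one row per decoding step.
    (Hypothetical helper, not part of the DCNMT code base.)
    """
    arrays = {
        "sentence_%d" % i: np.asarray(vectors)
        for i, vectors in enumerate(vectors_per_sentence)
    }
    np.savez_compressed(path, **arrays)

# Toy stand-ins for the context vectors of two translated sentences.
collected = [np.zeros((3, 4)), np.ones((5, 4))]
out_path = os.path.join(tempfile.mkdtemp(), "context_vectors.npz")
save_context_vectors(out_path, collected)

# Reload later for analysis; each key maps back to one sentence.
loaded = np.load(out_path)
```

In the snippet above, this would mean appending the reshaped context vectors to a list inside the loop and calling the helper once after the last sentence.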
INFO: sampling: Started Test:
Input: This is perfectly illustrated by the UKIP numbties banning people with HIV.
Sample: "
Traceback (most recent call last):
  File "/home/prg/en-fr_model/testing.py", line 111, in <module>
    main(configuration, get_test_stream(**configuration), saveto)
  File "/home/prg/en-fr_model/testing.py", line 97, in main
    main_loop._run_extensions('before_training')
  File "/home/prg/anaconda3/lib/python3.5/site-packages/blocks/main_loop.py", line 263, in _run_extensions
    extension.dispatch(CallbackName(method_name), *args)
  File "/home/prg/anaconda3/lib/python3.5/site-packages/blocks/extensions/__init__.py", line 68, in dispatch
    getattr(self, str(callback_name))(*args)
  File "/home/prg/en-fr_model/sampling", line 515, in before_training
    self._evaluate_model()
  File "/home/prg/en-fr_model/sampling", line 556, in _evaluate_model
    list(context_vt.reshape(context_vt.shape[0], context_vt.shape[2]))]
IndexError: tuple index out of range
Use of uninitialized value $lenght_reference in numeric eq (==) at ./data/multi-blue.perl line 148.
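A note on the IndexError: "tuple index out of range" on context_vt.shape[2] means the shape tuple has fewer than three entries, so the value taken from sfn[2] is not the 3-D array the snippet expects. Which output index actually holds the context vectors in a given checkout cannot be determined from the traceback, so printing context_vt.shape first is a reasonable check. The sketch below shows a more tolerant reshape; the helper name context_rows and the toy arrays are hypothetical.

```python
import numpy as np

def context_rows(context_vt):
    """Return context vectors as a 2-D array, one row per decoding step.

    Accepts arrays with two or more dimensions, so it does not index
    shape[2] on an array that has no third axis. (Hypothetical helper.)
    """
    context_vt = np.asarray(context_vt)
    if context_vt.ndim < 2:
        raise ValueError(
            "expected at least 2 dimensions, got shape %r" % (context_vt.shape,)
        )
    # Keep the first axis (decoding steps) and merge all remaining axes.
    return context_vt.reshape(context_vt.shape[0], -1)

# A (steps, batch=1, dim) array and a (steps, dim) array both end up as (steps, dim).
rows_3d = context_rows(np.zeros((7, 1, 4)))
rows_2d = context_rows(np.zeros((7, 4)))
```

If the array turns out to be one-dimensional, the helper raises a ValueError with the offending shape, which would suggest that sfn[2] is a different output (for example, a cost or a sequence of indices) rather than the context vectors.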
@SMRN Sorry, I don't know what is going wrong with your code. Are you using the latest code?
Yes, I have downloaded the latest version. It worked well before adding the code!
Does it work with the new code on your computer? Also, you didn't upload the iterations_state.pkl file. Is it important or not?
Hi SwordYork, Thanks for sharing your code. I ran testing.py. Which part of your code generates the Ct or sentence vector? Thanks for your help.