Closed hanfeiyu closed 4 years ago
Second update.
After digging into the codebase, I suspect there is something wrong with the interaction between the GPT2 model setup and "dynamic_decode".
Unlike "train_greedy", which simply passes inputs to the "self_attention_layer" and gets outputs, "infer_sample" and "infer_greedy" decode dynamically via "dynamic_decode", and this is where the error happens.
Perhaps "dynamic_decode" needs to create new trainable variables while decoding, but the GPT2 template has already been set up before "dynamic_decode" is called, which eventually results in:
ValueError: Trainable variable created when calling a template after the first time, perhaps you used tf.Variable when you meant tf.get_variable
I'm still not able to get it to work even though the issue is (hopefully) pinpointed, and it seems that no one in this community is responding...
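To illustrate the suspected failure mode, here is a pure-Python analogy (no TensorFlow; the class and function names are hypothetical, not tf.make_template itself) of how a template freezes variable creation after its first call, mirroring the ValueError above:

```python
# Pure-Python analogy (hypothetical, not tf.make_template itself) of how a
# template forbids new trainable variables after its first call.
class Template:
    def __init__(self, fn):
        self._fn = fn
        self._vars = {}
        self._finalized = False

    def get_variable(self, name, init=0.0):
        """Create-or-reuse a variable; creation is only legal before finalization."""
        if name not in self._vars:
            if self._finalized:
                raise ValueError(
                    "Trainable variable created when calling a template "
                    "after the first time")
            self._vars[name] = init
        return self._vars[name]

    def __call__(self, *args, **kwargs):
        out = self._fn(self, *args, **kwargs)
        self._finalized = True  # after the first call, the variable set is fixed
        return out


def layer(tmpl, x, decode=False):
    w = tmpl.get_variable("w", 2.0)
    if decode:
        # mimics dynamic_decode asking for a variable the template has not seen
        b = tmpl.get_variable("b", 1.0)
        return w * x + b
    return w * x


t = Template(layer)
print(t(3.0))            # first call creates "w": prints 6.0
try:
    t(3.0, decode=True)  # "b" is new after finalization: raises ValueError
except ValueError as err:
    print("ValueError:", err)
```

In this analogy, "train_greedy" corresponds to the first call (all variables created up front), while "infer_greedy"/"infer_sample" correspond to the later call that requests an unseen variable.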
Final update.
Now it's finally working.
I overhauled the GPT2Decoder codebase, bypassing the initialization of the super class ModuleBase and moving all the functions/properties I need from ModuleBase into my own GPT2 module. ModuleBase initialization calls tf.make_template to build the template before the real decoding with dynamic_decode, which then kills any possibility of creating new trainable variables once the template is made. I'm not quite sure why TransformerDecoder still works even after initializing its super class ModuleBase.
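The workaround above can be sketched in plain Python (class names are hypothetical stand-ins, not the actual Texar code): skip the base initializer that builds the template, so new variables can still be created during decoding.

```python
# Hypothetical sketch of the workaround: do not run the base-class
# initializer (which, in Texar, builds the template via tf.make_template),
# so variable creation stays legal during dynamic decoding.
class ModuleBase:
    def __init__(self):
        # stand-in for the template construction that freezes variables
        self.template_made = True


class MyGPT2Decoder(ModuleBase):
    def __init__(self):
        # deliberately NOT calling super().__init__(): no template is made
        self.template_made = False
        self._vars = {}

    def get_variable(self, name, init=0.0):
        if name not in self._vars:
            if self.template_made:
                raise ValueError("template already made")
            self._vars[name] = init
        return self._vars[name]


dec = MyGPT2Decoder()
print(dec.get_variable("new_var", 1.0))  # succeeds: no template was made
```

The trade-off, as described above, is that every needed ModuleBase function/property has to be copied into the subclass by hand.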
Another thing worth mentioning is that I had to change the following names in the tensor_map when calling _init_from_checkpoint to load the weights from the GPT2 cache:
"ln_1/b": 'layer_{}/beta',
"ln_1/g": 'layer_{}/gamma',
"ln_2/b": 'layer_{}/past_poswise_ln/beta',
"ln_2/g": 'layer_{}/past_poswise_ln/gamma',
Without this change, GPT2 cannot correctly map its tensor names to the weights from ckpt.model. I had no choice but to adjust the default naming since I gave up on inheriting from ModuleBase.
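The renaming itself amounts to a plain name-mapping step, sketched below (the map_name helper is hypothetical, not Texar's actual _init_from_checkpoint; the tensor_map entries are the ones listed above):

```python
# Map GPT2 checkpoint tensor names to the decoder's variable names.
# The entries are the renamed layer-norm keys from above; map_name is a
# hypothetical helper, not Texar's actual _init_from_checkpoint logic.
tensor_map = {
    "ln_1/b": "layer_{}/beta",
    "ln_1/g": "layer_{}/gamma",
    "ln_2/b": "layer_{}/past_poswise_ln/beta",
    "ln_2/g": "layer_{}/past_poswise_ln/gamma",
}

def map_name(ckpt_name, layer_id):
    """Translate one checkpoint tensor name for a given transformer layer."""
    return tensor_map[ckpt_name].format(layer_id)

print(map_name("ln_2/b", 3))  # prints layer_3/past_poswise_ln/beta
```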
Now I'm closing this issue. I'm glad to see it working even though nobody helped me out lol. Texar is still a great library for NLP anyway.
Hello there,
I encountered some issues when using the GPT2 decoder for generation.
train_greedy worked well, but infer_greedy and infer_sample were always throwing errors like:
Then I tested the GPT2 decoder using gpt2_decoder_test.py, which is provided along with gpt2_decoder.py under the same folder modules/decoder. I modified the code to test three different decoding strategies like this:
The result was that "train_greedy" passed, while both "infer_greedy" and "infer_sample" failed the unit test; the errors were:
Here is my environment:
I don't know if it's only me suffering from this issue, or if anyone else is facing the same problem. Can somebody help check this out? Thanks.