matteogabella opened 5 years ago
Hi @matteogabella
a) What's the purpose of the backup? Normally, a model is trained until some validation metric stops improving for a given number of epochs, and then training stops. However, there is no good automatic validation metric for open-ended dialog quality. Metrics such as BLEU and perplexity exist for neural machine translation, but they only work well when there is a single correct way for the model to respond. In conversational modeling, there are many valid ways to answer the same question, and therefore no known way to automatically measure the "correctness" of each response. Since there is no good automatic validation metric, an early stopping mechanism cannot be used. Instead, the training routine saves a full backup of the entire model at the loss intervals specified in the hparams. A human can then converse with each backup and manually judge which one strikes the best balance of generalization and context sensitivity.
For example, an under-trained model might respond with "I don't know" to every single prompt, while an over-trained model might respond with a very detailed answer that is completely out of context because the question wasn't posed exactly the way it appeared in training. A human can choose the best backup point and delete the rest.
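For illustration, here is a minimal sketch of what that backup logic could look like inside a TF1 training loop; the threshold list, function name, and checkpoint naming are assumptions for this example, not the project's actual hparams or code:

```python
import tensorflow as tf

# Hypothetical loss thresholds at which to take a full model backup;
# the real hparam names and values in the project may differ.
backup_loss_thresholds = [2.0, 1.5, 1.0, 0.5]

saver = tf.train.Saver(max_to_keep=None)  # keep every backup on disk

def maybe_backup(sess, global_step, current_loss, thresholds, checkpoint_dir):
    """Save a full checkpoint the first time the loss drops below a threshold."""
    for threshold in list(thresholds):
        if current_loss <= threshold:
            saver.save(sess,
                       "%s/backup_loss_%.2f" % (checkpoint_dir, threshold),
                       global_step=global_step)
            thresholds.remove(threshold)  # back up only once per threshold
```

Each saved checkpoint can then be restored independently so a human can converse with it and judge its quality.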
b) Which are the output nodes? The tensors under model/decoder/output_dense/... are the weights and biases of the fully connected layer that sits on top of the decoder LSTM output. The softmax that models the probability distribution over the vocabulary uses the activations of output_dense.
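In TF1 terms, that layer is a dense projection from the decoder LSTM's output to vocabulary-sized logits, followed by a softmax. A minimal, self-contained sketch (the sizes and the placeholder are illustrative, not the project's code):

```python
import tensorflow as tf

vocab_size = 10000    # illustrative
decoder_units = 512   # illustrative

# Per-timestep decoder LSTM output: [batch, time, decoder_units]
decoder_outputs = tf.placeholder(tf.float32, [None, None, decoder_units])

# A layer like this owns the checkpoint variables
# model/decoder/output_dense/kernel and model/decoder/output_dense/bias.
logits = tf.layers.dense(decoder_outputs, vocab_size, name="output_dense")

# Probability distribution over the vocabulary at each decoding step.
vocab_probs = tf.nn.softmax(logits)
```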
Let me know if you have any more questions and thank you for your interest in the project!
Thank you, and sorry to bother you again... but if tensor_name: model/decoder/output_dense/bias and tensor_name: model/decoder/output_dense/kernel are the output nodes, which are the INPUT nodes?
The input node would be shared_embeddings_matrix, since the model input is a sequence of embedding indices, which is converted to a sequence of word vectors by the embedding lookup function. That sequence of word vectors is then fed into the encoder RNN's bidirectional cells.
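A minimal sketch of that input path in TF1 (dimensions and variable setup are illustrative; only the embedding matrix name mirrors the checkpoint listing):

```python
import tensorflow as tf

vocab_size, embedding_dim, encoder_units = 10000, 300, 256  # illustrative

# Plays the role of the checkpoint tensor model/encoder/shared_embeddings_matrix.
shared_embeddings_matrix = tf.get_variable(
    "shared_embeddings_matrix", [vocab_size, embedding_dim])

# Model input: a batch of token-index sequences.
input_ids = tf.placeholder(tf.int32, [None, None])

# Convert indices to word vectors with the embedding lookup.
input_vectors = tf.nn.embedding_lookup(shared_embeddings_matrix, input_ids)

# The word vectors feed the encoder's forward and backward LSTM cells.
fw_cell = tf.nn.rnn_cell.BasicLSTMCell(encoder_units)
bw_cell = tf.nn.rnn_cell.BasicLSTMCell(encoder_units)
outputs, states = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, input_vectors, dtype=tf.float32)
```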
Hi Abraham, and first of all, thank you for your amazing work... I have a couple of questions:
a) What's the purpose of the backup? Why do you perform a backup in some particular cases? Is it necessary for resuming training, or is it just a precaution? Why would someone have to use the backup later?
b) In order to convert the model to TFLite format (to play a bit in a mobile environment), I need to freeze the model... Most of the guides I read say that, starting from a checkpoint file, I need to pass the output nodes to the convert_variables_to_constants function...
So I used print_tensors_in_checkpoint_file to get the nodes of your model, and I obtained the list below (a sketch of this inspection and freezing step follows the list)... Which are the output nodes? Do you think I need to pass all the tensors with 'decoder' in the path?
Thank you!
```
tensor_name: model/decoder/attention_decoder_cell/attention_layer/kernel
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_b
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_g
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_v
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/query_layer/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_2/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_2/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_3/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_3/basic_lstm_cell/kernel
tensor_name: model/decoder/memory_layer/kernel
tensor_name: model/decoder/output_dense/bias
tensor_name: model/decoder/output_dense/kernel
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/encoder/shared_embeddings_matrix
```
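For reference, here is a minimal TF1 sketch of the inspection and freezing steps described above. The checkpoint path and the output operation name are illustrative assumptions, and note that convert_variables_to_constants expects the names of output operations in the inference graph, not the variable names printed from the checkpoint:

```python
import tensorflow as tf
from tensorflow.python.tools import inspect_checkpoint

CKPT = "model_checkpoint.ckpt"  # illustrative path

# Step 1: list the variables stored in the checkpoint
# (this is how a list like the one above is produced).
inspect_checkpoint.print_tensors_in_checkpoint_file(
    CKPT, tensor_name="", all_tensors=False, all_tensor_names=True)

# Step 2: restore the graph and freeze it. The output node name below is
# an illustrative guess at the decoder's final projection op, not a
# confirmed name from this project.
with tf.Session() as sess:
    saver = tf.train.import_meta_graph(CKPT + ".meta")
    saver.restore(sess, CKPT)
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(),
        ["model/decoder/output_dense/BiasAdd"])
    with tf.gfile.GFile("frozen_model.pb", "wb") as f:
        f.write(frozen_graph_def.SerializeToString())
```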