AbrahamSanders / seq2seq-chatbot

A sequence2sequence chatbot implementation with TensorFlow.

2 questions #23

Open matteogabella opened 5 years ago

matteogabella commented 5 years ago

Hi Abraham, and first of all, thank you for your amazing job... I have a couple of questions:

So I used print_tensors_in_checkpoint_file to get the nodes of your model, and I obtained the list below... which are the output nodes? Do you think I need to pass all the tensors with 'decoder' in the path?

thank you

tensor_name: model/decoder/attention_decoder_cell/attention_layer/kernel
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_b
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_g
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_v
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/query_layer/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_2/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_2/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_3/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_3/basic_lstm_cell/kernel
tensor_name: model/decoder/memory_layer/kernel
tensor_name: model/decoder/output_dense/bias
tensor_name: model/decoder/output_dense/kernel
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/encoder/shared_embeddings_matrix
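(For reference, a minimal sketch of how such a listing can be produced with TensorFlow 1.x checkpoint-inspection utilities; the checkpoint path below is a placeholder, not a file from the repo.)

```python
# Minimal sketch: list the tensors stored in a TF 1.x checkpoint.
# The checkpoint path is a placeholder for illustration only.
import tensorflow as tf
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file

ckpt_path = "models/best_weights_training.ckpt"  # hypothetical path

# Option 1: print all tensor names in the checkpoint.
print_tensors_in_checkpoint_file(ckpt_path, tensor_name="",
                                 all_tensors=False, all_tensor_names=True)

# Option 2: iterate over (name, shape) pairs programmatically.
for name, shape in tf.train.list_variables(ckpt_path):
    print(name, shape)
```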

AbrahamSanders commented 5 years ago

Hi @matteogabella

a) What's the purpose of the backup? Normally, a model is trained until some validation metric stops improving for a given number of epochs, and then training stops. However, there is no good automatic validation metric for open-ended dialog quality. Metrics such as BLEU and perplexity exist, but they only work well when there is a single correct way for the model to respond (as in neural machine translation). In conversational modeling there are many ways to correctly answer the same question, and therefore no known way to automatically measure the "correctness" of each response. Since there is no good automatic validation metric, an early stopping mechanism cannot be used. Instead, the training routine saves a full backup of the entire model at loss intervals specified in the hparams. This way a human can converse with each backup and manually judge which one strikes the best balance of generalization and context sensitivity.

For example, an under-trained model might respond with "I don't know" for every single prompt, while an overtrained model might respond with a very detailed answer which is completely out of context because the question wasn't posed exactly the way it showed up in training. A human can choose the best backup point and delete the rest.
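As a rough illustration (not the repo's actual code), a loss-interval backup scheme can look like the sketch below. The hparam name checkpoint_loss_thresholds and the save paths are hypothetical; it assumes the model graph has already been built.

```python
# Minimal sketch: save a full model backup each time training loss drops below
# the next threshold in a configured list. "checkpoint_loss_thresholds" and the
# save path are hypothetical names used for illustration.
import tensorflow as tf

checkpoint_loss_thresholds = [2.5, 2.0, 1.5, 1.0]  # hypothetical hparam

# ... build the seq2seq model graph here ...
saver = tf.train.Saver(max_to_keep=None)  # keep every backup on disk

def maybe_backup(sess, global_step, train_loss):
    """Save a checkpoint the first time the loss crosses each threshold."""
    while checkpoint_loss_thresholds and train_loss <= checkpoint_loss_thresholds[0]:
        threshold = checkpoint_loss_thresholds.pop(0)
        saver.save(sess, "models/backup_loss_{:.2f}.ckpt".format(threshold),
                   global_step=global_step)
```

A human can then load each saved checkpoint, chat with it, and keep only the one that converses best.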

b) Which are the output nodes? model/decoder/output_dense/... These are the weights and biases of the fully connected layer that sits on top of the decoder LSTM output. The softmax that models the probability distribution over the vocabulary uses the activations of output_dense.
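In TF 1.x terms, that projection looks roughly like the sketch below (not the repo's exact code; the sizes and placeholder names are illustrative).

```python
# Minimal sketch of the "output_dense" projection: a dense layer over the
# decoder LSTM outputs followed by a softmax over the vocabulary.
import tensorflow as tf

vocab_size = 10000                                    # illustrative size
decoder_outputs = tf.placeholder(tf.float32,          # [batch, time, rnn_size]
                                 shape=[None, None, 512])

logits = tf.layers.dense(decoder_outputs, vocab_size, name="output_dense")
probs = tf.nn.softmax(logits)                         # distribution over the vocabulary
predicted_ids = tf.argmax(probs, axis=-1)             # greedy choice of the next word
```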

Let me know if you have any more questions and thank you for your interest in the project!

matteogabella commented 5 years ago

Thank you, and sorry to bother you again... but if tensor_name: model/decoder/output_dense/bias and tensor_name: model/decoder/output_dense/kernel are the output nodes, which are the INPUT nodes?

AbrahamSanders commented 5 years ago

The input node would be the shared_embeddings_matrix, since the model input is a sequence of embedding indices which is converted to a sequence of word vectors using the embedding lookup function. This sequence of word vectors is what is then fed into the encoder RNN's bidirectional cells.
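A minimal TF 1.x sketch of that input path (not the repo's exact code; the sizes, placeholder names, and two-layer stacking are illustrative) is:

```python
# Minimal sketch: token ids -> shared embedding matrix -> bidirectional encoder.
import tensorflow as tf

vocab_size, embed_dim, rnn_size = 10000, 300, 512     # illustrative sizes

input_ids = tf.placeholder(tf.int32, shape=[None, None])   # [batch, time]
sequence_lengths = tf.placeholder(tf.int32, shape=[None])

# The shared embedding matrix is the model's input node: it maps each
# embedding index to a word vector via an embedding lookup.
shared_embeddings_matrix = tf.get_variable(
    "shared_embeddings_matrix", [vocab_size, embed_dim])
encoder_inputs = tf.nn.embedding_lookup(shared_embeddings_matrix, input_ids)

# The resulting word-vector sequence feeds the encoder's bidirectional cells.
fw_cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(rnn_size) for _ in range(2)])
bw_cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(rnn_size) for _ in range(2)])
encoder_outputs, encoder_states = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, encoder_inputs,
    sequence_length=sequence_lengths, dtype=tf.float32)
```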