domerin0 / rnn-speech

Character level speech recognizer using ctc loss with deep rnns in TensorFlow.
MIT License

hidden vector #30

Closed — chenting0324 closed this issue 7 years ago

chenting0324 commented 7 years ago

Hello! I used the model on another dataset, so its word error rate is a little bit high. Is that a normal phenomenon? In addition, I want to know how I can get the hidden vectors, because I want to use these hidden vectors elsewhere. Thank you very much!

AMairesse commented 7 years ago

Hi, the error rate can indeed be higher on another dataset. The pre-trained model was trained on 3 different datasets simultaneously, so it should handle varied data, but performance is always better on test sets that correspond to the datasets used for training. About the hidden vectors: you can in fact get them directly from the TensorFlow checkpoint file. You should try using https://gist.github.com/batzner/7c24802dd9c5e15870b4b56e22135c96 against the checkpoint file. It's useful for listing the variables in a checkpoint and renaming some of them so they match the naming in your own model. I'm interested in your WER and CER results on the other dataset if you still have them. I'm trying to obtain a better result by fine-tuning hyperparameters but haven't gotten any significant improvement so far.
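For reference, a minimal sketch of listing what a checkpoint contains (assuming a TF 1.x install; the checkpoint path below is just an example and should point at your own acousticmodel.ckpt prefix):

```python
import tensorflow as tf

# Minimal sketch: list the variable names and shapes stored in a checkpoint.
# The path is an assumption -- point it at your own acousticmodel.ckpt prefix.
checkpoint_prefix = "trained_models/acoustic_model/english/acousticmodel.ckpt"
reader = tf.train.NewCheckpointReader(checkpoint_prefix)

for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)
```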

chenting0324 commented 7 years ago

Hi, I will use the model on the other dataset in a few days and will give you the WER and CER results then. Thank you for your answer, I will give it a try. Is the TensorFlow checkpoint file the one in trained_models/acoustic_model/english/checkpoint, or do you mean another file? Thank you very much!!!

chenting0324 commented 7 years ago

Hello, I used https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/python/tools/inspect_checkpoint.py and I got the following result:

    tensor_name: rnn/multi_rnn_cell/cell_2/basic_lstm_cell/biases
    [-0.0313677 -0.00658913 -0.01896443 ..., -0.00783602 0.00158059 0.00172835]
    tensor_name: rnn/multi_rnn_cell/cell_1/basic_lstm_cell/biases
    [ 0.0001673 -0.05007814 0.00368076 ..., -0.01218077 0.01428195 0.02477961]
    tensor_name: rnn/multi_rnn_cell/cell_0/basic_lstm_cell/biases
    [-0.00722365 -0.04710276 0.02127041 ..., -0.01892216 -0.06573219 -0.01893883]
    tensor_name: global_step
    9500
    tensor_name: Input_Layer/input_b
    [-0.11696832 0.00067524 -0.01014187 ..., -0.09762439 0.04245738 0.01483767]
    tensor_name: learning_rate
    3.267e-05
    tensor_name: rnn/multi_rnn_cell/cell_0/basic_lstm_cell/weights
    [[-0.01136321 -0.00384914 -0.04242354 ..., -0.01532782 0.06549271 0.00346065] [-0.05127605 -0.02392749 -0.01478572 ..., -0.01351318 0.04717604 0.05263193] [ 0.0404567 -0.01764796 0.0138403 ..., -0.01546826 0.01645955 0.00134034] ..., [-0.04345571 -0.05136441 -0.03100013 ..., 0.0047017 -0.0306399 0.03517035] [-0.00018307 0.0137899 0.00612506 ..., 0.03813621 -0.05160636 -0.0202991 ] [-0.00455729 0.03632646 -0.01809452 ..., -0.00676679 0.00376569 0.10562224]]
    tensor_name: Input_Layer/input_w
    [[-0.03182084 -0.01579602 0.00075297 ..., 0.02241369 -0.00096865 0.01190329] [-0.0249869 0.01131274 -0.0337288 ..., 0.00583539 0.00882952 -0.03973912] [-0.02620839 -0.0410029 0.00613309 ..., 0.00256835 -0.02844046 0.02862929] ..., [-0.03583897 0.00582753 -0.07601061 ..., -0.03153202 -0.00913378 0.05261716] [-0.02297934 0.04844452 -0.02955636 ..., 0.02908967 0.00248094 0.01559274] [ 0.01304239 0.04316467 -0.07282739 ..., 0.01061043 0.0257307 0.11344201]]
    tensor_name: Output_layer/output_w
    [[ 0.05101303 0.18263206 0.18622988 ..., -0.03892048 -0.06715754 -0.06261685] [ 0.00886919 0.04810521 -0.17560247 ..., -0.21467008 0.05993012 0.0771218 ] [ 0.03629687 -0.13104998 -0.08590779 ..., -0.08579271 0.05087085 0.05212956] ..., [-0.06602976 0.13366954 0.03888952 ..., -0.06018423 -0.02374146 0.00839281] [ 0.09960367 0.17472173 0.08439461 ..., 0.15567915 0.0323404 -0.03471686] [ 0.1105698 0.18649958 -0.00129581 ..., 0.16154449 0.03485731 -0.17772995]]
    tensor_name: rnn/multi_rnn_cell/cell_2/basic_lstm_cell/weights
    [[-0.0356952 0.01836566 -0.00732613 ..., 0.046164 -0.06747929 0.05385 ] [ 0.08403415 0.03950982 0.00801035 ..., 0.02327672 0.01805933 -0.0331181 ] [ 0.0084149 -0.02517631 0.00857453 ..., -0.05464114 0.0043622 0.03270212] ..., [-0.0187362 -0.07235921 0.06286826 ..., -0.01012454 0.02534539 0.02963923] [ 0.07454605 0.031953 -0.04824256 ..., 0.02892545 -0.01999683 0.01131981] [ 0.01235672 0.02575596 -0.03723545 ..., -0.0870229 -0.04768194 -0.15054134]]
    tensor_name: rnn/multi_rnn_cell/cell_1/basic_lstm_cell/weights
    [[ 0.05117989 -0.05484957 0.00072761 ..., 0.07506247 -0.0041365 0.00778818] [ 0.08750629 -0.01551697 0.00817819 ..., 0.04885173 -0.03843196 -0.04395888] [-0.0075606 0.01272487 0.04914607 ..., 0.04965738 0.01223274 0.01021633] ..., [ 0.00674141 0.00118298 0.03366723 ..., -0.02298987 -0.0515626 0.03328852] [-0.07422537 -0.04096507 -0.00999226 ..., 0.02797294 -0.02184403 -0.05488605] [ 0.03184707 -0.06398494 -0.05414595 ..., -0.04157395 0.04862571 0.00409058]]

But now I'm confused: which of these are the hidden vectors? Can you help me? Thanks a lot!!!

AMairesse commented 7 years ago

Hi,

The checkpoint file and the 3 acousticmodel.ckpt.* files are the 4 files TensorFlow uses to save a checkpoint.

In the list you obtained you have multiple tensors: the input layer's input_w and input_b, the weights and biases of the three LSTM cells, the output layer's output_w, the global_step counter, and the learning_rate.

So if you want to use the network you should use every variable except the learning_rate and the global_step counter. You can also use only a part of those variables as a starting point for a larger network. I don't know whether it would speed up the training... maybe...
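As an illustration only (not the project's exact code), restoring everything except the training bookkeeping variables into your own graph could look roughly like this, assuming your graph defines variables with the same names as those in the checkpoint:

```python
import tensorflow as tf

# Sketch: restore all weights/biases from the checkpoint, skipping the
# training bookkeeping tensors (global_step, learning_rate).
# Assumes the current graph already defines variables with matching names.
restore_list = [var for var in tf.global_variables()
                if 'global_step' not in var.name
                and 'learning_rate' not in var.name]

restorer = tf.train.Saver(restore_list)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    restorer.restore(sess, "trained_models/acoustic_model/english/acousticmodel.ckpt")
```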

AMairesse commented 7 years ago

Note that output_b is missing from the checkpoint because of a bug in the saving method. This is fixed in the dev branch, but because of that the current pre-trained network won't load with the software in that branch.
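A quick way to confirm that against a given checkpoint (a sketch; the exact name of the missing bias is my guess from the naming pattern in the listing above):

```python
import tensorflow as tf

# Sketch: check whether the output bias was actually written to the checkpoint.
# "Output_layer/output_b" is assumed from the naming pattern of the other tensors.
reader = tf.train.NewCheckpointReader(
    "trained_models/acoustic_model/english/acousticmodel.ckpt")
print(reader.has_tensor("Output_layer/output_w"))  # present in the listing above
print(reader.has_tensor("Output_layer/output_b"))  # missing because of this bug
```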

chenting0324 commented 7 years ago

Hi, thank you for your detailed answer. In fact, I only want to extract the hidden vectors from the RNN, but my list doesn't include the hidden vectors, right? The code self.saver = tf.train.Saver(save_list) saves all the listed variables, but the hidden vectors weren't saved, right? I want to use the hidden vectors elsewhere, so can I get them from the TensorFlow checkpoint file? I didn't see them in the checkpoint file. (So sad!!!) Thank you!!!!

chenting0324 commented 7 years ago

Maybe the hidden vectors are defined in this code: self.hidden_state = tf.Variable(tf.zeros((num_layers, 2, batch_size, hidden_size)), trainable=False)?

AMairesse commented 7 years ago

OK, I didn't understand that before, sorry. The hidden vector is in fact stored in the self.hidden_state variable. It's not saved in the checkpoint, but you could get it with a minor change: just add its name to the definition of save_list in AcousticModel.py:

    save_list = [var for var in tf.global_variables()
                 if (var.name.find('/input_w:0') != -1)
                 or (var.name.find('/input_b:0') != -1)
                 or (var.name.find('/output_w:0') != -1)
                 or (var.name.find('/output_w:0') != -1)  # note: duplicated test, so output_b is never matched (the saving bug mentioned above)
                 or (var.name.find('global_step:0') != -1)
                 or (var.name.find('learning_rate:0') != -1)
                 or (var.name.find('/weights:0') != -1)
                 or (var.name.find('/biases:0') != -1)]

You can get the name by looking at the graph in TensorBoard.
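If TensorBoard isn't handy, a quick sketch of printing the full TensorFlow names directly (assuming the AcousticModel graph has already been built in the current default graph):

```python
import tensorflow as tf

# Print the full TensorFlow name of every variable in the current graph
# (e.g. "Hidden_state/hidden_state:0") so you know what to match in save_list.
for var in tf.global_variables():
    print(var.name, var.get_shape())
```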

chenting0324 commented 7 years ago

Thank you very much! I will give it a try!!!

chenting0324 commented 7 years ago

Hi, I added self.hidden_state to the save_list, as follows:

    save_list = [var for var in tf.global_variables()
                 if (var.name.find('/input_w:0') != -1)
                 or (var.name.find('/input_b:0') != -1)
                 or (var.name.find('/output_w:0') != -1)
                 or (var.name.find('/output_w:0') != -1)
                 or (var.name.find('global_step:0') != -1)
                 or (var.name.find('learning_rate:0') != -1)
                 or (var.name.find('/weights:0') != -1)
                 or (var.name.find('/biases:0') != -1)
                 or (var.name.find('/self.hidden_state:0') != -1)]

but it didn't work; the result is the same as before:

    All Variables:
    Input_Layer/input_b (DT_FLOAT) [1024]
    Input_Layer/input_w (DT_FLOAT) [120,1024]
    Output_layer/output_w (DT_FLOAT) [1024,80]
    global_step (DT_INT32) []
    learning_rate (DT_FLOAT) []
    rnn/multi_rnn_cell/cell_0/basic_lstm_cell/biases (DT_FLOAT) [4096]
    rnn/multi_rnn_cell/cell_0/basic_lstm_cell/weights (DT_FLOAT) [2048,4096]
    rnn/multi_rnn_cell/cell_1/basic_lstm_cell/biases (DT_FLOAT) [4096]
    rnn/multi_rnn_cell/cell_1/basic_lstm_cell/weights (DT_FLOAT) [2048,4096]
    rnn/multi_rnn_cell/cell_2/basic_lstm_cell/biases (DT_FLOAT) [4096]
    rnn/multi_rnn_cell/cell_2/basic_lstm_cell/weights (DT_FLOAT) [2048,4096]

It doesn't have the variable self.hidden_state, so it's not saved in the checkpoint! How can I solve this problem? Thank you very much!!!

AMairesse commented 7 years ago

Hi,

The name of the variable you have to add to the save_list is the TensorFlow name. For this variable it's "Hidden_state/hidden_state:0", so you should put (var.name.find('/hidden_state:0') != -1) and it should work.

If you want, you can also get rid of the if statement in the list construction; that way all of TensorFlow's variables will be saved in the checkpoint file, including the hidden_state.
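For example, a minimal sketch of that simpler version inside AcousticModel.py (reusing the variable names from the code quoted earlier in this thread):

```python
# Sketch: drop the filter so every variable in the graph is saved,
# including Hidden_state/hidden_state:0.
save_list = [var for var in tf.global_variables()]
self.saver = tf.train.Saver(save_list)

# Equivalent shortcut: tf.train.Saver() with no var_list saves all saveable variables.
```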

chenting0324 commented 7 years ago

Hi, I added it to the save_list as follows:

    save_list = [var for var in tf.global_variables()
                 if (var.name.find('/input_w:0') != -1)
                 or (var.name.find('/input_b:0') != -1)
                 or (var.name.find('/output_w:0') != -1)
                 or (var.name.find('/output_w:0') != -1)
                 or (var.name.find('global_step:0') != -1)
                 or (var.name.find('learning_rate:0') != -1)
                 or (var.name.find('/weights:0') != -1)
                 or (var.name.find('/biases:0') != -1)
                 or (var.name.find('/hidden_state:0') != -1)]

and the code I use to read the variables from the checkpoint file is:

    from tensorflow.python import pywrap_tensorflow
    import os

    checkpoint_dir = "trained_models/acoustic_model/english"
    checkpoint_path = os.path.join(checkpoint_dir, "acousticmodel.ckpt")
    reader = pywrap_tensorflow.NewCheckpointReader(checkpoint_path)
    var_to_shape_map = reader.get_variable_to_shape_map()
    for key in var_to_shape_map:
        print("tensor_name: ", key)
        print(reader.get_tensor(key))

and the result is:

    tensor_name: rnn/multi_rnn_cell/cell_1/basic_lstm_cell/biases
    [ 0.0001673 -0.05007814 0.00368076 ..., -0.01218077 0.01428195 0.02477961]
    tensor_name: rnn/multi_rnn_cell/cell_0/basic_lstm_cell/weights
    [[-0.01136321 -0.00384914 -0.04242354 ..., -0.01532782 0.06549271 0.00346065] [-0.05127605 -0.02392749 -0.01478572 ..., -0.01351318 0.04717604 0.05263193] [ 0.0404567 -0.01764796 0.0138403 ..., -0.01546826 0.01645955 0.00134034] ..., [-0.04345571 -0.05136441 -0.03100013 ..., 0.0047017 -0.0306399 0.03517035] [-0.00018307 0.0137899 0.00612506 ..., 0.03813621 -0.05160636 -0.0202991 ] [-0.00455729 0.03632646 -0.01809452 ..., -0.00676679 0.00376569 0.10562224]]
    tensor_name: Input_Layer/input_w
    [[-0.03182084 -0.01579602 0.00075297 ..., 0.02241369 -0.00096865 0.01190329] [-0.0249869 0.01131274 -0.0337288 ..., 0.00583539 0.00882952 -0.03973912] [-0.02620839 -0.0410029 0.00613309 ..., 0.00256835 -0.02844046 0.02862929] ..., [-0.03583897 0.00582753 -0.07601061 ..., -0.03153202 -0.00913378 0.05261716] [-0.02297934 0.04844452 -0.02955636 ..., 0.02908967 0.00248094 0.01559274] [ 0.01304239 0.04316467 -0.07282739 ..., 0.01061043 0.0257307 0.11344201]]
    tensor_name: global_step
    9500
    tensor_name: rnn/multi_rnn_cell/cell_1/basic_lstm_cell/weights
    [[ 0.05117989 -0.05484957 0.00072761 ..., 0.07506247 -0.0041365 0.00778818] [ 0.08750629 -0.01551697 0.00817819 ..., 0.04885173 -0.03843196 -0.04395888] [-0.0075606 0.01272487 0.04914607 ..., 0.04965738 0.01223274 0.01021633] ..., [ 0.00674141 0.00118298 0.03366723 ..., -0.02298987 -0.0515626 0.03328852] [-0.07422537 -0.04096507 -0.00999226 ..., 0.02797294 -0.02184403 -0.05488605] [ 0.03184707 -0.06398494 -0.05414595 ..., -0.04157395 0.04862571 0.00409058]]
    tensor_name: rnn/multi_rnn_cell/cell_2/basic_lstm_cell/weights
    [[-0.0356952 0.01836566 -0.00732613 ..., 0.046164 -0.06747929 0.05385 ] [ 0.08403415 0.03950982 0.00801035 ..., 0.02327672 0.01805933 -0.0331181 ] [ 0.0084149 -0.02517631 0.00857453 ..., -0.05464114 0.0043622 0.03270212] ..., [-0.0187362 -0.07235921 0.06286826 ..., -0.01012454 0.02534539 0.02963923] [ 0.07454605 0.031953 -0.04824256 ..., 0.02892545 -0.01999683 0.01131981] [ 0.01235672 0.02575596 -0.03723545 ..., -0.0870229 -0.04768194 -0.15054134]]
    tensor_name: rnn/multi_rnn_cell/cell_2/basic_lstm_cell/biases
    [-0.0313677 -0.00658913 -0.01896443 ..., -0.00783602 0.00158059 0.00172835]
    tensor_name: Output_layer/output_w
    [[ 0.05101303 0.18263206 0.18622988 ..., -0.03892048 -0.06715754 -0.06261685] [ 0.00886919 0.04810521 -0.17560247 ..., -0.21467008 0.05993012 0.0771218 ] [ 0.03629687 -0.13104998 -0.08590779 ..., -0.08579271 0.05087085 0.05212956] ..., [-0.06602976 0.13366954 0.03888952 ..., -0.06018423 -0.02374146 0.00839281] [ 0.09960367 0.17472173 0.08439461 ..., 0.15567915 0.0323404 -0.03471686] [ 0.1105698 0.18649958 -0.00129581 ..., 0.16154449 0.03485731 -0.17772995]]
    tensor_name: rnn/multi_rnn_cell/cell_0/basic_lstm_cell/biases
    [-0.00722365 -0.04710276 0.02127041 ..., -0.01892216 -0.06573219 -0.01893883]
    tensor_name: Input_Layer/input_b
    [-0.11696832 0.00067524 -0.01014187 ..., -0.09762439 0.04245738 0.01483767]
    tensor_name: learning_rate
    3.267e-05

It still doesn't include the hidden_state. Is there anything else that needs to be modified? Thank you very much!!! (Sorry to bother you.)

AMairesse commented 7 years ago

The hidden_state is not currently in the checkpoint file you are using. The modification made to the save_list only allows it to be saved when creating a new checkpoint, so you now have to train the network to get a checkpoint file that contains the hidden_state. Are you sure you are looking for the hidden_state? It isn't very useful: it only keeps the state of the network from one slice of audio to the next. At the end of training the content of the hidden state is directly inherited from the last files processed, which could be any files.
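Once a new checkpoint has been written with the modified save_list, reading the hidden state back works like any other tensor; a sketch, assuming the variable keeps the name and shape discussed earlier in this thread:

```python
import tensorflow as tf

# Sketch: read the saved hidden state out of a *retrained* checkpoint.
# Assumes it was saved under "Hidden_state/hidden_state" with shape
# (num_layers, 2, batch_size, hidden_size), as discussed above.
reader = tf.train.NewCheckpointReader(
    "trained_models/acoustic_model/english/acousticmodel.ckpt")
hidden_state = reader.get_tensor("Hidden_state/hidden_state")
print(hidden_state.shape)
```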

chenting0324 commented 7 years ago

Hi, I want to use the hidden vectors to train word2vec. Its initial vectors are initialized randomly, and I want to use the hidden vectors to initialize them instead. So I need to extract the hidden vectors from hidden_state; the hidden_state includes the hidden vectors, right? Or maybe there are other good ways? I'm a little bit confused! Thank you very much!

chenting0324 commented 7 years ago

You mean I have to train the network, that is, I should use “python stt.py --train” instead of “python stt.py --file”? Thanks a lot!

AMairesse commented 7 years ago

Hi, yes, for training you will need to launch “python stt.py --train”, but you also need training data in the data directory and a corresponding config.ini file. About using the hidden_state for the word2vec network: beware that the AcousticModel uses a frame of size 0.025 s and moves by only 0.01 s between steps, so consecutive frames overlap.
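To make the overlap concrete, a small back-of-the-envelope check (the 16 kHz sample rate is an assumption for illustration, not something stated in this thread):

```python
# Rough arithmetic for the frame overlap mentioned above.
sample_rate = 16000                 # Hz, assumed for illustration
frame = int(0.025 * sample_rate)    # 400 samples per frame
hop = int(0.01 * sample_rate)       # 160 samples between frame starts
overlap = frame - hop               # 240 samples shared with the next frame
print(frame, hop, overlap, overlap / frame)  # -> 400 160 240 0.6
```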

chenting0324 commented 7 years ago

That sounds a little difficult, but I will give it a try. Thank you!

AMairesse commented 7 years ago

Closing, please reopen if you have any other related questions.