Problem running test_GRU.py

Mariobai commented 7 years ago

I have pulled the latest version of the project, but when I run test_GRU.py I still get the following error:

W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_BACKWARD/multi_rnn_cell/cell_0/gru_cell/candidate/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_BACKWARD/multi_rnn_cell/cell_0/gru_cell/gates/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_BACKWARD/multi_rnn_cell/cell_0/gru_cell/candidate/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_BACKWARD/multi_rnn_cell/cell_0/gru_cell/gates/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/candidate/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/candidate/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/gates/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/gates/weights not found in checkpoint

tensorflow.python.framework.errors_impl.NotFoundError: Key model/GRU_BACKWARD/multi_rnn_cell/cell_0/gru_cell/candidate/biases not found in checkpoint [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

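My setup: Python: Anaconda Python 3.5; tensorflow: 1.0.0. That is the error I get. What is causing it? Also, why can't I train a model myself and only use yours? When I retrain with train_GRU.py, it does not update your existing model.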

crownpku commented 7 years ago

Could you post the output you get when training with train_GRU.py? I suspect this is a TensorFlow version issue; please try again with tf 1.2.0.
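One plausible reading of the version suspicion: TensorFlow 1.2 renamed RNN cell variables from `weights`/`biases` to `kernel`/`bias`, so a graph built under 1.0.0 would ask for names that a checkpoint written by a newer version does not contain. The sketch below only illustrates that mismatch in plain Python. The `expected` names are copied from the error log; the contents of `checkpoint_keys` are an assumption, not read from the project's actual checkpoint, and both helper functions are hypothetical.

```python
# Names the restore op asked for, copied from the error log in this thread.
expected = [
    "model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/gates/weights",
    "model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/gates/biases",
    "model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/candidate/weights",
    "model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/candidate/biases",
]

# ASSUMPTION: what a checkpoint saved with the newer naming might contain.
# In TensorFlow 1.x the real list can be read with
# tf.train.NewCheckpointReader(path).get_variable_to_shape_map().
checkpoint_keys = {
    "model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/gates/kernel",
    "model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/gates/bias",
    "model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/candidate/kernel",
    "model/GRU_FORWARD/multi_rnn_cell/cell_0/gru_cell/candidate/bias",
}


def missing_keys(wanted, available):
    """Return the wanted variable names that the checkpoint lacks."""
    return sorted(name for name in wanted if name not in available)


def rename_to_new_style(name):
    """Map an old-style leaf name (weights/biases) to kernel/bias."""
    head, _, leaf = name.rpartition("/")
    return head + "/" + {"weights": "kernel", "biases": "bias"}.get(leaf, leaf)


for name in missing_keys(expected, checkpoint_keys):
    found = rename_to_new_style(name) in checkpoint_keys
    print(name, "-> present under new name:", found)
```

If the real checkpoint shows the same pattern, running training and testing under the same TensorFlow version should avoid the mismatch.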

Mariobai commented 7 years ago

/Users/bai/anaconda3/bin/python /Users/bai/python/pythonex/relationex/train_GRU.py
reading wordembedding
reading training data
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-22T10:45:52.839965: step 50, softmax_loss 82.7629, acc 0.5
2017-09-22T10:46:23.717158: step 100, softmax_loss 77.9403, acc 0.48
2017-09-22T10:46:54.196828: step 150, softmax_loss 51.8013, acc 0.62

Process finished with exit code 0

This is the current output of my train_GRU.py run. There are no errors, but the model just doesn't get saved.

crownpku commented 7 years ago

train_GRU.py line 125:

if current_step > 8000 and current_step % 100 == 0:
    print('saving model')
    path = saver.save(sess, save_path + 'ATT_GRU_model', global_step=current_step)
    tempstr = 'have saved model to ' + path
    print(tempstr)

To avoid wasting disk space, the current setting saves the model every 100 steps, and only after step 8000.
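A minimal sketch of that save rule in isolation, so the effect of the two numbers is easy to see. The function name `should_save` and its parameter names are hypothetical; only the condition itself comes from the snippet quoted from train_GRU.py.

```python
def should_save(step, start_after=8000, every=100):
    """Mirror the condition `step > 8000 and step % 100 == 0`."""
    return step > start_after and step % every == 0


# Steps in 1..8500 at which a checkpoint would be written.
saved_steps = [s for s in range(1, 8501) if should_save(s)]
print(saved_steps)  # [8100, 8200, 8300, 8400, 8500]
```

Note that step 8000 itself is excluded by the strict `>`, so the first checkpoint appears at step 8100.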

Mariobai commented 7 years ago

Then how do I set it to save the model every 10 steps?

crownpku commented 7 years ago

if current_step % 10 == 0:
    print('saving model')
    path = saver.save(sess, save_path + 'ATT_GRU_model', global_step=current_step)
    tempstr = 'have saved model to ' + path
    print(tempstr)

Mariobai commented 7 years ago

One more question: when saving every 10 steps, how do I overwrite the previously saved models? There are far too many saved models in there.

crownpku commented 7 years ago

Back up or delete the earlier models, then run a new training session. Alternatively, change the model name passed to saver.save.

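If you start saving too early, the early models perform very poorly and are of no use; that is why I only save after step 8000. The models saved every 100 steps after 8000 can be compared against each other to pick the best one, because models from very late in training may start to overfit.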
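For the "delete the earlier models" route, here is a rough sketch that keeps only the newest few checkpoints in a directory. It assumes the usual TensorFlow 1.x file naming (`<prefix>-<step>.index`, `.meta`, `.data-...`); the helper `prune_checkpoints` is hypothetical and it does not update the `checkpoint` state file that the saver also writes. Separately, `tf.train.Saver` accepts a `max_to_keep` argument that limits how many recent checkpoints it retains, which may be the simpler option.

```python
import os
import re
import tempfile

# ASSUMPTION: checkpoint files are named <prefix>-<step>.<suffix>.
STEP_RE = re.compile(r"^(?P<prefix>.+)-(?P<step>\d+)\.(index|meta|data-.+)$")


def prune_checkpoints(model_dir, prefix, keep=3):
    """Delete all but the `keep` highest-step checkpoints; return kept steps."""
    by_step = {}
    for name in os.listdir(model_dir):
        match = STEP_RE.match(name)
        if match and match.group("prefix") == prefix:
            by_step.setdefault(int(match.group("step")), []).append(name)
    steps = sorted(by_step)
    stale = steps[:-keep] if keep > 0 else steps
    for step in stale:
        for name in by_step[step]:
            os.remove(os.path.join(model_dir, name))
    return steps[len(stale):]


# Demonstration on throwaway files, not on a real model directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for step in (8100, 8200, 8300, 8400, 8500):
        for suffix in (".index", ".meta", ".data-00000-of-00001"):
            path = os.path.join(demo_dir, "ATT_GRU_model-%d%s" % (step, suffix))
            open(path, "w").close()
    kept = prune_checkpoints(demo_dir, "ATT_GRU_model", keep=2)
    remaining = sorted(os.listdir(demo_dir))

print(kept)  # [8400, 8500]
```

Back up the directory first if any of the older checkpoints might still be worth comparing.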

Mariobai commented 7 years ago

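Thanks, it runs now. Much appreciated. Have you done any work on entity relation extraction in the geography domain? My problem is that I don't know how the dataset should be constructed. Should it be the same as in this project, like this: 赵玉芳葛淑珍父母九九年前后时，女儿赵玉芳被大连外国语学院录取，而葛淑珍就再度成为陪读妈妈！ Or do I need to put it in a different format? I have only just started on this; do you have any suggestions?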

crownpku commented 7 years ago

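I'm not familiar with the geography domain... The key is what the data looks like. If you have geographic data, post some samples here and we can discuss them. Otherwise I'm in the dark too.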

Mariobai commented 7 years ago

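I see. Our plan is to crawl news reports from the web. Do we then also need to process them into the same format as the training set in this project? I'm a bit lost right now.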

Mariobai commented 7 years ago

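Hello, I'd like to ask how your data is fed into the neural network as word vectors. Do you use pretrained word vectors, or randomly generated vectors built from one-hot encodings?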

crownpku commented 7 years ago

Pretrained Chinese character vectors.

Mariobai commented 7 years ago

Are the word vectors trained directly on data like that in train.txt? For example: 朱时茂陈佩斯合作《水与火的缠绵》《低头不见抬头见》《天剑群侠》小品陈佩斯与朱时茂1984年《吃面条》合作者：陈佩斯 1985年《拍电影》合

Or are the word vectors trained separately on other data? Also, when feeding the network, is the input a single sentence, or something else?

crownpku commented 7 years ago

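The Chinese character vectors were trained on Chinese Wikipedia. The input is simply the character vectors of that trailing sentence, concatenated together. I thought my blog had explained this clearly...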
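To make that input step concrete, here is a small sketch of turning a sentence into a fixed-length list of character ids and then looking up one vector per character. Everything in it is an assumption for illustration: the toy vocabulary, the 2-dimensional vectors, the `UNK`/`BLANK` entries for unknown characters and padding, and the default length of 70 are not confirmed by this thread, and `sentence_to_ids` is a hypothetical helper.

```python
def sentence_to_ids(sentence, char2id, fix_len=70):
    """Map each character to an id, truncating or padding to fix_len."""
    unk, blank = char2id["UNK"], char2id["BLANK"]
    ids = [char2id.get(ch, unk) for ch in sentence[:fix_len]]
    return ids + [blank] * (fix_len - len(ids))


# Toy vocabulary and embedding table (row index == character id).
char2id = {"BLANK": 0, "UNK": 1, "女": 2, "儿": 3}
embeddings = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.3], [0.4, 0.5]]

ids = sentence_to_ids("女儿好", char2id, fix_len=5)
vectors = [embeddings[i] for i in ids]  # one vector per position

print(ids)  # [2, 3, 1, 0, 0]
```

Padding to a fixed length is also one common way to handle sentences of different lengths in a batch.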

Mariobai commented 7 years ago

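Character vectors? Meaning each individual character? Why not word vectors? Isn't word2vec for word vectors? So, character by character, the pipeline is: first take the sentence, skip word segmentation and part-of-speech tagging, and look up each character directly in the pretrained character vectors. (How are the character vectors trained?) Sorry for asking so many questions.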

crownpku commented 7 years ago

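In English, words are separated by spaces. Chinese text runs together. You can use a word segmentation tool and then train word vectors with word2vec, but that introduces segmentation errors. Character vectors treat every Chinese character as a separate "word" and train with word2vec in the same way, except that each resulting vector represents a single Chinese character.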
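A minimal sketch of the preprocessing this implies: each line of the corpus is split into single characters, which are then written out space-separated so that a word2vec tool treats every character as a token. The function names are hypothetical, and keeping punctuation as tokens is an assumption rather than something stated in this thread.

```python
def to_char_tokens(line):
    """Split a line into single-character tokens, dropping whitespace."""
    return [ch for ch in line if not ch.isspace()]


def to_training_line(line):
    """Render the tokens as one space-separated line for word2vec input."""
    return " ".join(to_char_tokens(line))


spaced = to_training_line("女儿赵玉芳被大连外国语学院录取")
print(spaced)  # 女 儿 赵 玉 芳 被 大 连 外 国 语 学 院 录 取
```

No segmentation tool is involved, so there are no segmentation errors to propagate; the cost is that multi-character words are no longer single units.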

Mariobai commented 7 years ago

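I see. So I need to separate every character in a sentence with spaces and then train character vectors on those characters. Also, when we feed in a sentence, we don't give the relation type between the entities in it. So how does the network know which type is correct?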

crownpku commented 7 years ago

The final output of the network is compared against the label to compute a loss, which is propagated back to train the whole network.
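As a worked illustration of "output compared against the label", here is softmax cross-entropy for a single example in plain Python. The training log in this thread reports a `softmax_loss`, but whether the project computes it exactly this way is an assumption; the logits and the three relation classes below are made up.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Loss for one example: negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


# Hypothetical scores for 3 relation classes; the true relation is class 1.
confident_right = cross_entropy([0.5, 4.0, 0.2], label=1)
confident_wrong = cross_entropy([4.0, 0.5, 0.2], label=1)
print(confident_right < confident_wrong)  # True
```

The loss shrinks as the network assigns more probability to the labelled relation, and the gradient of this loss is what drives the weight updates.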

Mariobai commented 7 years ago

Label? Where is it? Is it the relation given for the two entities in the training set?

Mariobai commented 7 years ago

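Hello, if I want to add a validation set to your network, where should I add it? As far as I can see, your network only has a training set and a test set.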

crownpku commented 7 years ago

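The label is the relation between the entities. Adding a validation set does not require changing the network architecture; you need to modify the training logic in train_GRU.py.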
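One possible starting point for that change is to hold out part of the training examples before the training loop begins and evaluate on them periodically. The sketch below shows only the split; `split_train_dev`, the 10% fraction, and the fixed seed are all assumptions, and wiring the held-out set into train_GRU.py is left out.

```python
import random


def split_train_dev(examples, dev_fraction=0.1, seed=0):
    """Shuffle a copy of the examples and hold out a fraction as a dev set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Stand-in examples; real ones would be (sentence ids, label) pairs.
examples = list(range(100))
train_set, dev_set = split_train_dev(examples)
print(len(train_set), len(dev_set))  # 90 10
```

Accuracy on the held-out part, rather than on the training batches, is the number to watch when deciding which checkpoint to keep.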

Mariobai commented 7 years ago

But how are input sentences of different lengths handled? My input sentences can't all be the same length. Also, the code is a bit short on comments.

Mariobai commented 7 years ago

During training I got:

saving model
have saved model to ./model/ATT_GRU_model-1000
2017-11-07T16:51:29.607642: step 1050, softmax_loss 0.0269104, acc 1
2017-11-07T16:54:02.656800: step 1100, softmax_loss 0.0240574, acc 1

My accuracy reached 1, which is clearly overfitting. How did you deal with this?

KobeLA24 commented 6 years ago

How did you solve the training problem? Mine also stops at step 150 without saving a model. How did you fix it?

KobeLA24 commented 6 years ago

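Using print in train_GRU.py, I found that with the training set provided by the author and the defaults num_epoch=10 and big_num=50, only a little over 190 train steps are run. So why does the author require step>8000 before saving the model? No wonder the program ends right after step 150 every time.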

crownpku commented 6 years ago

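@KobeLA24 The training data in the project is only a sample and is far from enough to train a usable model. step>8000 is a parameter meant for training on a sufficiently large training set.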
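The arithmetic behind this can be sketched as follows, assuming one step per batch and that an incomplete final batch is dropped. The example count of 950 is hypothetical, chosen only because it yields about 190 steps with num_epoch=10 and big_num=50; the real size of the sample data is not stated here. Both function names are made up for this sketch.

```python
import math


def total_train_steps(num_examples, num_epoch=10, big_num=50):
    """Steps run in total: full batches per epoch times number of epochs."""
    return num_epoch * (num_examples // big_num)


def min_examples_for_first_save(first_save_step=8100, num_epoch=10, big_num=50):
    """Smallest dataset size whose training reaches the first saved step."""
    return math.ceil(first_save_step / num_epoch) * big_num


print(total_train_steps(950))         # 190, far below 8000
print(min_examples_for_first_save())  # 40500
```

Under these assumptions, roughly 40,500 examples would be needed before the unmodified save condition fires even once, so with the sample data the threshold has to be lowered.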

KobeLA24 commented 6 years ago

OK, understood. Thanks!

crownpku / Information-Extraction-Chinese

Problem running test_GRU.py #8