MorvanZhou / tutorials

Machine learning tutorials
https://morvanzhou.github.io/tutorials
MIT License

LSTM: About the data sequence in 7-RNN_Classifier_example.py (how to turn the data into sequences?) #66

Open rosefun opened 6 years ago

rosefun commented 6 years ago

My dataset is X = {x1, x2, x3, ..., xn} with shape=[n, m], where x1, x2, ..., xn are the samples of X, and the label data has y.shape=[n, k]. If I use a time window of length 2, then after the reshape X = tf.reshape(X, [int(n/2), 2, m]) I get X.shape=[n/2, 2, m]. But then I cannot compute the cost with cost_rnn = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_, labels=y)), because X now holds n/2 sequences while y still has n rows, so the logits and the labels have different shapes.

Does anybody know how to solve this problem?
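The mismatch described above can be reproduced without TensorFlow. The following is a minimal NumPy sketch, not the tutorial's code: the sizes n, m, k are illustrative, and giving each window the label of its last time step is an assumption made here only to show one way of bringing the labels down to n/2 rows.

```python
import numpy as np

n, m, k = 6, 3, 3      # illustrative sizes, not taken from the tutorial
time_steps = 2

X = np.arange(n * m, dtype=np.float32).reshape(n, m)   # n samples, m features
y = np.eye(k, dtype=np.float32)[np.arange(n) % k]      # one-hot labels, shape [n, k]

# Non-overlapping windows: the same grouping as tf.reshape(X, [n // 2, 2, m])
X_seq = X.reshape(n // time_steps, time_steps, m)

# Assumption: each window is labelled by its last time step,
# so the labels are subsampled to one row per window.
y_seq = y[time_steps - 1::time_steps]

print(X_seq.shape)  # (3, 2, 3)
print(y_seq.shape)  # (3, 3)
```

With one label row per window, the first dimension of the labels matches the number of sequences, which is what the cross-entropy call needs when the network emits one prediction per sequence.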


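Now there is a dataset X = {x1, x2, x3, ..., xn} with shape=[n, m], where each sample such as x1 contains several variables, shape=[m]. For example, X=[[1,10,100],[2,20,200],[3,30,300]] can be seen as made up of the samples x1, x2, ...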

The label data is y = {y1, y2, ..., yn} with shape=[n, k], for example Y=[[1,0,0],[0,1,0],[0,0,1]]. If the data is turned into sequences for this LSTM, for example with a time window of time_step=2: X = tf.reshape(X, [int(n/2), 2, m])

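Then, after this step, X no longer has n samples: the reshape leaves n/2 sequences, shape=[n/2, 2, m]. Because the dimensions no longer match, the cost cannot be computed: cost_rnn = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_rnn, labels=y))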

How should this kind of dataset be handled?
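One possible way to handle such a dataset, sketched in NumPy on the three-sample example from the question, is to build overlapping windows with a stride of 1 and keep one label per window. The helper make_windows is hypothetical (it is not part of the tutorial), and labelling each window with the label of its last row is an assumption. With this scheme n samples give n - time_steps + 1 sequences, that is n-1 for a window of length 2.

```python
import numpy as np

def make_windows(X, y, time_steps):
    """Hypothetical helper: overlapping windows with stride 1.

    Each window is paired with the label of its last row (an assumption).
    """
    n = X.shape[0]
    num_windows = n - time_steps + 1
    # Stack consecutive slices into shape [num_windows, time_steps, m]
    X_seq = np.stack([X[i:i + time_steps] for i in range(num_windows)])
    # Drop the labels of the first time_steps - 1 rows, which never end a window
    y_seq = y[time_steps - 1:]
    return X_seq, y_seq

X = np.array([[1, 10, 100], [2, 20, 200], [3, 30, 300]], dtype=np.float32)
y = np.eye(3, dtype=np.float32)   # [[1,0,0],[0,1,0],[0,0,1]]

X_seq, y_seq = make_windows(X, y, time_steps=2)
print(X_seq.shape)  # (2, 2, 3)
print(y_seq.shape)  # (2, 3)
```

Feeding X_seq and y_seq instead of the raw X and y keeps the number of sequences and the number of label rows equal, provided the network returns one prediction per sequence, for example from the output at the last time step.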