gaoyuankidult / einstein


A bug to solve #2

Open i3esn0w opened 8 years ago

i3esn0w commented 8 years ago

Traceback (most recent call last):
  File "clock_gated_rnn.py", line 63, in <module>
    model.compile(loss='binary_crossentropy', optimizer='adam', class_mode="binary")
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 287, in compile
    self.y_train = self.get_output(train=True)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/containers.py", line 51, in get_output
    return self.layers[-1].get_output(train)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 223, in get_output
    X = self.get_input(train)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 31, in get_input
    return self.previous.get_output(train=train)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 341, in get_output
    X = self.get_input(train)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 31, in get_input
    return self.previous.get_output(train=train)
  File "build/bdist.linux-x86_64/egg/einstein/layers/recurrent.py", line 714, in get_output
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.py", line 745, in scan
    condition, outputs, updates = scan_utils.get_updates_and_outputs(fn(*args))
  File "build/bdist.linux-x86_64/egg/einstein/layers/recurrent.py", line 694, in _step
AttributeError: 'module' object has no attribute 'ifelse'

I used the code you implemented in the example, but it does not run. I checked the code and found no error. How do I fix it?
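The AttributeError at the bottom of the traceback typically comes from Python's submodule import rules: `import theano` does not automatically load `theano.ifelse`, so accessing it as an attribute fails unless something has done `import theano.ifelse` (or `from theano.ifelse import ifelse`) first. A minimal sketch of the mechanism, with made-up `pkg`/`sub` names standing in for `theano`/`ifelse`:

```python
import sys
import types

# Simulate a package "pkg" whose submodule "pkg.sub" has not been
# imported yet (names are hypothetical, for illustration only).
pkg = types.ModuleType("pkg")
sys.modules["pkg"] = pkg

try:
    pkg.sub  # fails, just like theano.ifelse before `import theano.ifelse`
except AttributeError as exc:
    print("AttributeError:", exc)

# `import pkg.sub` would do roughly this: load the submodule and
# bind it as an attribute on the parent package.
sub = types.ModuleType("pkg.sub")
sys.modules["pkg.sub"] = sub
pkg.sub = sub

print(hasattr(pkg, "sub"))  # True: attribute access now succeeds
```

If this is indeed the cause, the likely fix is to add `from theano.ifelse import ifelse` (or `import theano.ifelse`) near the top of `einstein/layers/recurrent.py` and call it accordingly.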

gaoyuankidult commented 8 years ago

Dear i3esn0w

Thanks for your input. However, this repo uses an older version of Keras and is outdated. Now it is only used for storing my previous experiment files.

I changed the description to:

This repo depends on an older version of Keras and is outdated. Now it only serves as a place to store my previous experiments. One may face many errors trying to run this code.

ClockworkSGU is a clockwork version of my previous model SGU. I do not know what you would like to do with it.

If you would like to port Clockwork to some other library, you can look at this class (link). I think anyone with experience with older Keras should know how to port it.

If you would like to check the code of SGU and DSGU, please have a look at this link. Actually, both SGU and DSGU have some problems with multi-class classification (they only give the best class but do not provide a probability distribution over all classes).

Please do not hesitate to tell me what you would like to do with the code, so I can help you with it.

white54503 commented 8 years ago

I've been using DSGU w/ softmax l2_activation for multi-label classification in an RL setting, with stellar results. GRUs comprise the hidden state; DSGU does the output. Network inputs range over (-inf, inf). Do you see any reason this shouldn't be a viable approach? Thanks, and great work... I was sad to see the arxiv paper come down.

gaoyuankidult commented 8 years ago

Dear white54503

Did you get a good result?

As mentioned in the paper, DSGU uses a sigmoid function as the activation function. Combining it with categorical cross-entropy can make the classification correct, but the output does not follow a probability distribution. This is not useful in many cases (probably more experiments are needed if somebody wants to continue investigating the multiplicative gate). That is why I withdrew my paper.
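A quick numeric illustration of the point above, with made-up logits for three classes: per-class sigmoid outputs can still rank the correct class highest (sigmoid is monotonic), but they do not sum to 1, so they are not a probability distribution; softmax outputs are.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, -1.0]            # hypothetical scores for three classes

sig = [sigmoid(x) for x in logits]   # per-class sigmoid outputs
soft = softmax(logits)               # softmax outputs

# Sigmoid still picks the same best class as softmax ...
print(sig.index(max(sig)))           # -> 0, same as softmax's argmax
# ... but its outputs do not sum to 1, so they form no distribution.
print(round(sum(sig), 3))            # -> 1.881
print(round(sum(soft), 3))           # -> 1.0
```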

Using DSGU + softmax may not be a good choice, as DSGU + softmax did not work in my experiments (DSGU + sigmoid worked).

Actually, I suggest you look at this paper. By using batch normalization on an LSTM, they managed to reach 99% on the MNIST dataset (with a softmax output layer), which is at least higher than my result. Keras recently has a PR about it (link). However, I never implemented batch normalization in my model.

white54503 commented 8 years ago

Performance is quite good where other architectures have struggled mightily... I'm using DSGU in an asynchronous actor-critic setup similar to http://arxiv.org/pdf/1602.01783v1.pdf; loss is the negative of actor's advantage vs critic baseline. Classification by the max of the network's output yields control vectors; there is no probabilistic interpretation of the class labels. I'll test sigmoid vs. softmax and report results in a week or two.
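The argmax-based control scheme described above can be sketched in a few lines (the per-action scores are made up; only their ordering matters, so no probabilistic reading of the outputs is required):

```python
# Hypothetical raw network outputs, one score per control action.
outputs = [0.3, -1.2, 2.4, 0.7]

# Select the control action with the largest output.
action = max(range(len(outputs)), key=lambda i: outputs[i])
print(action)  # -> 2
```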

i3esn0w commented 8 years ago

First of all, thanks for your reply. I am using it for sentiment analysis, and I need three-class classification. My old code is based on Keras 1.0, so I will consider porting it to Keras 1.0.

gaoyuankidult commented 8 years ago

@white54503 That is quite interesting. My original purpose in designing this network was for control problems as well, which do not require a probability distribution. I am really interested in how you used softmax + DSGU for your problem. I will read the paper and try to understand your system. Let us continue the discussion later. By the way, maybe we can discuss here.

gaoyuankidult commented 8 years ago

@i3esn0w Which one would you like to port? DSGU or ClockworkRNN? If you would like to port DSGU, there is a ported version here. If you would like to port ClockworkRNN, you should look at this class.

If you understand Chinese, we can also discuss in Chinese.

i3esn0w commented 8 years ago

OK... I would like to try ClockworkRNN.

gaoyuankidult commented 8 years ago

@i3esn0w I have been quite busy recently and may not have time to finish the port to Keras 1.0, but my step function is already written in recurrent.py.

I suggest you write a ClockworkRNN implementation that supports Keras 1.0; that way others can use it too.

However, ClockworkRNN is very slow, so I do not really recommend it for sentiment analysis.