chqsark opened this issue 8 years ago.
Can someone take a look? Thanks!
Have you solved this? I'm having the same problem.
This is weird: you have two outputs with shapes (?, 64, 512) and (9, 2, 512). They should be (64, 5, 512) and (64, 9, 512), but K.rnn is messing up the shapes. I'll check what is going on.
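For anyone following along, here is the convention a K.rnn-style loop follows, as I understand the Keras 1.x backends: it transposes batch-major input (batch, time, dim) to time-major, steps over time, stacks the per-step outputs, and transposes them back to (batch, time, units). This is a minimal NumPy sketch, not Keras's code. The tanh step function is a hypothetical stand-in for the real layer. It shows the shape the outputs should come back in. A missing or misplaced transpose is one way the batch and time axes can end up swapped.

```python
import numpy as np


def rnn_scan(step_fn, inputs, initial_state):
    """Run step_fn over the time axis of batch-major inputs.

    inputs: array of shape (batch, time, dim)
    step_fn(x_t, state) -> (output_t, new_state)
    Returns (last_output, outputs) with outputs shaped (batch, time, units).
    """
    # Move time to the front: (time, batch, dim)
    time_major = np.transpose(inputs, (1, 0, 2))
    state = initial_state
    outputs = []
    for x_t in time_major:  # x_t has shape (batch, dim)
        out_t, state = step_fn(x_t, state)
        outputs.append(out_t)
    # Stack to (time, batch, units), then transpose back to batch-major
    stacked = np.stack(outputs, axis=0)
    return stacked[-1], np.transpose(stacked, (1, 0, 2))


def make_tanh_step(dim, units, seed=0):
    """Hypothetical simple-RNN step: h_t = tanh(x_t W + h_{t-1} U)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, units)) * 0.01
    U = rng.standard_normal((units, units)) * 0.01

    def step(x_t, h):
        h_new = np.tanh(x_t @ W + h @ U)
        return h_new, h_new

    return step


batch, steps, dim, units = 64, 9, 512, 512
x = np.zeros((batch, steps, dim))
last, outs = rnn_scan(make_tanh_step(dim, units), x, np.zeros((batch, units)))
print(last.shape, outs.shape)  # (64, 512) (64, 9, 512)
```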
In case you want to investigate as well, this is where I suspect the bug is: https://github.com/commaai/research/blob/master/models/layers.py#L359-L374
What Keras version are you using, by the way?
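When answering this, it helps to report all the relevant packages at once. Here is a small sketch that assumes Python 3.8+ (for importlib.metadata). The package list is just the ones mentioned in this thread.

```python
from importlib import metadata


def report_versions(packages=("keras", "tensorflow", "theano", "h5py", "numpy")):
    """Return {package: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version or 'not installed'}")
```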
So, for now, the only place I can see that could be causing this problem is the consume_less RNN parameter in Keras. Try changing https://github.com/commaai/research/blob/master/models/transition.py#L41-L42 to:
model.add(DreamyRNN(output_dim=z_dim, output_length=out_leng-1, return_sequences=True,
activation="tanh", consume_less="not_cpu", batch_input_shape=(batch_size, time, z_dim)))
Unfortunately, I can't reproduce your bug right now, but I'll give you more information as soon as I can.
@EderSantana Thanks a lot for the response. I've tried Keras 1.0.6 and 1.0.8 with TensorFlow 0.9 and 0.10, and all gave the same error. I still got the error after changing transition.py as you suggested.
I noticed that comma.ai has a fork of Keras. Should I use that instead of the original, or a specific branch of Keras?
No, I tested this code on the public Keras release. I think the problem is with the recurrent layer's consume_less parameter, but I can't test it right now :(
I just tried 'cpu', 'gpu', and 'mem' for the consume_less parameter. No luck :(
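For anyone repeating that experiment, a small harness can try each consume_less value and record which ones fail. This is only a sketch. build_and_step is a hypothetical callable you would write yourself, for example one that builds the transition model with the given value and runs one batch. Nothing here is part of Keras or the commaai code.

```python
import traceback


def try_options(build_and_step, options=("cpu", "mem", "gpu")):
    """Call build_and_step(option) for each option and collect the outcome.

    Returns {option: None on success, or the final line of the traceback}.
    """
    results = {}
    for option in options:
        try:
            build_and_step(option)
            results[option] = None
        except Exception:
            # The last traceback line is "ExceptionType: message"
            results[option] = traceback.format_exc().strip().splitlines()[-1]
    return results
```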
My server.py output looks like this:
guan.wang@Z440SJ-243:~/ml/comma/research$ ./server.py --time 60 --batch 64
INFO:__main__:server started
INFO:dask_generator:Loading 9 hdf5 buckets.
x 52722 | t 263583 | f 52722
x 58993 | t 294919 | f 58993
x 19731 | t 98719 | f 19731
x 56166 | t 280785 | f 56166
x 25865 | t 129344 | f 25865
x 85296 | t 426596 | f 85296
x 78463 | t 392182 | f 78463
x 30538 | t 152650 | f 30538
x 51691 | t 258571 | f 51691
training on 436627/459465 examples
INFO:dask_generator:camera files 9
4296.05 ms X (64, 60, 3, 160, 320) angle (64, 60, 1) speed (64, 60, 1)
@chqsark Thanks for the information. I'll keep investigating.
I'm having the same problem. Any thoughts?
I solved the problem by downgrading Keras from 1.0.8 to 1.0.6.
I also solved it by completely removing Keras and installing version 1.0.6. I had previously tried 1.0.6 in a virtualenv and it didn't work, so maybe my package system messed something up. Training now runs, but the server side periodically prints the following:
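If the virtualenv attempt failed, one possible explanation (a guess, not confirmed in this thread) is that a stale install outside the environment was shadowing the one inside it. You can check which copy actually gets imported with only the standard library. This sketch assumes Python 3.4+ for importlib.util.find_spec.

```python
import importlib.util
import os
import sys


def locate(package):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(package)
    return spec.origin if spec is not None else None


def inside_active_env(path):
    """True if path lives under the running interpreter's prefix (venv/conda env)."""
    if path is None:
        return False
    prefix = os.path.abspath(sys.prefix) + os.sep
    return os.path.abspath(path).startswith(prefix)


keras_path = locate("keras")
print("interpreter:", sys.executable)
print("keras from:", keras_path, "| inside active env:", inside_active_env(keras_path))
```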
Traceback (most recent call last):
  File "/home/guan.wang/ml/comma/research/dask_generator.py", line 109, in datagen
    X_batch[count] = x[i-es-time_len+1:i-es+1]
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/dataset.py", line 419, in __getitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/selections.py", line 91, in select
    sel[args]
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/selections.py", line 258, in __getitem__
    start, count, step, scalar = _handle_simple(self.shape,args)
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/selections.py", line 509, in _handle_simple
    x,y,z = _translate_slice(arg, length)
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/selections.py", line 550, in _translate_slice
    raise ValueError("Reverse-order selections are not allowed")
ValueError: Reverse-order selections are not allowed
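Here is a likely explanation for the h5py error (my reading, not confirmed here). When i - es - time_len + 1 is negative, Python slice normalization wraps the start index around to the end of the dataset, so start ends up larger than stop. NumPy silently returns an empty array for such a slice, but h5py rejects it as a reverse-order selection. slice.indices() applies the same normalization, so it can be used to check a window before reading it. The names i, es and time_len come from the traceback. The meaning of es is assumed, since I haven't checked it in dask_generator.py.

```python
def window_bounds(i, es, time_len, length):
    """Normalized (start, stop) of x[i-es-time_len+1 : i-es+1] on an axis of `length`."""
    start, stop, _ = slice(i - es - time_len + 1, i - es + 1).indices(length)
    return start, stop


def window_is_valid(i, es, time_len, length):
    """True only if the raw start is non-negative and the window holds time_len rows."""
    raw_start = i - es - time_len + 1
    start, stop = window_bounds(i, es, time_len, length)
    return raw_start >= 0 and stop - start == time_len


# i=10, time_len=60: raw slice is [-49:11], which normalizes to (951, 11),
# i.e. start > stop, the case h5py reports as a reverse-order selection.
print(window_bounds(10, 0, 60, 1000), window_is_valid(10, 0, 60, 1000))
```

In a generator loop, any index for which window_is_valid returns False would be skipped or redrawn instead of being read.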
@EderSantana I'm in the same situation. After installing Keras 1.0.6 and starting to train the transition model, I get two kinds of errors on the server side. One is "ValueError: Reverse-order selections are not allowed":
Traceback (most recent call last):
  File "/home/yale/research/dask_generator.py", line 109, in datagen
    X_batch[count] = x[i-es-time_len+1:i-es+1]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
  File "/home/yale/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 462, in __getitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/home/yale/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/_hl/selections.py", line 92, in select
    sel[args]
  File "/home/yale/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/_hl/selections.py", line 259, in __getitem__
    start, count, step, scalar = _handle_simple(self.shape,args)
  File "/home/yale/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/_hl/selections.py", line 443, in _handle_simple
    x,y,z = _translate_slice(arg, length)
  File "/home/yale/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/_hl/selections.py", line 484, in _translate_slice
    raise ValueError("Reverse-order selections are not allowed")
ValueError: Reverse-order selections are not allowed
The other is "could not broadcast input array from shape (5,1) into shape (60,1)":
Traceback (most recent call last):
  File "/home/yale/research/dask_generator.py", line 112, in datagen
    angle_batch[count] = np.copy(angle[i-time_len+1:i+1])[:, None]
ValueError: could not broadcast input array from shape (5,1) into shape (60,1)
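The broadcast error looks like a boundary problem in the window indexing (my interpretation, not confirmed in this thread): angle[i-time_len+1:i+1] returned only 5 rows instead of time_len = 60. With NumPy slicing, one way this happens is a window that runs past the end of the array, because the stop index is silently clipped to the array length. The sketch below reproduces the shape mismatch and guards against it. fill_window is a hypothetical helper, and the 100-sample angle array is made-up test data.

```python
import numpy as np


def fill_window(batch, count, series, i, time_len):
    """Copy series[i-time_len+1 : i+1] into batch[count] as a column.

    Returns False (leaving batch untouched) when the slice would not hold
    exactly time_len rows, e.g. i past the end of series or a negative start.
    """
    start, stop = i - time_len + 1, i + 1
    if start < 0 or stop > len(series):
        return False
    batch[count] = np.copy(series[start:stop])[:, None]
    return True


angle = np.arange(100, dtype=np.float32)           # made-up steering log
angle_batch = np.zeros((64, 60, 1), dtype=np.float32)

# i=154: the slice [95:155] is clipped to [95:100], giving the (5,) case
print(angle[154 - 60 + 1:154 + 1].shape)           # (5,)
print(fill_window(angle_batch, 0, angle, 154, 60))  # False
print(fill_window(angle_batch, 1, angle, 80, 60))   # True
```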
Is it related to the different Keras version?
@chqsark How did you completely remove Keras? Was it outside your virtualenv, or inside your virtualenv or conda environment?
@EderSantana I got this error:
  File "/home/dev-box/anaconda2/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 246, in conv2d_shape
    padding)
  File "/home/dev-box/anaconda2/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 184, in get2d_conv_output_size
    (row_stride, col_stride), padding_type)
  File "/home/dev-box/anaconda2/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 149, in get_conv_output_size
    "Filter: %r Input: %r" % (filter_size, input_size))
ValueError: Filter must not be larger than the input: Filter: (8, 8) Input: (3, 160)
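Here is one plausible reading of Input: (3, 160), offered as an assumption rather than a diagnosis. The frames are channels-first, (3, 160, 320) as in the server log above, but the convolution treats the last axis as channels. It therefore sees a 3 x 160 "image", and an 8 x 8 filter cannot fit. In Keras 1.x that choice comes from image_dim_ordering in keras.json ('th' = channels-first, 'tf' = channels-last). This would also be consistent with the reports that switching to the Theano backend helps. Here is a sketch of the check the error message describes, under both orderings:

```python
def spatial_dims(frame_shape, ordering):
    """Return the (rows, cols) a 2D convolution would see for one frame.

    ordering: 'th' = channels-first (C, H, W), 'tf' = channels-last (H, W, C).
    """
    if ordering == "th":
        return tuple(frame_shape[1:3])
    if ordering == "tf":
        return tuple(frame_shape[0:2])
    raise ValueError(f"unknown ordering: {ordering!r}")


def filter_fits(filter_size, frame_shape, ordering):
    """True if every filter dimension is <= the matching spatial dimension."""
    rows_cols = spatial_dims(frame_shape, ordering)
    return all(f <= s for f, s in zip(filter_size, rows_cols))


frame = (3, 160, 320)  # one channels-first camera frame
for ordering in ("th", "tf"):
    print(ordering, spatial_dims(frame, ordering), filter_fits((8, 8), frame, ordering))
# th (160, 320) True
# tf (3, 160) False
```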
Versions: Keras 1.0.6, TF 0.10, Theano 0.8.2.
@andrewraharjo I'm seeing the same issue, "ValueError: Filter must not be larger than the input: Filter: (8, 8) Input: (3, 160)", on an AWS GPU instance with TF 0.10 and Keras 1.1.0. I don't get it when running locally on a MacBook Pro with TF 0.9 and Keras 1.0.8. I'll try setting up the AWS instance with TF 0.9.
@jamesjackson It seems to be related to the current TF build. Check out this link. I haven't given up on TF yet, but for now I start training with the Theano 0.8.2 backend and Keras 1.1.0 with cuDNN 5.1, even though cuDNN 5.0 is the recommended version. If you can't get TF working, try setting the backend to theano in your keras.json and changing the device from CPU to GPU in your .theanorc file. By the way, I'm not on AWS; I use a stationary dev box.
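For reference, here is a sketch of those two config changes done from Python. It assumes the Keras 1.x keras.json keys (backend, image_dim_ordering) and Theano's [global] device option, with default paths ~/.keras/keras.json and ~/.theanorc. The optional dim_ordering argument is my addition for channels-first input. It is not part of the advice above.

```python
import configparser
import json
import os


def set_keras_backend(backend="theano", dim_ordering=None,
                      path=os.path.expanduser("~/.keras/keras.json")):
    """Update backend (and optionally image_dim_ordering) in keras.json."""
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            config = json.load(f)
    config["backend"] = backend
    if dim_ordering is not None:
        config["image_dim_ordering"] = dim_ordering  # 'th' or 'tf' in Keras 1.x
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return config


def set_theano_device(device="gpu", path=os.path.expanduser("~/.theanorc")):
    """Set [global] device in .theanorc, keeping any other settings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently ignored
    if not parser.has_section("global"):
        parser.add_section("global")
    parser.set("global", "device", device)
    with open(path, "w") as f:
        parser.write(f)
    return dict(parser["global"])
```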
Guys, if you are using the newer TensorFlow and Keras, make sure to pass unroll=True as an input parameter to RNN layers. I had this problem with other layers as well.
Thanks @andrewraharjo, @EderSantana.
It appears to be an odd environmental issue related to the packaging and/or Anaconda. I tried several TF/Keras versions, and they all failed in the same way. Building from source and avoiding Anaconda does work.
@jamesjackson I was thinking the same thing earlier, and I checked with a friend who installed Anaconda3 and set up a Python 2.7 virtualenv. He could run it with TF, so I was confused about why Anaconda2 wouldn't work. Did you solve this problem by building from source and avoiding Anaconda?
Note that my TensorFlow was also installed from source (but I did use Anaconda).
@andrewraharjo Yeah, source-based without Anaconda is working.
@jamesjackson Yes, source-based without Anaconda +1.
@andrewraharjo Yes, I completely removed Keras and reinstalled the right version.
I get the same error when I run the code to train the transition model:
The error on the server side:
Traceback (most recent call last):
  File "/home/sky/research/dask_generator.py", line 112, in datagen
    angle_batch[count] = np.copy(angle[i-time_len+1:i+1])[:, None]
ValueError: could not broadcast input array from shape (5,1) into shape (60,1)
Does anyone know a solution?
Have you successfully run ./view_generative_model.py transition --name transition?
Same problem here:
Traceback (most recent call last):
  File "/home/deep-learning/research-master/dask_generator.py", line 112, in datagen
    angle_batch[count] = np.copy(angle[i-time_len+1:i+1])[:, None]
ValueError: could not broadcast input array from shape (55,1) into shape (60,1)
When I try to run view_steering_model.py, I get this error: "ValueError: bad marshal data (unknown type code)". Here is the output from the command prompt:
Traceback (most recent call last):
  File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 229, in func_load
    raw_code = codecs.decode(code.encode('ascii'), 'base64')
UnicodeEncodeError: 'ascii' codec can't encode character '\xe0' in position 46: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "view_steering_model.py", line 94, in <module>
    model = model_from_json(json.load(jfile))
  File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\models.py", line 349, in model_from_json
    return layer_module.deserialize(config, custom_objects=custom_objects)
  File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 144, in deserialize_keras_object
    list(custom_objects.items())))
  File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\models.py", line 1349, in from_config
    layer = layer_module.deserialize(conf, custom_objects=custom_objects)
  File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 144, in deserialize_keras_object
    list(custom_objects.items())))
  File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\layers\core.py", line 711, in from_config
    function = func_load(config['function'], globs=globs)
  File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 234, in func_load
    code = marshal.loads(raw_code)
ValueError: bad marshal data (unknown type code)
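A likely cause, though not confirmed here: the traceback goes through func_load, which rebuilds a serialized Python function (such as a Lambda layer's function) from marshal bytecode. The marshal format is not guaranteed to be compatible across Python versions. A model JSON written under Python 2.7, which the rest of this thread uses, may therefore not load under the Python 3 Anaconda install shown in these paths. Below is a minimal standard-library sketch of that kind of round trip, which is only reliable when dumping and loading happen on the same interpreter version. The normalize function is a hypothetical example.

```python
import marshal
import types


def dump_function(func):
    """Serialize a function's bytecode (specific to this interpreter version)."""
    return marshal.dumps(func.__code__)


def load_function(raw_code, globs=None):
    """Rebuild a function from bytes produced by dump_function."""
    code = marshal.loads(raw_code)  # can fail on bytes from another Python version
    return types.FunctionType(code, globs if globs is not None else globals())


normalize = lambda x: x / 127.5 - 1.0  # hypothetical Lambda-style function
restored = load_function(dump_function(normalize))
print(restored(255.0))  # 1.0
```

If that is the cause, rebuilding the model architecture in code and loading only the weights avoids deserializing bytecode altogether.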
Looks like the issue comes from Keras. Which version are you using?
Hi,
I trained the autoencoder successfully. However, when I moved on to training the transition model, I ran into the problem below.
Thanks a lot for any help!
BTW, I didn't change any code.