bottler / iisignature

Iterated integral signature calculations
MIT License

TF error running demo_rnn.py #2

Closed ghost closed 7 years ago

ghost commented 7 years ago

Hi Jeremy,

thanks very much for sharing this repository - it's very exciting!

I just tried to run demo_rnn.py, and got a big blob of an error (and a similar error when I try to run demo_keras.py).

It definitely has something to do with the TF graph, since it gets as far as line 57 (the m.fit call) before failing:

(py35_pytorch) ajay@ajay-h8-1170uk:~/PythonProjects/iisignature-master/examples$ python demo_rnn.py
Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
recurrent_sig_1 (RecurrentSi (None, 5)                 196.0     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 6         
=================================================================
Total params: 202
Trainable params: 202
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
2017-08-19 08:23:29.916705: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-19 08:23:29.916758: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-19 08:23:29.916768: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 82, in __call__
    ret = func(*args)
  File "/home/ajay/PythonProjects/iisignature-master/examples/iisignature_tensorflow.py", line 28, in _sigJoinGradFixedImp
    return o[0],o[1],_zero,np.array(o[2],dtype="float32")
IndexError: tuple index out of range
2017-08-19 08:23:31.902466: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_2: see error log.
2017-08-19 08:23:31.905790: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_2: see error log.
     [[Node: gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _class=["loc:@recurrent_sig_1/while/SigJoin"], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/recurrent_sig_1/while/Reshape_2_grad/Reshape, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop_1, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop_2, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/Enter)]]
Traceback (most recent call last):
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1039, in _do_call
    return fn(*args)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _run_fn
    status, run_metadata)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_2: see error log.
     [[Node: gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _class=["loc:@recurrent_sig_1/while/SigJoin"], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/recurrent_sig_1/while/Reshape_2_grad/Reshape, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop_1, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop_2, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/Enter)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  **File "demo_rnn.py", line 57, in <module>
    m.fit(x,y,epochs=10,shuffle=0)**
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/models.py", line 863, in fit
    initial_epoch=initial_epoch)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/engine/training.py", line 1430, in fit
    initial_epoch=initial_epoch)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/engine/training.py", line 1079, in _fit_loop
    outs = f(ins_batch)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2268, in __call__
    **self.session_kwargs)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_2: see error log.
     [[Node: gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _class=["loc:@recurrent_sig_1/while/SigJoin"], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/recurrent_sig_1/while/Reshape_2_grad/Reshape, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop_1, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop_2, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/Enter)]]

Caused by op 'gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed', defined at:
  File "demo_rnn.py", line 57, in <module>
    m.fit(x,y,epochs=10,shuffle=0)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/models.py", line 863, in fit
    initial_epoch=initial_epoch)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/engine/training.py", line 1413, in fit
    self._make_train_function()
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/engine/training.py", line 937, in _make_train_function
    self.total_loss)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/optimizers.py", line 404, in get_updates
    grads = self.get_gradients(loss, params)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/optimizers.py", line 71, in get_gradients
    grads = K.gradients(loss, params)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2305, in gradients
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 560, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 368, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 560, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/ajay/PythonProjects/iisignature-master/examples/iisignature_tensorflow.py", line 44, in _sigJoinGradFixed
    [tf.float32]*4, name="SigJoinGradFixed")
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 189, in py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
    name=name)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

...which was originally created as op 'recurrent_sig_1/while/SigJoin', defined at:
  File "demo_rnn.py", line 29, in <module>
    m.add(RecurrentSig(5,sig_level=2,input_shape=(None,3),return_sequences=False, use_signatures = True, output_signatures = False, activation="tanh",train_time_lapse=True))
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/models.py", line 436, in add
    layer(x)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/layers/recurrent.py", line 262, in __call__
    return super(Recurrent, self).__call__(inputs, **kwargs)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/engine/topology.py", line 596, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/layers/recurrent.py", line 341, in call
    input_length=input_shape[1])
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2536, in rnn
    swap_memory=True)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2623, in while_loop
    result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2456, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2406, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2525, in _step
    tuple(constants))
  File "/home/ajay/PythonProjects/iisignature-master/examples/iisignature_recurrent_keras.py", line 102, in step
    sigs = SigJoin(prev_sigs_,displacements,self.sig_level,self.time_lapse)
  File "/home/ajay/PythonProjects/iisignature-master/examples/iisignature_tensorflow.py", line 82, in SigJoin
    return tf.py_func(_sigJoinFixedImp, [x,y,m,fixedLast], tf.float32, name="SigJoin")
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 189, in py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
    name=name)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)

InternalError (see above for traceback): Failed to run py callback pyfunc_2: see error log.
     [[Node: gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _class=["loc:@recurrent_sig_1/while/SigJoin"], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/recurrent_sig_1/while/Reshape_2_grad/Reshape, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop_1, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/StackPop_2, gradients/recurrent_sig_1/while/SigJoin_grad/SigJoinGradFixed/Enter)]]

Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x7f60f6552198>>
Traceback (most recent call last):
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 587, in __del__
TypeError: 'NoneType' object is not callable

PS - I normally use PyTorch, and I was wondering how difficult it would be to write a signature RNN module for it?

Thanks for your help, Aj

bottler commented 7 years ago

The important part of that output is the "IndexError: tuple index out of range" line. I think you don't have the latest iisignature (0.20, released 9 August 2017), which means you can't get derivatives with respect to the time-lapse parameter. Solutions: either set train_time_lapse=False when you create the RecurrentSig layer in demo_rnn.py, or upgrade to the latest version with pip install --upgrade iisignature.
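To illustrate why the traceback ends in an IndexError, here is a minimal sketch of the unpacking step on line 28 of iisignature_tensorflow.py. It assumes (this is not confirmed in the thread) that an older library returns a 2-tuple from the backprop call, so that indexing o[2] fails. The function name unpack_sigjoin_grads and the sample tuples are hypothetical stand-ins, and plain Python values are used in place of NumPy arrays.

```python
# Sketch only: mimics the shape of the failing return statement
#   return o[0], o[1], _zero, np.array(o[2], dtype="float32")
# using plain Python values instead of NumPy arrays.

_ZERO = 0.0  # placeholder gradient for the non-differentiable input


def unpack_sigjoin_grads(o):
    """Turn the backprop result `o` into the four gradients TensorFlow expects.

    `o` is assumed to hold (grad_x, grad_y, grad_time_lapse). If the third
    entry is missing, raise a readable error instead of a bare IndexError.
    """
    if len(o) < 3:
        raise RuntimeError(
            "backprop returned %d values, expected 3: upgrade iisignature "
            "or set train_time_lapse=False" % len(o)
        )
    return o[0], o[1], _ZERO, o[2]


if __name__ == "__main__":
    # A 3-tuple unpacks into four gradients.
    print(unpack_sigjoin_grads(([1.0], [2.0], 0.5)))
    # A 2-tuple (the assumed old behaviour) now gives a clear message.
    try:
        unpack_sigjoin_grads(([1.0], [2.0]))
    except RuntimeError as err:
        print("error:", err)
```

This only changes how the failure is reported. The actual fix is still one of the two options above.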

ghost commented 7 years ago

Hi Jeremy,

Sorry for not including the version number in this issue. Yep, my bad: with the latest version, 0.20, installed via pip install --upgrade iisignature, it's all good to go.

(py35_pytorch) ajay@ajay-h8-1170uk:~/PythonProjects/iisignature-master/examples$ python demo_rnn.py
Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
recurrent_sig_1 (RecurrentSi (None, 5)                 196.0     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 6         
=================================================================
Total params: 202
Trainable params: 202
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
2017-08-20 14:27:10.723206: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-20 14:27:10.723262: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-20 14:27:10.723272: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2000/2000 [==============================] - 0s - loss: 0.0510      
Epoch 2/10
2000/2000 [==============================] - 0s - loss: 0.0461     
Epoch 3/10
2000/2000 [==============================] - 0s - loss: 0.0440     
Epoch 4/10
2000/2000 [==============================] - 0s - loss: 0.0426     
Epoch 5/10
2000/2000 [==============================] - 0s - loss: 0.0415     
Epoch 6/10
2000/2000 [==============================] - 0s - loss: 0.0404     
Epoch 7/10
2000/2000 [==============================] - 0s - loss: 0.0393     
Epoch 8/10
2000/2000 [==============================] - 0s - loss: 0.0380     
Epoch 9/10
2000/2000 [==============================] - 0s - loss: 0.0366     
Epoch 10/10
2000/2000 [==============================] - 0s - loss: 0.0348     
0.0333910922002

Thank you very much for the help!
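For anyone else who hits the same error, a quick way to check whether the installed iisignature is new enough is sketched below. It assumes Python 3.8 or later for importlib.metadata (the environment in this issue is Python 3.5, where pip show iisignature gives the same information). The helper names are hypothetical, and the 0.20 threshold is taken from the comment above.

```python
import itertools
from importlib import metadata

MIN_VERSION = (0, 20)  # first release with time-lapse derivatives, per this thread


def version_tuple(text):
    """Parse '0.20' or '0.24.1' into a tuple of ints, stopping at non-numeric parts."""
    parts = []
    for piece in text.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def can_train_time_lapse(installed):
    """True if the given version string is at least MIN_VERSION."""
    return version_tuple(installed) >= MIN_VERSION


def installed_iisignature_version():
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version("iisignature")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_iisignature_version()
    if found is None:
        print("iisignature is not installed")
    elif can_train_time_lapse(found):
        print("iisignature", found, "is new enough for train_time_lapse=True")
    else:
        print("iisignature", found, "is too old: upgrade or set train_time_lapse=False")
```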

bottler commented 7 years ago

I haven't seen much PyTorch code, so I don't know the elegant way to write things, but I have just checked in an approximate PyTorch equivalent of the Keras recurrent example.

I don't claim that the recurrent structure these examples show is a well-performing recurrent layer for any real problem. It was just a simple idea. I am interested in general in ideas for improving RNNs using signatures, I have other ideas that are not released, and I am happy to chat about this type of research. This example code is just the first idea I coded up.

ghost commented 7 years ago

WOW - how did you do that so quickly !!!

(py35_pytorch) ajay@ajay-h8-1170uk:~/PythonProjects/iisignature-master/examples$ python demo_rnn_torch.py

 0.3396
[torch.FloatTensor of size 1]

Looks good to me :+1:

May I email you to chat about your interests regarding applications of RNNs with the signature method? What's your preferred email or method of communication? Please feel free to drop me a line at ajaytalati@googlemail.com if you don't want to post it here. Happy to meet up if you're still in London.

At the moment I'm working on a GP+RNN pipeline, so I'm looking at alternatives to GP modelling, or enhancements to it. In particular, data efficiency, unsupervised learning and adversarial training are the things that are relevant. I just wondered what your initial thoughts were on this?

Thanks a lot, best regards,

Ajay