LSTMCell's zero_state causes tensorflow_to_barracuda error

Ranxi commented 4 years ago

Describe the bug: I trained a TensorFlow model myself and now want to convert it to a Barracuda model so that it can run in Unity. However, the conversion fails with what looks like a bug: an error is raised at line 476 of G:\Unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\tensorflow_to_barracuda.py.

To Reproduce: I tried to reproduce it in the example environments, but the errors thrown there are not the same. So I am posting the key snippet of my model here:

    with tf.variable_scope(self.scope):
        ...
        self._core_input = tf.reshape(tf.concat([self._stat_enc, self._entt_ebddings], axis=-1), [-1, 1, 128])
        ml_lstm = tf.nn.rnn_cell.BasicLSTMCell(self._core_hid_size)
        _init_state = ml_lstm.zero_state(self.batch_size, dtype=tf.float32)
        # ml_lstm.get_initial_state(inputs=self._core_input, dtype=tf.float32)  # this variant leads to the same error
        self._core_outputs, self._core_states = tf.nn.dynamic_rnn(ml_lstm, inputs=self._core_input, initial_state=_init_state, time_major=False)

When I tried to convert the model to a Barracuda model, it threw this error:

Traceback (most recent call last):
  File ".\transform_mymdl2barracuda.py", line 19, in <module>
    main()
  File ".\transform_mymdl2barracuda.py", line 15, in main
    agt.export_model()
  File "G:\Unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\KZ_model.py", line 223, in export_model
    tf2bc.convert(frozen_graph_def_path, self.model_path + ".nn", verbose=True)
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\tensorflow_to_barracuda.py", line 1553, in convert
    i_model, args
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\tensorflow_to_barracuda.py", line 1381, in process_model
    nodes, var_tensors, const_tensors, o_context
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\tensorflow_to_barracuda.py", line 476, in <lambda>
    int(by_name(tensors, "/axis").data[0]), context.layer_ranks[inputs[0]]
IndexError: list index out of range

Later I tried to reproduce it in the provided example environments by modifying the function create_recurrent_encoder() in ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\models.py as follows:

    ...
    with tf.variable_scope(name):
        rnn_cell = tf.contrib.rnn.BasicLSTMCell(half_point)
        # lstm_vector_in = tf.contrib.rnn.LSTMStateTuple(
        #     memory_in[:, :half_point], memory_in[:, half_point:]
        # )
        _init_state = rnn_cell.get_initial_state(inputs=lstm_input_state, dtype=tf.float32)
        recurrent_output, lstm_state_out = tf.nn.dynamic_rnn(
            rnn_cell, lstm_input_state, initial_state=_init_state
        )

This produced a different error:

Traceback (most recent call last):
  File "D:\ProgramData\Anaconda3\Scripts\mlagents-learn-script.py", line 11, in <module>
    load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')()
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\learn.py", line 408, in main
    run_training(0, run_seed, options, Queue())
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\learn.py", line 253, in run_training
    tc.start_learning(env)
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\trainer_controller.py", line 226, in start_learning
    self._export_graph()
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\trainer_controller.py", line 130, in _export_graph
    self.trainers[brain_name].export_model()
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\trainer.py", line 152, in export_model
    self.policy.export_model()
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\tf_policy.py", line 230, in export_model
    tf2bc.convert(frozen_graph_def_path, self.model_path + ".nn")
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\tensorflow_to_barracuda.py", line 1553, in convert
    i_model, args
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\tensorflow_to_barracuda.py", line 1381, in process_model
    nodes, var_tensors, const_tensors, o_context
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\tensorflow_to_barracuda.py", line 558, in <lambda>
    nodes, inputs, tensors, context, find_type="Reshape"
  File "g:\unity_projects\ml-agents\ml-agents-0.11.0\ml-agents\mlagents\trainers\tensorflow_to_barracuda.py", line 948, in basic_lstm
    assert len(inputs) == 2
AssertionError

Console logs / stack traces: For my own conversion, I turned on the verbose option of the convert function in tensorflow_to_barracuda.py and found:

...
PATTERN: agtbrain/concat ~~ ConcatV2 <- ['agtbrain/stat_enc/dense/Tanh', 'agtbrain/entt_enc/dense_1/Tanh'] + ['agtbrain/concat/axis']
         ['ConcatV2']
'agtbrain/concat' Concat Vars:['agtbrain/stat_enc/dense/Tanh', 'agtbrain/entt_enc/dense_1/Tanh'] Const:[]
PATTERN: agtbrain/Reshape ~~ Reshape <- ['agtbrain/concat'] + ['agtbrain/Reshape/shape']
         ['Reshape']
'agtbrain/Reshape' Reshape Vars:['agtbrain/concat'] Const:[]
PATTERN: agtbrain/BasicLSTMCellZeroState/concat ~~ ConcatV2 <- [] + ['agtbrain/BasicLSTMCellZeroState/Const', 'agtbrain/BasicLSTMCellZeroState/Const_1', 'agtbrain/BasicLSTMCellZeroState/concat/axis']
         ['ConcatV2']

It seems that var_tensors at tensorflow_to_barracuda.py line 1381 should contain:

['agtbrain/BasicLSTMCellZeroState/Const', 'agtbrain/BasicLSTMCellZeroState/Const_1']

but these tensors were placed in const_tensors instead. How can I solve this problem? If you need more information, I'll post it here ASAP.
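The failure mode suggested by the verbose log and the traceback can be illustrated with a small standalone sketch. This is not the converter's actual code: `split_inputs` and `layer_ranks` here are hypothetical stand-ins, and the only assumption is that the ConcatV2 handler indexes the first variable input, as the line-476 traceback suggests.

```python
# Hypothetical stand-in (not the real converter code): split a node's inputs
# into variable (runtime) tensors and constant tensors.
def split_inputs(input_names, const_names):
    var_inputs = [n for n in input_names if n not in const_names]
    const_inputs = [n for n in input_names if n in const_names]
    return var_inputs, const_inputs


# Inputs of the zero-state ConcatV2 node, copied from the verbose log.
concat_inputs = [
    "agtbrain/BasicLSTMCellZeroState/Const",
    "agtbrain/BasicLSTMCellZeroState/Const_1",
    "agtbrain/BasicLSTMCellZeroState/concat/axis",
]

# zero_state() builds its shape purely from constants, so every input is constant.
const_names = set(concat_inputs)
var_inputs, const_inputs = split_inputs(concat_inputs, const_names)

layer_ranks = {}  # hypothetical: rank lookup keyed by variable tensor name
error = None
try:
    # Mirrors the shape of the failing expression: layer_ranks[inputs[0]]
    rank = layer_ranks[var_inputs[0]]
except IndexError as exc:
    error = exc

print(var_inputs)          # the variable-input list is empty
print(type(error).__name__)
```

Under these assumptions, any ConcatV2 node whose inputs are all constants would hit the same IndexError, which matches the `<- [] + [...]` pattern in the log.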

Environment:

Windows 10
ML-Agents v0.11.0
TensorFlow 1.15.0

mantasp commented 4 years ago

tensorflow_to_barracuda.py supports only unmodified models produced by ML-Agents. In the latest Barracuda versions we added experimental LSTM support via the ONNX importer directly in the Unity Editor. Please let us know if it works for you.

Ranxi commented 4 years ago

@mantasp thanks, the tf2onnx tool in https://github.com/onnx/tensorflow-onnx works perfectly.

Ranxi commented 4 years ago

@mantasp Unfortunately, although I successfully converted my TF model to an ONNX model, neither importing the ONNX file directly into the Unity project nor running the onnx_to_barracuda script made it work in Unity. The direct import shows many errors, such as "Unknown type encountered while parsing..." and "Unsupported attribute axis...". The onnx_to_barracuda script ignores the SCAN layer, which is common in LSTM cells. Is there any chance the SCAN layer will be supported in the near future? BTW, I also found a bug when converting ONNX to Barracuda: the function get_tensor_data (portal) always treats tensor.raw_data as float, while some raw_data are actually int64.
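To illustrate the raw_data issue, here is a minimal sketch of decoding raw bytes according to the tensor's declared element type instead of always assuming float. `decode_raw_data` is a hypothetical helper, not the script's get_tensor_data; the type codes (FLOAT = 1, INT32 = 6, INT64 = 7) and the little-endian layout follow the ONNX TensorProto definition.

```python
import struct

# ONNX TensorProto.DataType codes mapped to struct format characters.
# Only three types are covered in this sketch.
_STRUCT_CODES = {1: "f", 6: "i", 7: "q"}  # FLOAT, INT32, INT64


def decode_raw_data(raw, data_type):
    """Hypothetical helper: decode little-endian raw bytes by declared type."""
    code = _STRUCT_CODES[data_type]
    item_size = struct.calcsize("<" + code)
    count = len(raw) // item_size
    return list(struct.unpack("<%d%s" % (count, code), raw))


# Example: a shape constant such as [-1, 128], stored as two int64 values.
raw = struct.pack("<2q", -1, 128)

as_int64 = decode_raw_data(raw, 7)  # decoded with the declared type
as_float = decode_raw_data(raw, 1)  # what an "always float" reader would see

print(as_int64)
print(len(as_float))  # the same 16 bytes are read as four float32 values
```

Reading the bytes as float32 yields twice as many elements with meaningless values, so a shape or axis constant stored as int64 would be corrupted by a reader that always assumes float.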

mozart20 commented 4 years ago

Hey @Ranxi, were you ever able to resolve this? I'm running into the exact same problem where keras to onnx conversion works perfectly, but when I load it into Unity, I get errors like "unsupported attribute axis Softmax", etc.

Ranxi commented 4 years ago

@mozart20 Hi, not yet. So far I am still in the training stage, so I control the agents through the external communicator provided by ML-Agents. For the future, I think TensorFlowSharp may help: comment

Unity-Technologies / barracuda-release

LSTMCell's zero_state causes tensorflow_to_barracuda error #26