[BUG Report]: Unable to use a mask with a GRU layer

Utanapishtim31 commented 11 months ago

Description

To set a mask for a GRU layer, I have to pass it through GRUOptionalArgs.Mask. However, the class GRUOptionalArgs does not implement the IOptionalArgs interface (probably an oversight), so it cannot be passed to GRU.Call()!

Note that GRU.Call() checks for a GRUOptionalArgs and does not accept an RnnOptionalArgs.

Reproduction Steps

Try to compile the following code:

GRUArgs gruArgs = new GRUArgs();
gruArgs.Units = 100;
GRU rnnLayer = new GRU(gruArgs);

GRUOptionalArgs rnnOptionalArgs = new GRUOptionalArgs();
Tensors inputs = np.zeros(new Shape(1, 10, 5), dtype: TF_DataType.TF_FLOAT);

// Fails to compile: GRUOptionalArgs does not implement IOptionalArgs
rnnLayer.Apply(inputs, optional_args: rnnOptionalArgs);

Known Workarounds

Use an LSTM instead?

Configuration and Other Information

TensorFlow.NET 0.110.4
TensorFlow.Keras 0.11.4
.NET Framework 4.7.2
Windows 11

Wanglongzhi2001 commented 11 months ago

Hello, you can use RnnOptionalArgs.Mask as a workaround; the following code provides an example:

using System;
using static Tensorflow.KerasApi;
using Tensorflow;
using Tensorflow.NumPy;
using Tensorflow.Keras.ArgsDefinition;

var layers = keras.layers;
var rnnLayer = layers.GRU(units:100);
Tensors inputs = np.zeros(new Shape(1, 10, 5), dtype:TF_DataType.TF_FLOAT);

var res = rnnLayer.Apply(inputs, optional_args:new RnnOptionalArgs { Mask=inputs});
Console.WriteLine(res.ToString());

Wanglongzhi2001 commented 11 months ago

And thank you for reporting this issue, I'll fix it. ^_^

Utanapishtim31 commented 11 months ago

With the following code, I get a KeyNotFoundException in predict():

var inputs = keras.Input((4, 2));
var inputs_mask = keras.Input((4, 1), dtype: TF_DataType.TF_BOOL);

RnnOptionalArgs rnnOptionalArgs = new RnnOptionalArgs();
rnnOptionalArgs.Mask = inputs_mask;

var rnn = keras.layers.GRU(10).Apply(inputs, optional_args: rnnOptionalArgs);

var output = keras.layers.Dense(2, activation: "softmax").Apply(rnn);

var model = keras.Model((inputs, inputs_mask), output);
model.summary();

NDArray x1 = np.random.random((1, 4, 2)).astype(TF_DataType.TF_FLOAT);
NDArray x2 = np.ones((1, 4, 1), TF_DataType.TF_BOOL);
var pred = model.predict((x1, x2));
Console.WriteLine(pred);

It looks like the graph structure does not detect that inputs_mask is connected to the RNN: Functional.tensor_usage_count does not include the tensor inputs_mask.

It works fine when mask = inputs, probably because inputs is already connected to the GRU.

Wanglongzhi2001 commented 11 months ago

Hello, the code you provided seems to have a problem. I translated it into TensorFlow Python below, and it does not run successfully either.

import tensorflow as tf
import tensorflow.keras as keras

inputs = keras.Input(shape=(4, 2))
inputs_mask = keras.Input(shape=(4, 1), dtype=tf.bool)

rnn = keras.layers.GRU(10)(inputs, mask=inputs_mask)
output = keras.layers.Dense(2, activation="softmax")(rnn)
model = keras.Model(inputs=[inputs, inputs_mask], outputs=output)
model.summary()

Utanapishtim31 commented 11 months ago

Hi,

My mistake. The mask input should have one dimension less:

inputs_mask = keras.Input(shape=(4,), dtype=tf.bool)
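To see why the mask has one dimension less than the inputs, here is a minimal pure-Python sketch, assuming the usual Keras masking semantics for RNNs: the inputs have shape (batch, timesteps, features), the mask has shape (batch, timesteps) with one boolean per timestep, and at a masked timestep the previous state is carried forward unchanged. The function `masked_rnn` and its scalar weights `w` and `u` are hypothetical stand-ins for a real GRU cell; only the masking rule is the point.

```python
import math

def masked_rnn(inputs, mask, w=0.5, u=0.8):
    """Toy masked recurrence (hypothetical cell, not a real GRU).

    inputs: nested lists of shape (batch, timesteps, features)
    mask:   nested lists of shape (batch, timesteps), one bool per timestep
    Returns the final state of each sample, shape (batch,).
    """
    final_states = []
    for sample, sample_mask in zip(inputs, mask):
        # One mask entry per timestep, not per feature.
        assert len(sample) == len(sample_mask), "one mask entry per timestep"
        h = 0.0
        for x_t, keep in zip(sample, sample_mask):
            candidate = math.tanh(w * sum(x_t) + u * h)
            # Masked timestep: keep the previous state unchanged.
            h = candidate if keep else h
        final_states.append(h)
    return final_states

# Shapes matching the thread: inputs (1, 4, 2), mask (1, 4).
x = [[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]]
full_mask = [[True, True, True, True]]
last_step_masked = [[True, True, True, False]]

print(masked_rnn(x, full_mask))
print(masked_rnn(x, last_step_masked))
```

With the last timestep masked, the final state equals the state reached after the first three timesteps, which is the behavior a (batch, timesteps) boolean mask is meant to produce.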

Utanapishtim31 commented 11 months ago

In C#, you can keep the same code with the following change:

var inputs_mask = keras.Input(new Shape(4), dtype: TF_DataType.TF_BOOL);

Then you get an exception "NotSupportedException (The collection has a fixed size)".

Wanglongzhi2001 commented 11 months ago

You are right, this is a bug.

Utanapishtim31 commented 11 months ago

As I explained above, the root cause of the problem is that the graph structure does not record the fact that inputs_mask is actually an input to the RNN. As a result, inputs_mask is pruned, and the model fails later when it has to use it.

To confirm this, I artificially connected inputs_mask to a second "dummy" Dense layer and merged its output into the model output with an Add layer (I did it this way because I cannot fit a multi-output model, but that is a separate issue). With this link in place, the model works fine with an LSTM recurrent layer.
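The pruning described above can be illustrated with a toy functional graph. This is a sketch under an assumption drawn from the thread, not a description of TensorFlow.NET's actual code: the model keeps only the tensors reachable backward from its outputs through recorded inbound edges. If the mask passed through optional arguments is never recorded as an inbound edge of the GRU node, inputs_mask is dropped; the dummy branch merged with an Add layer restores the path. The helper `reachable_inputs` and the node names are hypothetical.

```python
def reachable_inputs(graph, outputs, model_inputs):
    """Return the model inputs reachable backward from `outputs`.

    `graph` maps each node name to the nodes recorded as its inbound edges.
    """
    seen = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Follow only the edges the graph actually recorded.
        stack.extend(graph.get(node, []))
    return [name for name in model_inputs if name in seen]

model_inputs = ["inputs", "inputs_mask"]

# The mask reaches the GRU only through optional args, so no edge is recorded.
graph_without_link = {
    "gru": ["inputs"],
    "dense": ["gru"],
}

# Workaround from the thread: a dummy branch from inputs_mask merged by an Add layer.
graph_with_link = {
    "gru": ["inputs"],
    "dense": ["gru"],
    "dummy": ["inputs_mask"],
    "add": ["dense", "dummy"],
}

print(reachable_inputs(graph_without_link, ["dense"], model_inputs))
print(reachable_inputs(graph_with_link, ["add"], model_inputs))
```

In the first graph only inputs survives, so any later lookup of the mask's value has nothing to find; in the second, the extra edge keeps inputs_mask alive even though the dummy branch contributes nothing numerically.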

With a GRU recurrent layer (as in the code here), there is yet another problem: fitting the model throws an exception saying that the TensorFlow primitive "SplitV" is missing. I'll let you analyze this...

Utanapishtim31 commented 11 months ago

Sample code that reproduces the System.Collections.Generic.KeyNotFoundException raised when training a model with a masked GRU:

First, create a layer that always outputs zeros, so that adding its output to the model output does not change the result. It is used only to create a dummy link between inputs_mask and the model output, so that inputs_mask is not pruned.

internal class ZeroLayer : Layer
{
    private Shape output_shape;

    public ZeroLayer(Shape output_shape, string name = null)
        : base(new LayerArgs { Name = name })
    {
        this.output_shape = output_shape;
    }

    protected override Tensors Call(Tensors inputs, Tensors state = null, bool? training = null, IOptionalArgs optional_args = null)
    {
        return tf.zeros(this.output_shape, dtype: TF_DataType.TF_FLOAT);
    }

    public override Shape ComputeOutputShape(Shape input_shape)
    {
        return this.output_shape;
    }
}

Then fit the model:

var inputs = keras.Input((4, 2));
var inputs_mask = keras.Input(new Shape(4), dtype: TF_DataType.TF_BOOL);

RnnOptionalArgs rnnOptionalArgs = new RnnOptionalArgs();
rnnOptionalArgs.Mask = inputs_mask;

var rnn = keras.layers.LSTM(10).Apply(inputs, optional_args: rnnOptionalArgs);

var x = keras.layers.Dense(2, activation: "softmax").Apply(rnn);
var y = new ZeroLayer(new Shape(2)).Apply(inputs_mask);

var output = keras.layers.Add().Apply(new Tensors(x, y));

var model = keras.Model((inputs, inputs_mask), output);
model.summary();

NDArray x1 = np.random.random((1, 4, 2)).astype(TF_DataType.TF_FLOAT);
NDArray x2 = np.ones((1, 4), TF_DataType.TF_BOOL);
var pred = model.predict((x1, x2));
Console.WriteLine(pred);

NDArray[] train_inputs = new NDArray[2] { x1, x2 };
NDArray train_y = np.zeros(new Shape(1, 2), dtype: TF_DataType.TF_FLOAT);
train_y[0, 0] = 1.0f;

model.compile("adam", "categorical_crossentropy", new string[] { "accuracy" });
model.fit(train_inputs, train_y, batch_size: 1, epochs: 1);

With an LSTM layer, model.fit() works fine. Replace it with a GRU layer and you get a System.Collections.Generic.KeyNotFoundException: fitting the model requires the TensorFlow primitive SplitV, which is missing from a dictionary.

Wanglongzhi2001 commented 11 months ago

Many thanks for your example; I will let you know once this bug is fixed.

SciSharp / TensorFlow.NET