SciSharp / TensorFlow.NET

.NET Standard bindings for Google's TensorFlow for developing, training and deploying Machine Learning models in C# and F#.
https://scisharp.github.io/tensorflow-net-docs
Apache License 2.0

[BUG Report + fix]: Earlystopping does not stop according to "_min_delta" and "_baseline" parameters #1178

GaijinOtohp closed this issue 1 year ago

GaijinOtohp commented 1 year ago

Description

  1. [BUG Report]: EarlyStopping should trigger a stop of training if the loss value does not improve by at least _min_delta.
  2. [Suggestion]: I also suggest that training should stop once the loss reaches _baseline, which should refer to the optimal value (the threshold for the desired loss value).

Reproduction Steps

    var layers = tf.keras.layers;

    // A small 1-input, 2-output linear model.
    Tensor input = tf.keras.Input((-1, 1), name: "input_place_holder");
    Tensor hiddenLayer1 = layers.Dense(7, activation: "linear").Apply(input);
    Tensor output = layers.Dense(2, activation: "linear").Apply(hiddenLayer1);
    var model = tf.keras.Model(input, output);

    // Training data: for x in [-5, 5] the targets are (x + 5, 2x + 12).
    Tensorflow.NumPy.NDArray inputs = new Tensorflow.NumPy.NDArray(new float[,] { { -5 }, { -4 }, { -3 }, { -2 }, { -1 }, { 0 }, { 1 }, { 2 }, { 3 }, { 4 }, { 5 } });
    Tensorflow.NumPy.NDArray outputs = new Tensorflow.NumPy.NDArray(new float[,] { { 0, 2 }, { 1, 4 }, { 2, 6 }, { 3, 8 }, { 4, 10 }, { 5, 12 }, { 6, 14 }, { 7, 16 }, { 8, 18 }, { 9, 20 }, { 10, 22 } });

    // Snapshot the initial weights so every retry below starts from the same point.
    List<Tensorflow.NumPy.NDArray> weights = model.get_weights();

    var callbackPars = new Tensorflow.Keras.Callbacks.CallbackParams() { Model = model };
    var earlystoppingCB = new Tensorflow.Keras.Callbacks.EarlyStopping(parameters: callbackPars,
                                                                       monitor: "loss",
                                                                       patience: 3,
                                                                       mode: "min",
                                                                       restore_best_weights: true,
                                                                       //baseline: 0.001f,
                                                                       min_delta: 0.0001f);

    // Retry training with a 10x smaller learning rate until the final loss is below 0.5.
    Tensorflow.Keras.Engine.ICallback historyCB;
    float learningRate = 0.1f;
    for (int i = 0; i < 100; i++)
    {
        model.compile(optimizer: tf.keras.optimizers.SGD(learningRate), loss: tf.keras.losses.MeanSquaredError());

        // EarlyStopping should end each fit() call well before 1000 epochs
        // once the loss stops improving by at least min_delta.
        historyCB = model.fit(inputs, outputs, epochs: 1000, callbacks: new List<Tensorflow.Keras.Engine.ICallback>() { earlystoppingCB });

        if (historyCB.history["loss"][historyCB.history["loss"].Count - 1] < 0.5f)
            break;

        model.set_weights(weights);
        learningRate /= 10f;
    }

    Tensor results0 = model.predict(tf.constant(new float[,] { { -8 } }));

Known Workarounds

Suggested fix to the bug

  1. The fix for the bug is on line 134 of the file Tensorflow.Keras.Callbacks.EarlyStopping.cs:
    bool less_op = (monitor_value - _min_delta) < reference_value;
    ---> bool less_op = (monitor_value + _min_delta) < reference_value;

    Why:

    Since _min_delta is always positive, any monitor_value that is even slightly better than reference_value makes (monitor_value - _min_delta) < reference_value true, even when the improvement is smaller than _min_delta. As a result, _wait keeps being reset to 0 and early stopping never triggers (a numeric check is shown below).
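    To make the argument concrete, here is a small standalone check (the values are illustrative only, not from the library):

    // Numeric check of the argument above, with illustrative values.
    float reference = 0.5f;   // best loss seen so far
    float current = 0.499f;   // an improvement of only 0.001
    float minDelta = 0.01f;   // the required minimum improvement

    bool oldCheck = (current - minDelta) < reference; // 0.489 < 0.5 -> true: _wait is reset anyway
    bool newCheck = (current + minDelta) < reference; // 0.509 < 0.5 -> false: no reset, as intended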

The suggestion for early stopping using _baseline

  1. The fix for the suggestion is on line 88 of the file Tensorflow.Keras.Callbacks.EarlyStopping.cs:
    if (_baseline == 0f || _is_improvement(current, _baseline))
    ---> if (_baseline == 0f || _is_improvement(_baseline, current))

    Why:

    We should check whether the current (improved) loss has dropped below _baseline. While it has not, _wait keeps being reset to 0; once it has, _wait should no longer be reset, so training can stop after patience epochs (see the sketch below).
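    For clarity, a simplified sketch of the suggested control flow follows; the surrounding on_epoch_end logic is paraphrased here and is not the library's exact code:

    // Paraphrased sketch of the suggested behaviour, not the exact library code.
    // _is_improvement(a, b) is assumed to return true when a is better than b
    // (in "min" mode: roughly a + _min_delta < b).
    if (_is_improvement(current, _best))
    {
        _best = current;
        // Reset patience only while the loss has NOT yet dropped below _baseline.
        if (_baseline == 0f || _is_improvement(_baseline, current))
            _wait = 0;
        // Once current beats _baseline, _wait keeps counting, so training stops
        // after patience epochs, treating _baseline as a target loss.
    }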

Configuration and Other Information

No response

Wanglongzhi2001 commented 1 year ago

Hello, thank you very much for your suggestion. TensorFlow.NET uses a single bool expression to represent self.monitor_op rather than the np.greater and np.less used in TensorFlow. It seems that this bug is triggered because the port lacks the part corresponding to the following code in TensorFlow:

if self.monitor_op == np.greater:
    self.min_delta *= 1
else:
    self.min_delta *= -1

I will fix it. You can refer to the implementation in TensorFlow: https://github.com/tensorflow/tensorflow/blob/879a5e37eb350f556debec3cd5c9b6dcb8809318/tensorflow/python/keras/callbacks.py#L1702
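Translated to TensorFlow.NET's single-bool-expression style, the missing piece could look roughly like this (a sketch under assumed field names such as _mode and _min_delta, not the actual code of the fix):

// Sketch only: mirrors TensorFlow's sign flip of min_delta, with assumed field names.
// With a signed delta, one comparison per mode reproduces
// monitor_op(current - min_delta, reference) from the Python implementation.
float delta = _mode == "max" ? _min_delta : -_min_delta;
bool improved = _mode == "max"
    ? (current - delta) > reference_value   // "max": current must exceed reference by _min_delta
    : (current - delta) < reference_value;  // "min": current must undercut reference by _min_delta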

Wanglongzhi2001 commented 1 year ago

Hello, I have committed a PR (#1180) to solve this problem. If there are still any problems, please tell me. ^_^

Wanglongzhi2001 commented 1 year ago

Can this issue be closed?

GaijinOtohp commented 1 year ago

Yes. The bug is fixed. It can be closed.