SciSharp / TensorFlow.NET

.NET Standard bindings for Google's TensorFlow for developing, training and deploying Machine Learning models in C# and F#.
https://scisharp.github.io/tensorflow-net-docs
Apache License 2.0

Contribution Port of tensorflow.contrib.layers.variance_scaling_initializer #511

Closed tcwicks closed 3 years ago

tcwicks commented 4 years ago

I would like to contribute the following. However: 1) I'm new to GitHub, 2) I have no idea where it would go in SciSharp TensorFlow, and 3) I'm not part of the project, so I have no access anyway.

This is a port of tensorflow.contrib.layers.variance_scaling_initializer, kept as close as possible to the original in terms of style, along with convenience methods for the He and Xavier variants.

Python URL: https://github.com/agrawalnishant/tensorflow/blob/da0a62b8c3d9e3357d41b5354acad3b5b25f7f95/tensorflow/contrib/layers/python/layers/initializers.py

Reason / Motivation: SciSharp TensorFlow does have an implementation of GlorotUniform; however, the variance scaling initializer has other permutations as well. Depending on the use case, these can have a significant impact on training time.

```csharp

public static class VarianceScalingInitializer
{
    public enum E_VarianceInitMode
    {
        FAN_IN = 0,
        FAN_AVG = 1,
        FAN_OUT = 2,
    }
    public enum E_GlorotVariant
    {
        TruncatedNormal = 0,
        Xavier = 1,
        HE = 2,
    }
    public static IInitializer GlorotVariant(int[] _Shape, E_GlorotVariant Variant = E_GlorotVariant.TruncatedNormal, TF_DataType _DType = TF_DataType.TF_FLOAT)
    {
        //return tf.truncated_normal_initializer(dtype: TF_DataType.TF_FLOAT);
        switch (Variant)
        {
            case E_GlorotVariant.TruncatedNormal:
                return tf.truncated_normal_initializer(dtype: _DType);
            case E_GlorotVariant.Xavier:
                return XavierInitializer(_Shape, dtype: _DType);
            case E_GlorotVariant.HE:
                return HEInitializer(_Shape, dtype: _DType);
            default:
                return XavierInitializer(_Shape, dtype: _DType);
        }
    }
    /// <summary>
    /// <para>Xavier variant.</para>
    /// <para>This function implements the weight initialization from:</para>
    /// <para>Xavier Glorot and Yoshua Bengio (2010):</para>
    /// <para>[Understanding the difficulty of training deep feedforward neural
    /// networks. International conference on artificial intelligence and statistics.]</para>
    /// <para>See: http://www.jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf </para>
    /// </summary>
    /// <remarks>A normal distribution usually works better than a uniform one, so the default was changed to false.
    /// Note that this port only implements the truncated-normal branch, so the uniform flag is currently unused.</remarks>
    /// <param name="_Shape">Shape of the weight tensor to initialize.</param>
    /// <param name="uniform">Whether to use uniform or normally distributed random initialization.</param>
    /// <param name="seed">Random seed.</param>
    /// <param name="dtype">TensorFlow data type. Only floating point types are supported.</param>
    /// <returns>An initializer for a weight matrix.</returns>
    public static IInitializer XavierInitializer(int[] _Shape, bool uniform = false, int? seed = null, TF_DataType dtype = TF_DataType.TF_FLOAT)
    {
        return VarianceScalingInitializer(_Shape, _Factor: 1.0f, _Mode: E_VarianceInitMode.FAN_AVG, _Seed: seed, _DType: dtype);
    }

    /// <summary>
    /// <para>He variant.</para>
    /// <para>This function implements the weight initialization from:</para>
    /// <para>Kaiming He et al. (2015):</para>
    /// <para>[Delving Deep into Rectifiers: Surpassing Human-Level Performance on
    /// ImageNet Classification.]</para>
    /// <para>See: https://arxiv.org/abs/1502.01852 </para>
    /// </summary>
    /// <remarks>A normal distribution usually works better than a uniform one, so the default was changed to false.
    /// Note that this port only implements the truncated-normal branch, so the uniform flag is currently unused.</remarks>
    /// <param name="_Shape">Shape of the weight tensor to initialize.</param>
    /// <param name="uniform">Whether to use uniform or normally distributed random initialization.</param>
    /// <param name="seed">Random seed.</param>
    /// <param name="dtype">TensorFlow data type. Only floating point types are supported.</param>
    /// <returns>An initializer for a weight matrix.</returns>
    public static IInitializer HEInitializer(int[] _Shape, bool uniform = false, int? seed = null, TF_DataType dtype = TF_DataType.TF_FLOAT)
    {
        // He / MSRA initialization: factor 2.0 over fan-in
        // (see the "Delving Deep into Rectifiers" reference above).
        return VarianceScalingInitializer(_Shape, _Factor: 2.0f, _Mode: E_VarianceInitMode.FAN_IN, _Seed: seed, _DType: dtype);
    }

    /// <summary>
    /// <para>Returns an initializer that generates tensors without scaling variance.</para>
    /// <para>When initializing a deep network, it is in principle advantageous to keep
    /// the scale of the input variance constant, so it does not explode or diminish
    /// by reaching the final layer. This initializer uses the following formula:</para>
    /// 
    /// <para>* To get [Delving Deep into Rectifiers](http://arxiv.org/pdf/1502.01852v1.pdf) (also known as the "MSRA initialization"):</para>
    /// <para>Use (default): factor=2.0, mode=FAN_IN, uniform=false</para>
    /// <para>* To get [Convolutional Architecture for Fast Feature Embedding](http://arxiv.org/abs/1408.5093):</para>
    /// <para>Use: factor=1.0, mode=FAN_IN, uniform=true</para>
    /// <para>* To get [Understanding the difficulty of training deep feedforward neural networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf):</para>
    /// <para>Use: factor=1.0, mode=FAN_AVG, uniform=true</para>
    /// <para>* To get 'xavier_initializer' use either:</para>
    /// <para>factor=1.0, mode=FAN_AVG, uniform=true, or</para>
    /// <para>factor=1.0, mode=FAN_AVG, uniform=false</para>
    /// </summary>
    /// <param name="_Factor">A multiplicative factor.</param>
    /// <param name="_Mode">
    /// if mode == <see cref="E_VarianceInitMode.FAN_IN"/>
    ///     n = fan_in
    /// else if mode == <see cref="E_VarianceInitMode.FAN_OUT"/>
    ///     n = fan_out
    /// else if mode == <see cref="E_VarianceInitMode.FAN_AVG"/>
    ///     n = (fan_in + fan_out)/2.0
    /// </param>
    /// <param name="_Shape">Shape of the tensor to initialize; used to derive fan_in and fan_out.</param>
    /// <param name="_Name">Optional name for the initializer.</param>
    /// <remarks>
    /// This port implements only the truncated-normal branch of the Python original:
    ///     truncated_normal(shape, 0.0, stddev = sqrt(1.3 * factor / n)).
    /// The uniform branch (limit = sqrt(3.0 * factor / n); random_uniform(shape, -limit, limit))
    /// is not ported.
    /// </remarks>
    /// <param name="_Seed">Random Seed</param>
    /// <param name="_DType">Tensorflow data type. Only floating point types are supported</param>
    /// <returns>An initializer that generates tensors with unit variance.</returns>
    public static IInitializer VarianceScalingInitializer(int[] _Shape, float _Factor = 2.0f, E_VarianceInitMode _Mode = E_VarianceInitMode.FAN_IN,
                             int? _Seed = null, TF_DataType _DType = TF_DataType.TF_FLOAT, string _Name = null)
    {
        if (!_DType.is_floating()) { throw new InvalidOperationException(@"Cannot create variance scaling initializer for non-floating point type."); }

        float fan_in;
        float fan_out;
        float n;
        float limit;

        // ConcatIfNotNullOrEmptyElseNull is a string helper extension from the
        // contributor's codebase (not included in this snippet).
        _Name = _Name.ConcatIfNotNullOrEmptyElseNull(@"_VarianceInit");

        if (_Shape != null)
        {
            int ShapeSize;
            ShapeSize = _Shape.Length;
            if (_Shape.Length > 1)
            {
                fan_in = _Shape[ShapeSize - 2];
            }
            else
            {
                fan_in = _Shape[ShapeSize - 1];
            }
            fan_out = _Shape[ShapeSize - 1];
            ShapeSize -= 2;
            // Multiply in the remaining leading dimensions (e.g. the spatial
            // dimensions of a convolution kernel), as in the Python original.
            for (int I = 0; I < ShapeSize; I++)
            {
                fan_in *= ((float)_Shape[I]);
                fan_out *= ((float)_Shape[I]);
            }
        }
        else
        {
            fan_in = 1.0f;
            fan_out = 1.0f;
        }
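        // Example (illustrative values): for a conv kernel of shape [3, 3, 64, 128],
        // fan_in = 3 * 3 * 64 = 576 and fan_out = 3 * 3 * 128 = 1152.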
        switch (_Mode)
        {
            case E_VarianceInitMode.FAN_IN:
                n = fan_in;
                break;
            case E_VarianceInitMode.FAN_OUT:
                n = fan_out;
                break;
            case E_VarianceInitMode.FAN_AVG:
                n = (fan_in + fan_out) / 2.0f;
                break;
            default:
                n = fan_in;
                break;
        }
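        // The 1.3 factor compensates for the variance lost by truncating the normal
        // distribution at two standard deviations; it mirrors
        // trunc_stddev = math.sqrt(1.3 * factor / n) in the Python original.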
        float trunc_stddev = (float)Math.Sqrt(1.3 * _Factor / n);
        return tf.truncated_normal_initializer(0.0f, trunc_stddev, seed: _Seed, dtype: _DType);
    }

}

```
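
For context, here is a minimal sketch of how the convenience methods above might be called (usings omitted, as in the snippet above; how the returned IInitializer is consumed, e.g. when creating layer weights, depends on the TensorFlow.NET version). The shape values are illustrative.

```csharp
// Minimal usage sketch for the proposed class.
int[] shape = { 784, 256 };

// He variant: factor 2.0 over fan-in, a common default for ReLU layers.
IInitializer heInit = VarianceScalingInitializer.HEInitializer(shape);

// Xavier variant: factor 1.0 with fan-average mode, common for tanh/sigmoid layers.
IInitializer xavierInit = VarianceScalingInitializer.XavierInitializer(shape);

// Or dispatch through the variant enum:
IInitializer init = VarianceScalingInitializer.GlorotVariant(
    shape, VarianceScalingInitializer.E_GlorotVariant.HE);
```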

Oceania2018 commented 4 years ago

Is that what you want? [image]

Can you contact me on Gitter if you don't know how to participate in this project?

tcwicks commented 4 years ago

@Oceania2018 I've seen GlorotUniform, however there are a few differences. The variance scaling initializer has both a uniform and a non-uniform permutation, and each of these has a Xavier and a He permutation. In Math.Sqrt(1.3 * _Factor / n), Xavier uses a factor of 1 and He uses a factor of 2.

Also, in many models the non-uniform Glorot initializer performs better than GlorotUniform.

This person has done a decent job of explaining it much better than I can.

https://adventuresinmachinelearning.com/weight-initialization-tutorial-tensorflow/
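
To make the difference concrete (illustrative numbers, not from the thread), the stddev formula in the port above gives, for a layer with n = 512:

```csharp
// Illustrative arithmetic only, using the formula from the port above:
// stddev = sqrt(1.3 * factor / n)
double n = 512;
double xavierStd = Math.Sqrt(1.3 * 1.0 / n);  // ~0.0504
double heStd     = Math.Sqrt(1.3 * 2.0 / n);  // ~0.0713
```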

Oceania2018 commented 3 years ago

Can you try the latest version?