SciSharp / TensorFlow.NET

.NET Standard bindings for Google's TensorFlow for developing, training and deploying Machine Learning models in C# and F#.
https://scisharp.github.io/tensorflow-net-docs
Apache License 2.0

Contribution Port of tensorflow.contrib.layers.variance_scaling_initializer #511

Closed tcwicks closed 3 years ago

tcwicks commented 4 years ago

I would like to contribute the following. However: 1) I'm new to GitHub, 2) I have no idea where it would go in SciSharp TensorFlow, and 3) I'm not part of the project, so I have no access anyway.

This is a port of tensorflow.contrib.layers.variance_scaling_initializer, kept as close as possible to the original in terms of style, along with convenience methods for the He and Xavier variants.

Python URL: https://github.com/agrawalnishant/tensorflow/blob/da0a62b8c3d9e3357d41b5354acad3b5b25f7f95/tensorflow/contrib/layers/python/layers/initializers.py

Reason / Motivation: SciSharp TensorFlow does have an implementation of GlorotUniform; however, the variance scaling initializer has other permutations as well. Depending on the use case, these can have a significant impact on training time.

```csharp

public static class VarianceScalingInitializer
{
    public enum E_VarianceInitMode
    {
        FAN_IN = 0,
        FAN_AVG = 1,
        FAN_OUT = 2,
    }
    public enum E_GlorotVariant
    {
        TruncatedNormal = 0,
        Xavier = 1,
        HE = 2,
    }
    public static IInitializer GlorotVariant(int[] _Shape, E_GlorotVariant Variant = E_GlorotVariant.TruncatedNormal, TF_DataType _DType = TF_DataType.TF_FLOAT)
    {
        //return tf.truncated_normal_initializer(dtype: TF_DataType.TF_FLOAT);
        switch (Variant)
        {
            case E_GlorotVariant.TruncatedNormal:
                return tf.truncated_normal_initializer(dtype: _DType);
            case E_GlorotVariant.Xavier:
                return XavierInitializer(_Shape, dtype: _DType);
            case E_GlorotVariant.HE:
                return HEInitializer(_Shape, dtype: _DType);
            default:
                return XavierInitializer(_Shape, dtype: _DType);
        }
    }
    /// <summary>
    /// <para>Xavier variant.</para>
    /// <para>This function implements the weight initialization from:</para>
    /// <para>Xavier Glorot and Yoshua Bengio (2010):</para>
    /// <para>[Understanding the difficulty of training deep feedforward neural
    /// networks. International conference on artificial intelligence and statistics.]</para>
    /// <para>See: http://www.jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf </para>
    /// </summary>
    /// <remarks>A normal distribution usually works better than a uniform one, so the default was changed to false.
    /// Note that this port only implements the truncated-normal branch, so the uniform flag is currently unused.</remarks>
    /// <param name="_Shape">Shape of the weight tensor to initialize.</param>
    /// <param name="uniform">Whether to use uniform or normally distributed random initialization.</param>
    /// <param name="seed">Random seed.</param>
    /// <param name="dtype">TensorFlow data type. Only floating point types are supported.</param>
    /// <returns>An initializer for a weight matrix.</returns>
    public static IInitializer XavierInitializer(int[] _Shape, bool uniform = false, int? seed = null, TF_DataType dtype = TF_DataType.TF_FLOAT)
    {
        return VarianceScalingInitializer(_Shape, _Factor: 1.0f, _Mode: E_VarianceInitMode.FAN_AVG, _Seed: seed, _DType: dtype);
    }

    /// <summary>
    /// <para>He variant.</para>
    /// <para>This function implements the weight initialization from:</para>
    /// <para>Kaiming He et al. (2015):</para>
    /// <para>[Delving Deep into Rectifiers: Surpassing Human-Level Performance on
    /// ImageNet Classification.]</para>
    /// <para>See: https://arxiv.org/abs/1502.01852 </para>
    /// </summary>
    /// <remarks>A normal distribution usually works better than a uniform one, so the default was changed to false.
    /// Note that this port only implements the truncated-normal branch, so the uniform flag is currently unused.</remarks>
    /// <param name="_Shape">Shape of the weight tensor to initialize.</param>
    /// <param name="uniform">Whether to use uniform or normally distributed random initialization.</param>
    /// <param name="seed">Random seed.</param>
    /// <param name="dtype">TensorFlow data type. Only floating point types are supported.</param>
    /// <returns>An initializer for a weight matrix.</returns>
    public static IInitializer HEInitializer(int[] _Shape, bool uniform = false, int? seed = null, TF_DataType dtype = TF_DataType.TF_FLOAT)
    {
        // He / MSRA initialization: factor 2.0 over fan-in
        // (see the "Delving Deep into Rectifiers" reference above).
        return VarianceScalingInitializer(_Shape, _Factor: 2.0f, _Mode: E_VarianceInitMode.FAN_IN, _Seed: seed, _DType: dtype);
    }

    /// <summary>
    /// <para>Returns an initializer that generates tensors without scaling variance.</para>
    /// <para>When initializing a deep network, it is in principle advantageous to keep
    /// the scale of the input variance constant, so it does not explode or diminish
    /// by reaching the final layer. This initializer uses the following formula:</para>
    /// 
    /// <para>* To get [Delving Deep into Rectifiers](http://arxiv.org/pdf/1502.01852v1.pdf) (also known as the "MSRA initialization"):</para>
    /// <para>Use (default): factor=2.0, mode=FAN_IN, uniform=false</para>
    /// <para>* To get [Convolutional Architecture for Fast Feature Embedding](http://arxiv.org/abs/1408.5093):</para>
    /// <para>Use: factor=1.0, mode=FAN_IN, uniform=true</para>
    /// <para>* To get [Understanding the difficulty of training deep feedforward neural networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf):</para>
    /// <para>Use: factor=1.0, mode=FAN_AVG, uniform=true</para>
    /// <para>* To get 'xavier_initializer' use either:</para>
    /// <para>factor=1.0, mode=FAN_AVG, uniform=true, or</para>
    /// <para>factor=1.0, mode=FAN_AVG, uniform=false</para>
    /// </summary>
    /// <param name="_Factor">A multiplicative factor.</param>
    /// <param name="_Mode">
    /// if mode == <see cref="E_VarianceInitMode.FAN_IN"/>
    ///     n = fan_in
    /// else if mode == <see cref="E_VarianceInitMode.FAN_OUT"/>
    ///     n = fan_out
    /// else if mode == <see cref="E_VarianceInitMode.FAN_AVG"/>
    ///     n = (fan_in + fan_out)/2.0
    /// </param>
    /// <param name="_Shape">Shape of the tensor to initialize; used to derive fan_in and fan_out.</param>
    /// <param name="_Name">Optional name for the initializer.</param>
    /// <remarks>
    /// This port implements only the truncated-normal branch of the Python original:
    ///     truncated_normal(shape, 0.0, stddev = sqrt(1.3 * factor / n)).
    /// The uniform branch (limit = sqrt(3.0 * factor / n); random_uniform(shape, -limit, limit))
    /// is not ported.
    /// </remarks>
    /// <param name="_Seed">Random Seed</param>
    /// <param name="_DType">Tensorflow data type. Only floating point types are supported</param>
    /// <returns>An initializer that generates tensors with unit variance.</returns>
    public static IInitializer VarianceScalingInitializer(int[] _Shape, float _Factor = 2.0f, E_VarianceInitMode _Mode = E_VarianceInitMode.FAN_IN,
                             int? _Seed = null, TF_DataType _DType = TF_DataType.TF_FLOAT, string _Name = null)
    {
        if (!_DType.is_floating()) { throw new InvalidOperationException(@"Cannot create variance scaling initializer for non-floating point type."); }

        float fan_in;
        float fan_out;
        float n;
        float limit;

        // ConcatIfNotNullOrEmptyElseNull is a string helper extension from the
        // contributor's codebase (not included in this snippet).
        _Name = _Name.ConcatIfNotNullOrEmptyElseNull(@"_VarianceInit");

        if (_Shape != null)
        {
            int ShapeSize;
            ShapeSize = _Shape.Length;
            if (_Shape.Length > 1)
            {
                fan_in = _Shape[ShapeSize - 2];
            }
            else
            {
                fan_in = _Shape[ShapeSize - 1];
            }
            fan_out = _Shape[ShapeSize - 1];
            ShapeSize -= 2;
            // Multiply in the remaining leading dimensions (e.g. the spatial
            // dimensions of a convolution kernel), as in the Python original.
            for (int I = 0; I < ShapeSize; I++)
            {
                fan_in *= ((float)_Shape[I]);
                fan_out *= ((float)_Shape[I]);
            }
        }
        else
        {
            fan_in = 1.0f;
            fan_out = 1.0f;
        }
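        // Example (illustrative values): for a conv kernel of shape [3, 3, 64, 128],
        // fan_in = 3 * 3 * 64 = 576 and fan_out = 3 * 3 * 128 = 1152.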
        switch (_Mode)
        {
            case E_VarianceInitMode.FAN_IN:
                n = fan_in;
                break;
            case E_VarianceInitMode.FAN_OUT:
                n = fan_out;
                break;
            case E_VarianceInitMode.FAN_AVG:
                n = (fan_in + fan_out) / 2.0f;
                break;
            default:
                n = fan_in;
                break;
        }
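        // The 1.3 factor compensates for the variance lost by truncating the normal
        // distribution at two standard deviations; it mirrors
        // trunc_stddev = math.sqrt(1.3 * factor / n) in the Python original.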
        float trunc_stddev = (float)Math.Sqrt(1.3 * _Factor / n);
        return tf.truncated_normal_initializer(0.0f, trunc_stddev, seed: _Seed, dtype: _DType);
    }

}

```
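
For context, here is a minimal sketch of how the convenience methods above might be called (usings omitted, as in the snippet above; how the returned IInitializer is consumed, e.g. when creating layer weights, depends on the TensorFlow.NET version). The shape values are illustrative.

```csharp
// Minimal usage sketch for the proposed class.
int[] shape = { 784, 256 };

// He variant: factor 2.0 over fan-in, a common default for ReLU layers.
IInitializer heInit = VarianceScalingInitializer.HEInitializer(shape);

// Xavier variant: factor 1.0 with fan-average mode, common for tanh/sigmoid layers.
IInitializer xavierInit = VarianceScalingInitializer.XavierInitializer(shape);

// Or dispatch through the variant enum:
IInitializer init = VarianceScalingInitializer.GlorotVariant(
    shape, VarianceScalingInitializer.E_GlorotVariant.HE);
```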

Oceania2018 commented 4 years ago

Is that what you want? [image]

Can you contact me on Gitter if you don't know how to participate in this project?

tcwicks commented 4 years ago

@Oceania2018 I've seen GlorotUniform, however there are a few differences. The variance scaling initializer has both a uniform and a non-uniform permutation, and each of these has a Xavier and a He permutation. In Math.Sqrt(1.3 * _Factor / n), Xavier uses a factor of 1 and He uses a factor of 2.

Also, in many models the non-uniform Glorot initializer performs better than GlorotUniform.

This person has done a decent job of explaining it much better than I can.

https://adventuresinmachinelearning.com/weight-initialization-tutorial-tensorflow/
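
To make the difference concrete (illustrative numbers, not from the thread), the stddev formula in the port above gives, for a layer with n = 512:

```csharp
// Illustrative arithmetic only, using the formula from the port above:
// stddev = sqrt(1.3 * factor / n)
double n = 512;
double xavierStd = Math.Sqrt(1.3 * 1.0 / n);  // ~0.0504
double heStd     = Math.Sqrt(1.3 * 2.0 / n);  // ~0.0713
```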

Oceania2018 commented 3 years ago

Can you try the latest version?