LeelaChessZero / lczero-training

For code etc. relating to the network training process.

VarianceScaling Initializer Is Unseeded #218

Open cn4750 opened 1 year ago

cn4750 commented 1 year ago

@Tilps ran into this issue after upgrading TensorFlow:

/usr/local/lib/python3.8/dist-packages/keras/initializers/initializers.py:120: UserWarning: The initializer VarianceScaling is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.

From Tilps:

I think it's a bug in our code - I think you are supposed to construct a separate instance for every use case, so that 'call' is only invoked at most once.

I assume the above warning references this code; the reuse pattern Keras is complaining about is sketched below.

@masterkni6, can you look at this when you get a chance?
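For context, a minimal sketch of the pattern the warning describes (illustrative only, not code from this repo; whether the warning fires depends on the TF/Keras version):

import tensorflow as tf

# One unseeded instance reused twice: recent Keras versions emit the
# "unseeded and being called multiple times" UserWarning on the second call,
# and both calls return identical values.
init = tf.keras.initializers.VarianceScaling(
    scale=1.0, mode='fan_avg', distribution='truncated_normal')
w1 = init((4, 4))
w2 = init((4, 4))  # warns: same unseeded instance called again

# A fresh instance per use avoids the reuse path entirely.
w3 = tf.keras.initializers.VarianceScaling(
    scale=1.0, mode='fan_avg', distribution='truncated_normal')((4, 4))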

Tilps commented 1 year ago

The fix would be to change 'initializer' to be a function which constructs a VarianceScaling with the appropriate parameters, so each FFN gets its own instance rather than sharing the same one.
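A rough sketch of that approach (hypothetical helper name, not the repo's actual code): replace the shared instance with a small factory so every call site builds its own initializer.

import tensorflow as tf

def make_xavier_norm(beta):
    # Hypothetical factory: each layer that needs the DeepNorm-scaled
    # initializer calls this, so no single VarianceScaling instance is
    # ever invoked more than once.
    return tf.keras.initializers.VarianceScaling(
        scale=beta, mode='fan_avg', distribution='truncated_normal')

# e.g. inside encoder_layer (sketch):
#   initializer = make_xavier_norm(beta)       # for the attention block
#   ffn_initializer = make_xavier_norm(beta)   # a separate one for the FFN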

masterkni6 commented 1 year ago

Creating two instances of the initializer does not get rid of the warning. It seems a seed is required to get rid of it, even if the seed is hardcoded to 0.
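For reference, the seeded variant that silences the check (a sketch using the same parameters as the repo's xavier_norm, with beta as defined in encoder_layer):

# Any explicit seed, even 0, bypasses the reuse warning because the check
# only fires when seed is None. Note this also makes the draws deterministic.
xavier_norm = tf.keras.initializers.VarianceScaling(
    scale=beta, mode='fan_avg', distribution='truncated_normal', seed=0)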

Tilps commented 1 year ago

hmm - maybe there is more than one broken place?

Tilps commented 1 year ago

@@ -1347,6 +1357,7 @@ class TFProcess:
     def encoder_layer(self, inputs, emb_size: int, d_model: int,
                       num_heads: int, dff: int, name: str):
         initializer = None
+        initializer2 = None
         if self.encoder_layers > 0:
             # DeepNorm
             alpha = tf.cast(tf.math.pow(2. * self.encoder_layers, 0.25),
@@ -1355,10 +1366,14 @@ class TFProcess:
                            self.model_dtype)
             xavier_norm = tf.keras.initializers.VarianceScaling(
                 scale=beta, mode='fan_avg', distribution='truncated_normal')
+            xavier_norm2 = tf.keras.initializers.VarianceScaling(
+                scale=beta, mode='fan_avg', distribution='truncated_normal')
             initializer = xavier_norm
+            initializer2 = xavier_norm2
         else:
             alpha = 1
             initializer = "glorot_normal"
+            initializer2 = "glorot_normal"
         # multihead attention
         attn_output, attn_wts = self.mha(inputs,
                                          emb_size,
@@ -1377,7 +1392,7 @@ class TFProcess:
         ffn_output = self.ffn(out1,
                               emb_size,
                               dff,
-                              initializer,
+                              initializer2,
                               name=name + "/ffn")
         ffn_output = tf.keras.layers.Dropout(self.dropout_rate,
                                              name=name +

I'm giving this a test now...

masterkni6 commented 1 year ago

def _warn_reuse(self):
    if getattr(self, "_used", False):
        if getattr(self, "seed", None) is None:
            warnings.warn(
                f"The initializer {self.__class__.__name__} is unseeded "
                "and being called multiple times, which will return "
                "identical values each time (even if the initializer is "
                "unseeded). Please update your code to provide a seed to "
                "the initializer, or avoid using the same initalizer "
                "instance more than once."
            )
    else:
        self._used = True

Seems the source code is only checking whether seed is None.

Tilps commented 1 year ago

yes - because they assume if you are setting the seed you know what you are doing...

Tilps commented 1 year ago

I also see no improvement with the fix...

Tilps commented 1 year ago

oh, ffn creates 2 Dense layers... so we actually need 4 initializers...
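To illustrate the point (a sketch, not the repo's actual ffn implementation): since the FFN builds two Dense layers, each Dense needs its own freshly constructed initializer, in addition to the separate instances used elsewhere in the encoder layer. Here make_initializer is assumed to be a zero-argument factory, e.g. lambda: make_xavier_norm(beta) from the sketch above.

import tensorflow as tf

def ffn_sketch(x, emb_size, dff, make_initializer, name):
    # Hypothetical FFN helper: make_initializer is called once per Dense,
    # so no VarianceScaling instance is shared between layers.
    dense1 = tf.keras.layers.Dense(dff,
                                   kernel_initializer=make_initializer(),
                                   activation='relu',  # placeholder activation
                                   name=name + '/dense1')
    dense2 = tf.keras.layers.Dense(emb_size,
                                   kernel_initializer=make_initializer(),
                                   name=name + '/dense2')
    return dense2(dense1(x))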