cn4750 opened 1 year ago
A fix would be to change `initializer` into a factory function that constructs a VarianceScaling with the appropriate parameters, so each FFN gets its own instance rather than sharing the same one.
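A minimal sketch of that factory pattern, using a hypothetical stand-in class instead of the real `tf.keras.initializers.VarianceScaling` so the instance-identity point is clear without importing TensorFlow:

```python
class VarianceScaling:
    """Hypothetical stand-in for tf.keras.initializers.VarianceScaling,
    used here only to illustrate instance identity."""
    def __init__(self, scale, mode, distribution, seed=None):
        self.scale = scale
        self.mode = mode
        self.distribution = distribution
        self.seed = seed

def make_initializer(beta, seed=None):
    # Factory: every call constructs a brand-new initializer instance,
    # so no two layers ever share one object.
    return VarianceScaling(scale=beta, mode='fan_avg',
                           distribution='truncated_normal', seed=seed)

a = make_initializer(0.5)
b = make_initializer(0.5)
assert a is not b  # distinct instances with the same configuration
```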
Creating two instances of the initializer does not get rid of the warning. It seems a seed is required to silence it, even if the seed is hardcoded to 0.
hmm - maybe there is more than one broken place?
```diff
@@ -1347,6 +1357,7 @@ class TFProcess:
     def encoder_layer(self, inputs, emb_size: int, d_model: int,
                       num_heads: int, dff: int, name: str):
         initializer = None
+        initializer2 = None
         if self.encoder_layers > 0:
             # DeepNorm
             alpha = tf.cast(tf.math.pow(2. * self.encoder_layers, 0.25),
@@ -1355,10 +1366,14 @@ class TFProcess:
                            self.model_dtype)
             xavier_norm = tf.keras.initializers.VarianceScaling(
                 scale=beta, mode='fan_avg', distribution='truncated_normal')
+            xavier_norm2 = tf.keras.initializers.VarianceScaling(
+                scale=beta, mode='fan_avg', distribution='truncated_normal')
             initializer = xavier_norm
+            initializer2 = xavier_norm2
         else:
             alpha = 1
             initializer = "glorot_normal"
+            initializer2 = "glorot_normal"
         # multihead attention
         attn_output, attn_wts = self.mha(inputs,
                                          emb_size,
@@ -1377,7 +1392,7 @@ class TFProcess:
         ffn_output = self.ffn(out1,
                               emb_size,
                               dff,
-                              initializer,
+                              initializer2,
                               name=name + "/ffn")
         ffn_output = tf.keras.layers.Dropout(self.dropout_rate,
                                              name=name +
```
I'm giving this a test now...
```python
def _warn_reuse(self):
    if getattr(self, "_used", False):
        if getattr(self, "seed", None) is None:
            warnings.warn(
                f"The initializer {self.__class__.__name__} is unseeded "
                "and being called multiple times, which will return "
                "identical values each time (even if the initializer is "
                "unseeded). Please update your code to provide a seed to "
                "the initializer, or avoid using the same initializer "
                "instance more than once."
            )
    else:
        self._used = True
```
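That check can be exercised with a minimal stand-in (plain Python, not the real Keras class) to confirm both observations from the thread: an unseeded instance warns on its second use, while any non-None seed, even 0, suppresses the warning.

```python
import warnings

class FakeInitializer:
    """Minimal stand-in mirroring the _warn_reuse logic above."""
    def __init__(self, seed=None):
        self.seed = seed
        self._used = False

    def __call__(self):
        if self._used:
            if self.seed is None:   # only the unseeded case warns
                warnings.warn("unseeded initializer reused")
        else:
            self._used = True

def warnings_from(init, calls=2):
    # Count warnings emitted over `calls` uses of one instance.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        for _ in range(calls):
            init()
    return len(caught)

print(warnings_from(FakeInitializer()))        # unseeded, reused -> 1 warning
print(warnings_from(FakeInitializer(seed=0)))  # seed=0 -> 0 warnings
```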
Seems the source code is only checking for `seed is None`.
yes - because they assume if you are setting the seed you know what you are doing...
I also see no improvement with the fix...
oh - ffn creates 2 Dense layers... so we actually need 4 initializer instances...
@Tilps ran into this issue after upgrading TensorFlow:
From Tilps:
I assume this is the code the above warning references.
@masterkni6, can you look at this when you get a chance?