noetits opened this issue 6 years ago
If you try, during synthesis, to save and show the attention computed with the model pretrained on LJ Speech, for example, it will look like this:
Why is it horizontal and not diagonal like during training? The synthesis works just fine, though...
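In case it helps to reproduce the plot: here is a minimal sketch of how one might dump the attention matrix to an image for inspection. This is my own illustration, not the repo's plotting code, and it assumes `alignment` is the attention `A` (shape `(T/r, N)`) fetched as a NumPy array at synthesis time:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

def save_alignment(alignment, path="alignment.png"):
    """Plot decoder frames (rows) against text positions (columns)."""
    fig, ax = plt.subplots()
    im = ax.imshow(alignment, aspect="auto", origin="lower", interpolation="none")
    ax.set_xlabel("Text position (N)")
    ax.set_ylabel("Mel frame (T/r)")
    fig.colorbar(im, ax=ax)
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```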
If I comment out, in "networks.py", in the function "Attention", the part corresponding to "monotonic attention", like this:
```python
A = tf.matmul(Q, K, transpose_b=True) * tf.rsqrt(tf.to_float(hp.d))
# if mononotic_attention:  # for inference
#     key_masks = tf.sequence_mask(prev_max_attentions, hp.max_N)
#     reverse_masks = tf.sequence_mask(hp.max_N - hp.attention_win_size - prev_max_attentions, hp.max_N)[:, ::-1]
#     masks = tf.logical_or(key_masks, reverse_masks)
#     masks = tf.tile(tf.expand_dims(masks, 1), [1, hp.max_T, 1])
#     paddings = tf.ones_like(A) * (-2 ** 32 + 1)  # (B, T/r, N)
#     A = tf.where(tf.equal(masks, False), A, paddings)
A = tf.nn.softmax(A)  # (B, T/r, N)
max_attentions = tf.argmax(A, -1)  # (B, T/r)
R = tf.matmul(A, V)
R = tf.concat((R, Q), -1)
```
The attention plot will then be of diagonal shape, and the synthesis is not too bad, but it has the problem mentioned in the paper: it may skip letters or pronounce parts of words several times.
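To make the behaviour of the commented-out block concrete, here is a small NumPy sketch (my own illustration, not the repo's code) of the windowed masking it applies at a single decoder step: only positions in a window of width `hp.attention_win_size` starting at the previous argmax are allowed into the softmax, which is what forces the alignment to advance monotonically at inference time. The window width of 3 below is only illustrative.

```python
import numpy as np

def windowed_softmax(scores, prev_max_attention, attention_win_size=3):
    """scores: raw attention scores (N,) for one decoder step."""
    positions = np.arange(scores.shape[0])
    # Allowed window: [prev_max_attention, prev_max_attention + attention_win_size)
    allowed = (positions >= prev_max_attention) & \
              (positions < prev_max_attention + attention_win_size)
    # Outside the window, push scores to a huge negative value,
    # like the `paddings` tensor in the masked-out block above.
    masked = np.where(allowed, scores, -2.0 ** 32 + 1)
    exp = np.exp(masked - masked.max())
    return exp / exp.sum()
```

With this windowing active, each step can only attend at or slightly ahead of the previous step's position, which is presumably why disabling it brings back the skipping/repeating behaviour described in the paper.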
@noetits Hi, how did you solve the problem? What version of Python and which GPU did you use? I have the same problem.