SornrasakC closed this issue 2 years ago
I think the problem is the text representation of your symbols. You should get good alignment within 10k steps; if you don't, something is wrong.
Yes, warmstarting helps.
I don't know what's happening, but yes, you can do inference with only 1 flow.
I guess... I will try changing my symbols to ASCII and will come back with an update soon.
It turns out 20 out of ~2k audio files were just pure noise; removing them solved the problem.
@Bahm9919 Sorry for the @ and for commenting on a closed issue. I saw that you were recently able to do inference; would you kindly share how you did it?

Especially:

- Torch version
- Any changes in inference.py
- Which WaveGlow weights
- Does your submodule have this commit?

Submodule path 'tacotron2': checked out '6f435f7f29c3e1553cf2dd7ca2daf56903b20c39'

Or anything else that might matter? Very much appreciated.
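The submodule commit can be checked with standard git commands. A quick sketch (demonstrated in a scratch repo; inside the flowtron checkout you would run the same `rev-parse` with `-C tacotron2`):

```shell
# Demonstration in a scratch repo; in a real flowtron checkout the
# submodule working tree lives at ./tacotron2.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"

# Print the commit a working tree is checked out at. In flowtron, use
# `git -C tacotron2 rev-parse HEAD` and compare it against
# 6f435f7f29c3e1553cf2dd7ca2daf56903b20c39.
git -C "$repo" rev-parse HEAD

# In a repo with submodules, this lists each one with its commit:
# git submodule status
```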
I see your issue, will answer there.
Hi, I have been trying to train this model with a Thai dataset (1 speaker, ~5 hours).
After ~80k steps (batch size = 1, ~31 epochs), the attention weights turn out like this:
Is it normal to see partial flat lines like this? All the issues I looked through only show an entirely flat line or a straight diagonal... Or am I being too impatient? It's only 80k steps after all.
Here's some additional info (is this even correct?):
The above result comes from me warm starting the model from `flowtron_ljs.pt` with the `flow=1` config file (`speaker_embedding.weight` ignored).

Things I have done

- Ignored `embedding.weight`, since the shapes differ during warmstart.

Additional Questions

1. Training goes: first with `flow=1` until attention is aligned; second, the same but with `flow=2`; and then third, turning `attn_prior` off so the model attends to the speaker. What's the sign to look for during the third step? How do I know whether the model has attended?
2. `ctc_loss` starts at 10k iters; do I need to change this? Does starting it earlier or later affect anything?
3. Does warm starting from `flowtron_ljs` really help with learning a different language? I'm wondering which parts it helps with; the decoder?

Thank you for reading; I would really appreciate any answers or suggestions.
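Skipping shape-mismatched tensors such as `embedding.weight` during warmstart can be sketched as below. This is a hypothetical helper, not Flowtron's actual warmstart code; it simply keeps checkpoint tensors whose name and shape match the target model:

```python
import torch
import torch.nn as nn

def filter_warmstart_state(model: nn.Module, checkpoint_state: dict):
    """Return (kept, skipped): checkpoint tensors whose name and shape
    match the target model, and the names of those dropped (e.g. an
    embedding whose symbol table has a different size)."""
    model_state = model.state_dict()
    kept = {k: v for k, v in checkpoint_state.items()
            if k in model_state and v.shape == model_state[k].shape}
    skipped = sorted(k for k in checkpoint_state if k not in kept)
    return kept, skipped

# Example: the target model's embedding has a different vocabulary
# size than the checkpoint it is warm started from.
model = nn.Embedding(num_embeddings=100, embedding_dim=4)
checkpoint = {"weight": torch.zeros(148, 4)}  # mismatched symbol count
kept, skipped = filter_warmstart_state(model, checkpoint)
model.load_state_dict(kept, strict=False)  # load only matching tensors
```

With `strict=False`, any parameter missing from `kept` (here, the embedding) simply keeps its fresh initialization.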