Closed: suwhoanlim closed this issue 3 years ago.
Hello, @suwhoanlim.
I think the reason is to provide accurate validation results for researchers. If the model's input differs between the training and validation steps, we cannot know the exact validation result. Also, because FastSpeech2 is a multi-task learner, the model produces poor-quality mel-spectrograms if we use the predicted prosodies (pitch/energy), and then we cannot tell which module is responsible for the poor output. For example, suppose the pitch/energy predictor produces poor-quality estimated prosodies and we feed them to the decoder; we then cannot tell whether the pitch predictor produced poor values or the decoder produced wrong results.
But, from a different point of view, you can modify this repository on your own for different goals. This may be arguable and an open question, so I am only posting my opinion.
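For illustration, here is a minimal sketch of the teacher-forcing pattern described above: a variance-adaptor-style module that uses ground-truth prosodies when targets are provided (train/validation) and falls back to its own predictions at inference. This is not the actual code from this repository; all class and argument names are assumed.

```python
# Minimal sketch (names assumed, not this repository's code): teacher-forcing
# ground-truth pitch/energy so decoder quality can be judged independently of
# the predictors.
import torch
import torch.nn as nn

class VarianceAdaptorSketch(nn.Module):
    def __init__(self, hidden_dim=256, n_bins=256):
        super().__init__()
        self.pitch_predictor = nn.Linear(hidden_dim, 1)   # placeholder predictor
        self.energy_predictor = nn.Linear(hidden_dim, 1)  # placeholder predictor
        self.pitch_embedding = nn.Embedding(n_bins, hidden_dim)
        self.energy_embedding = nn.Embedding(n_bins, hidden_dim)
        self.register_buffer("bins", torch.linspace(-4.0, 4.0, n_bins - 1))

    def forward(self, x, pitch_target=None, energy_target=None):
        # x: (batch, time, hidden_dim) encoder output
        pitch_pred = self.pitch_predictor(x).squeeze(-1)
        energy_pred = self.energy_predictor(x).squeeze(-1)

        # Teacher forcing: use the ground-truth prosodies when they are given,
        # otherwise use the predictor outputs (inference).
        pitch = pitch_target if pitch_target is not None else pitch_pred
        energy = energy_target if energy_target is not None else energy_pred

        x = x + self.pitch_embedding(torch.bucketize(pitch, self.bins))
        x = x + self.energy_embedding(torch.bucketize(energy, self.bins))
        return x, pitch_pred, energy_pred
```

With this structure, the validation loss measures the decoder under the same conditions as training, while the predictors are still supervised through pitch_pred/energy_pred losses.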
Closing this issue due to inactivity.
Hello, @Jackson-Kang,
While I was reviewing the code, I realized that in evaluation.py, when the model is evaluated and the loss is calculated, it uses the target data, not the model's predictions.
The pieces of code that I found suspicious are as follows:
Lines 72-79 in evaluation.py
Lines 38-49 in modules.py
Because the pitch and energy targets are passed as arguments to model() in evaluation.py, those targets are the values used in modules.py, whereas I believe it should use the predicted values.
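To make the two call patterns concrete, here is a hypothetical sketch; the argument and variable names are assumed for illustration, not copied from evaluation.py.

```python
# Validation as currently implemented: ground-truth prosodies are passed in,
# so the downstream modules use the targets (teacher forcing).
mel_pred, pitch_pred, energy_pred = model(
    text, pitch_target=pitch, energy_target=energy
)

# Validation with predicted prosodies instead (what this question suggests):
mel_pred, pitch_pred, energy_pred = model(
    text, pitch_target=None, energy_target=None
)
```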
Is there any reason why we are using the targets for validation? Or perhaps there is something I missed?
Any comments would be appreciated. Thanks!