Hi, I'm running training both with and without the pretrained (VCTK) checkpoint as the initial state. In both cases, however, the pitch of the converted output is affected by the input pitch: for example, in female-to-male conversion the output pitch is higher than expected, somewhere between the source and target speakers' ranges. This does not happen with the pretrained model itself. Would you mind sharing any details about how the pretrained model was trained that may be missing from the paper or this repository? Did you run into this in your own experiments? Thanks in advance.