If you mean the acceptable pitch range for training the model, you can try lowering --f0_var_min and raising --f0_max.
If you mean synthesizing an utterance with a higher pitch range, you can try manipulating the predicted pitch directly during inference (say, scaling it by 1.2). How to do this is not obvious because we are working with frequency bins, but we can convert the bins to scalar values, modify them, and map the result back to bins.
To manipulate the predicted f0 variance, you need to first recover the scalar value f0 with something like:
Then scale it, or apply whatever other manipulation of f0 you want, with something like:
Finally map it back to frequency bins:
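The three steps above can be sketched end to end as follows. This is only a minimal illustration, not the repository's code: it assumes the bins are uniformly spaced in Hz between a minimum and a maximum f0, and the names F0_MIN, F0_MAX, N_BINS, bins_to_f0, and f0_to_bins are hypothetical. If the model quantizes f0 differently (for example on a log scale), the bin edges would need to be replaced accordingly.

```python
import numpy as np

# Hypothetical settings; substitute the values the model was trained with.
F0_MIN, F0_MAX, N_BINS = 80.0, 400.0, 32

# Assumption: bins are uniform in Hz, so there are N_BINS + 1 edges.
edges = np.linspace(F0_MIN, F0_MAX, N_BINS + 1)
centers = 0.5 * (edges[:-1] + edges[1:])


def bins_to_f0(bin_ids):
    """Recover a scalar f0 (Hz) per frame as the center of its bin."""
    return centers[np.asarray(bin_ids)]


def f0_to_bins(f0):
    """Map scalar f0 values (Hz) back to bin indices in [0, N_BINS - 1]."""
    f0 = np.clip(f0, F0_MIN, F0_MAX)
    # Only the interior edges are needed: values below the first interior
    # edge fall in bin 0, values above the last one fall in bin N_BINS - 1.
    return np.digitize(f0, edges[1:-1])


predicted_bins = np.array([3, 10, 17, 25])  # stand-in for the model output
f0 = bins_to_f0(predicted_bins)             # step 1: bins -> scalar f0
scaled_f0 = f0 * 1.2                        # step 2: manipulate f0
scaled_bins = f0_to_bins(scaled_f0)         # step 3: scalar f0 -> bins
```

Values pushed above F0_MAX by the scaling are clipped into the last bin, so very large scale factors will flatten the top of the contour.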
The rest is the same (feeding it to the synthesizer). I haven't tested this, but hopefully it will work.