anonymous-pits / pits

PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor
https://anonymous-pits.github.io/pits/
MIT License

Question about pitch encoder #28

Closed: ljh0412 closed this issue 7 months ago

ljh0412 commented 9 months ago

Thanks for the interesting paper and the nice repo.

I have a question about the pitch encoder. It takes the ying features, the spectrogram lengths, and the speaker embedding as inputs. This seems odd to me: the encoder builds a length-based mask with common.sequence_mask, so I think the lengths it receives should be the ying lengths.

Should that argument be replaced with the ying lengths? Please let me know so I can adjust my code.
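For context on why the lengths argument matters: a sequence_mask helper turns per-utterance lengths into a padding mask, marking frame t of utterance b as valid when t < lengths[b]. In VITS-style PyTorch code this is typically written with torch.arange and a broadcast comparison; the sketch below mirrors that logic in pure Python so it runs without torch. It is an illustration under that assumption, not the repo's actual helper, and the batch of lengths is hypothetical.

```python
from typing import List, Optional


def sequence_mask(lengths: List[int], max_length: Optional[int] = None) -> List[List[bool]]:
    """Return a [batch][max_length] mask that is True where frame index < length.

    Assumed to mirror a VITS-style sequence_mask; not copied from the PITS repo.
    """
    if max_length is None:
        max_length = max(lengths)
    return [[t < n for t in range(max_length)] for n in lengths]


# Hypothetical batch: two utterances with 3 and 5 valid frames.
ying_lengths = [3, 5]
for row in sequence_mask(ying_lengths):
    print([int(v) for v in row])
# [1, 1, 1, 0, 0]
# [1, 1, 1, 1, 1]
```

If the lengths passed in did not match the true number of ying frames, this mask would either hide valid frames or expose padding, which is the concern raised above.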

p0p4k commented 9 months ago

Spectrogram lengths and ying lengths are the same in this case, since both are computed with the same hop length and other framing parameters. Each spectrogram frame has a corresponding ying frame.
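The reply's argument can be checked with simple frame-count arithmetic: a sliding-window analysis over a signal of N samples, padded by p samples on each side, with window length w and hop h, yields 1 + (N + 2p - w) // h frames. If the spectrogram and the ying extractor share h, w, and p, their frame counts (and therefore their length-based masks) coincide. The settings below are hypothetical values chosen for illustration, not read from the PITS configs.

```python
def num_frames(num_samples: int, hop_length: int, win_length: int, pad: int) -> int:
    """Frames produced by sliding a win_length window with stride hop_length
    over num_samples samples padded by `pad` samples on each side."""
    padded = num_samples + 2 * pad
    return 1 + (padded - win_length) // hop_length


# Hypothetical framing settings (illustrative only).
SAMPLES = 22050  # 1 s of audio at 22.05 kHz
spec_len = num_frames(SAMPLES, hop_length=256, win_length=1024, pad=384)
ying_len = num_frames(SAMPLES, hop_length=256, win_length=1024, pad=384)
print(spec_len, ying_len, spec_len == ying_len)  # 86 86 True

# A different hop length breaks the one-to-one frame correspondence.
print(num_frames(SAMPLES, hop_length=512, win_length=1024, pad=384))  # 43
```

Note that the hop length alone is not sufficient: the window length and padding must also agree (the "etc." in the reply) for the two frame counts to match exactly.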

ljh0412 commented 7 months ago

Thank you for your reply.