anonymous-pits / pits

PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor
https://anonymous-pits.github.io/pits/
MIT License

Questions about "3.3. Implementation Details." #10

Closed isletennos closed 1 year ago

isletennos commented 1 year ago

I have two questions about "3.3. Implementation Details."

  1. Why did you change the octave range and the note range from those used in NANSY?

  2. The paper says: "We generate Yingram with 80 channels from notes ranging between -5 to 74, which corresponds to 30.8 to 508 Hz." What calculation gives the range of 30.8 to 508 Hz?

Thank you,

anonymous-pits commented 1 year ago

Hi @isletennos!

  1. We started from the identical Yingram setup from NANSY, but it produced some artifacts in our early experiments. After some experimentation, lowering the range stabilized it.
  2. In our setup, note 69 = 440 Hz, and ×2 in Hz = +24 notes. Thus note 74 = note 69 + 5 notes = 440 Hz × 2^(5/24) = 508.36 Hz. Note -5 was calculated in the same way.
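
The conversion in item 2 can be sketched in Python as below. This only illustrates the arithmetic stated in this reply; the function name `note_to_hz` and its parameter names are hypothetical and are not taken from the PITS code.

```python
def note_to_hz(note, ref_note=69, ref_hz=440.0, notes_per_octave=24):
    """Convert a note index to Hz, assuming note 69 = 440 Hz
    and that +24 notes doubles the frequency (as stated in this thread)."""
    return ref_hz * 2.0 ** ((note - ref_note) / notes_per_octave)


print(round(note_to_hz(69), 2))  # 440.0
print(round(note_to_hz(74), 2))  # 508.36
```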
isletennos commented 1 year ago

> We started from the identical Yingram setup from NANSY, but it produced some artifacts in our early experiments. After some experimentation, lowering the range stabilized it.

I see, I understand. Thank you.

> In our setup, note 69 = 440 Hz, and ×2 in Hz = +24 notes. Thus note 74 = note 69 + 5 notes = 440 Hz × 2^(5/24) = 508.36 Hz. Note -5 was calculated in the same way.

I see, so this is the value of the denominator of the c(m) function in NANSY. Then I have another question. In the c(m) function, sr is divided by this value. Since the value ranges over 30.8-508 Hz, dividing sr = 22050 Hz by it gives 43.37-715.90.

e.g. 22050 / 30.8 = 715.90...
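
The division can be reproduced with a few lines of Python. This is just a sketch of the arithmetic quoted in this thread; `SR` and `ratio` are illustrative names, and 508.36 Hz is the upper value given in the reply above.

```python
SR = 22050  # sampling rate in Hz, as stated in this thread


def ratio(freq_hz, sr=SR):
    """Return sr divided by a frequency in Hz."""
    return sr / freq_hz


for freq_hz in (30.8, 508.36):
    print(f"{SR} / {freq_hz} = {ratio(freq_hz):.2f}")
```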

I understand that c(m) is a function that calculates a MIDI value from a frequency. While 43 seems reasonable, I think 715 does not correspond to any MIDI value (see https://www.inspiredacoustics.com/en/MIDI_note_numbers_and_center_frequencies). Am I thinking about this the wrong way?

Thank you.

anonymous-pits commented 1 year ago

Sorry for the confusion. We use the term "note", but it differs from MIDI's note convention.

Both conventions map note 69 to 440 Hz, but in our setup an octave spans 24 notes, not 12 as in standard MIDI.
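
The difference between the two conventions can be illustrated with a small Python sketch. It assumes only what is stated here (note 69 = 440 Hz in both, 12 versus 24 notes per octave); the helper `note_to_hz` is a hypothetical name, not a function from the PITS repository.

```python
def note_to_hz(note, notes_per_octave, ref_note=69, ref_hz=440.0):
    """Convert a note index to Hz for a given number of notes per octave."""
    return ref_hz * 2.0 ** ((note - ref_note) / notes_per_octave)


# Compare standard MIDI (12 notes per octave) with the 24-note convention.
for note in (69, 81, 93):
    midi_hz = note_to_hz(note, notes_per_octave=12)
    pits_hz = note_to_hz(note, notes_per_octave=24)
    print(note, round(midi_hz, 2), round(pits_hz, 2))
# 69 440.0 440.0
# 81 880.0 622.25
# 93 1760.0 880.0
```

The two conventions agree only at note 69. Away from it, the same note index maps to different frequencies, so a note index in this setup should not be looked up in a MIDI frequency table.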

isletennos commented 1 year ago

I see, it is true that if you change the octave value, it is no longer a MIDI note. So what you are defining here is a PITS-arranged MIDI note. Thank you very much.