Great work! I have a question about the padding here: https://github.com/bshall/hubert/blob/main/hubert/model.py#L81

After padding, the output has the same number of frames as a spec computed with the same sample rate and hop size as HuBERT. The fairseq HuBERT output, however, is one frame shorter than the soft-VC HuBERT output.

For example, with a 16 kHz sample rate and a hop size of 320, the lengths along the temporal dimension are:
spec: 250
soft_hubert: 250
fairseq_hubert: 249

When using fairseq_hubert, I usually cut the tail of the spec to align it with the HuBERT features. With soft_hubert, the padding seems to make that cut unnecessary.

I don't know which approach is better for alignment: padding the input wav of HuBERT, or cutting the spec. Could you give us some suggestions?
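
For reference, here is a rough sketch of how I arrive at those frame counts. It assumes the standard HuBERT base feature extractor (seven 1-D convolutions with kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, with no internal padding), a 5-second clip, and that soft_hubert pads (400 - 320) // 2 = 40 samples on each side of the wav. These are my assumptions, not something I have verified line by line against either codebase. The helper names are made up for illustration.

```python
# Assumed (kernel, stride) pairs of the HuBERT base feature extractor.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def conv_output_length(length, kernel, stride):
    """Output length of an unpadded 1-D convolution."""
    return (length - kernel) // stride + 1


def hubert_frames(num_samples, pad_each_side=0):
    """Number of frames produced by the conv stack for a wav of num_samples."""
    length = num_samples + 2 * pad_each_side
    for kernel, stride in CONV_LAYERS:
        length = conv_output_length(length, kernel, stride)
    return length


sample_rate = 16000
hop_size = 320
num_samples = 5 * sample_rate  # hypothetical 5-second clip

# Assumes the spec yields one frame per hop, as in the numbers I reported.
spec_frames = num_samples // hop_size
fairseq_frames = hubert_frames(num_samples)                 # no padding
soft_frames = hubert_frames(num_samples, pad_each_side=40)  # assumed padding

print(spec_frames, soft_frames, fairseq_frames)
```

If these assumptions hold, the one-frame gap comes from the conv stack needing 400 samples to produce its first frame while the hop is only 320, and the 80 padded samples make up the difference.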