Confusion about pads and alignment

Great work! I have a question about the padding at https://github.com/bshall/hubert/blob/main/hubert/model.py#L81

After this padding, the output has the same number of frames as a spectrogram (spec) computed with the same sample rate and hop size as HuBERT. The output of HuBERT in fairseq, however, is 1 frame shorter than that of the soft-VC HuBERT.

For example, with a 16 kHz sample rate and a hop size of 320, the lengths along the time dimension are spec: 250, soft_hubert: 250, fairseq_hubert: 249. When using fairseq_hubert, I usually cut the tail of the spec to align it with the HuBERT output. Because of the padding, it seems we don't need to cut the tail of the spec when using soft_hubert. I don't know which approach is better for alignment: padding the input wav of HuBERT, or cutting the spec. Could you give some suggestions?
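
To make the frame counts above concrete, here is a minimal sketch of the arithmetic. It rests on several assumptions that are not stated in this issue: the convolutional feature encoder uses kernel sizes (10, 3, 3, 3, 3, 2, 2) and strides (5, 2, 2, 2, 2, 2, 2) with no internal padding, the padding at the linked line adds (400 - 320) // 2 = 40 samples on each side, and the input is 250 * 320 = 80000 samples long. The helper name `encoder_frames` is hypothetical.

```python
# Assumed kernel sizes and strides of the HuBERT convolutional feature
# encoder (total stride 320, receptive field 400 samples).
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)


def encoder_frames(num_samples, pad=0):
    """Number of output frames for a waveform of `num_samples` samples.

    `pad` is the number of zeros added on EACH side of the waveform before
    the encoder (hypothetical parameter; 40 is assumed for soft_hubert).
    """
    n = num_samples + 2 * pad
    for kernel, stride in zip(KERNELS, STRIDES):
        # Output length of an unpadded 1-D convolution.
        n = (n - kernel) // stride + 1
    return n


num_samples = 250 * 320  # assumed input length: 80000 samples (5 s at 16 kHz)

print(num_samples // 320)                   # spec frames, assuming frames = samples // hop
print(encoder_frames(num_samples))          # no padding, as assumed for fairseq_hubert
print(encoder_frames(num_samples, pad=40))  # 40 samples per side, as assumed for soft_hubert
```

Under these assumptions the three calls give 250, 249 and 250, which matches the numbers reported above: without padding the last partial window is dropped, and the 80 extra samples are just enough to produce one more frame.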
