auspicious3000 / contentvec

speech self-supervised representations
MIT License
460 stars 36 forks

issue about forward_padding_mask #12

Closed zhenye234 closed 1 year ago

zhenye234 commented 1 year ago

Hello, thanks for your great work.

I am confused about the subtraction of 400 here: https://github.com/auspicious3000/contentvec/blob/d746688a32940f4bee410ed7c87ec9cf8ff04f74/contentvec/models/hubert/contentvec.py#L500 May I know why it is needed?

auspicious3000 commented 1 year ago

400 is the frame length, and 320 is the frame shift. This code is for batch generation only, which is rarely used.
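To illustrate the arithmetic behind that answer, here is a minimal sketch of how a frame length of 400 and a frame shift of 320 turn a sample count into a frame count, and how per-utterance frame counts could become a batch padding mask. This is not the code from `contentvec.py`: the helper names `num_frames` and `frame_padding_mask` are hypothetical, and the convention that `True` marks a padded frame is an assumption.

```python
from typing import List

FRAME_LENGTH = 400  # samples covered by one frame (from the reply above)
FRAME_SHIFT = 320   # samples between the starts of consecutive frames


def num_frames(num_samples: int,
               frame_length: int = FRAME_LENGTH,
               frame_shift: int = FRAME_SHIFT) -> int:
    """Count the full frames that fit into `num_samples` samples.

    The first frame needs `frame_length` samples, which is why that
    value is subtracted; each further frame needs `frame_shift` more.
    """
    if num_samples < frame_length:
        return 0
    return (num_samples - frame_length) // frame_shift + 1


def frame_padding_mask(sample_lengths: List[int]) -> List[List[bool]]:
    """Build a frame-level mask for a batch of unpadded sample lengths.

    Assumption: True marks a padded frame, False marks a real one.
    """
    frame_counts = [num_frames(n) for n in sample_lengths]
    max_frames = max(frame_counts, default=0)
    return [[t >= count for t in range(max_frames)] for count in frame_counts]


if __name__ == "__main__":
    print(num_frames(400))    # 1: exactly one full frame
    print(num_frames(720))    # 2: 400 + 320 samples
    print(num_frames(16000))  # 49
    mask = frame_padding_mask([16000, 720])
    print(sum(mask[1]))       # 47 padded frames for the shorter utterance
```

With a single utterance there is nothing to pad, so this calculation only matters when utterances of different lengths are batched together, which matches the remark that the code is for batch generation only.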

zhenye234 commented 1 year ago

Thank you for your reply.