Why does pad_or_trim use 1000 rather than 3000 in transcribe_audio?

YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Open peggyxpxu opened 4 months ago

peggyxpxu commented 4 months ago

Why does pad_or_trim use 1000 rather than 3000 in transcribe_audio?

mel = pad_or_trim(mel, 1000).to(model.device).to(dtype)

YuanGongND commented 2 months ago

Oh, that is because most of our data is 10 s long, so it is just to save some compute.

-Yuan
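
For readers wondering where the numbers 1000 and 3000 come from, here is a minimal sketch of the arithmetic. It assumes Whisper's usual front-end constants (16 kHz audio, a hop of 160 samples, so one mel frame per 10 ms). The helpers frames_for_seconds and pad_or_trim_rows are hypothetical names written for illustration on plain Python lists; they are not the repository's or Whisper's own functions.

```python
# Assumed Whisper front-end constants (not taken from this thread).
SAMPLE_RATE = 16000  # audio samples per second
HOP_LENGTH = 160     # samples between consecutive mel frames (10 ms)


def frames_for_seconds(seconds):
    """Number of mel frames needed to cover `seconds` of audio."""
    return int(seconds * SAMPLE_RATE) // HOP_LENGTH


def pad_or_trim_rows(mel, length):
    """Hypothetical list-based stand-in for pad_or_trim.

    `mel` is a list of rows (one per mel bin), each a list of frame values.
    Rows longer than `length` are cut; shorter rows are zero-padded on the right.
    """
    return [row[:length] + [0.0] * max(0, length - len(row)) for row in mel]


if __name__ == "__main__":
    print(frames_for_seconds(10))  # 1000 frames for a 10 s clip
    print(frames_for_seconds(30))  # 3000 frames for a 30 s window

    # A 7 s clip (700 frames, 80 mel bins) padded to the 10 s length.
    short_clip = [[1.0] * 700 for _ in range(80)]
    padded = pad_or_trim_rows(short_clip, frames_for_seconds(10))
    print(len(padded), len(padded[0]))  # 80 1000
```

Under these assumptions, padding to 1000 frames covers 10 s of audio instead of 30 s, so the encoder sees one third as many frames per clip. Whether a given Whisper encoder accepts inputs shorter than 3000 frames depends on its implementation, which this sketch does not cover.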