Do I need to crop long audio for inference based on pretrained models?

facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

MIT License

30.57k stars 6.41k forks

Do I need to crop long audio for inference based on pretrained models? #5560

Open xuduo18311199384 opened 3 weeks ago

xuduo18311199384 commented 3 weeks ago

I have a 5-minute audio file. The wav2vec features I get by running inference on the whole file are inconsistent with the features I get when I first crop the audio into 10-second segments and run inference on each segment. Could running inference directly on long audio give less accurate results? If so, what segment length should I crop the audio to in order to get the best results?
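To make the cropping step concrete, here is a minimal sketch of how a long waveform could be split into 10-second segments before feature extraction. It assumes 16 kHz mono audio already loaded as a 1-D NumPy array; `split_into_segments` and `SAMPLE_RATE` are hypothetical names for illustration and are not part of fairseq. The actual wav2vec inference call is left out.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio


def split_into_segments(waveform, segment_seconds=10.0, sample_rate=SAMPLE_RATE):
    """Split a 1-D waveform into consecutive, non-overlapping segments.

    Hypothetical helper: the last segment is kept as-is and may be
    shorter than `segment_seconds`.
    """
    segment_len = int(segment_seconds * sample_rate)
    if segment_len <= 0:
        raise ValueError("segment_seconds must cover at least one sample")
    return [
        waveform[start:start + segment_len]
        for start in range(0, len(waveform), segment_len)
    ]


# Stand-in for a real recording: 5 minutes of silence.
waveform = np.zeros(5 * 60 * SAMPLE_RATE, dtype=np.float32)
segments = split_into_segments(waveform, segment_seconds=10.0)

# Each segment would then be passed to the model separately,
# and the per-segment features compared with the full-file features.
print(len(segments), len(segments[0]))
```

With these settings the 5-minute file yields 30 segments of 160,000 samples each; concatenating the segments reproduces the original waveform exactly, so any difference in features comes from the model seeing less context per segment, not from the cropping itself.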

xuduo18311199384 commented 3 weeks ago

@alexeib