cwx-worst-one / EAT

[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
MIT License

How to use this for ASR? #10

Closed. lucasjinreal closed this issue 2 weeks ago

lucasjinreal commented 1 month ago

How can I use this model for ASR?

cwx-worst-one commented 1 month ago

EAT is a self-supervised audio model pre-trained on the AudioSet dataset. It has not been pre-trained on speech-specific datasets, so it cannot be directly fine-tuned for ASR tasks. That said, you could experiment with pre-training the model on speech datasets and then fine-tuning it for ASR to evaluate its performance in the speech modality.
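
If you do try that route, a minimal PyTorch sketch of the fine-tuning step is below. Note this is not part of the EAT codebase: the `SpeechEncoder` class is only a stand-in for the actual pre-trained encoder (which you would load from a fairseq checkpoint), and the CTC head, vocabulary size, and feature dimension are assumptions for illustration.

```python
# Hypothetical sketch: wrap a pre-trained audio encoder (e.g. an EAT encoder
# re-pre-trained on speech data) with a linear CTC head for ASR fine-tuning.
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Placeholder for a pre-trained encoder producing frame-level features.
    Replace this with the real encoder loaded from a checkpoint."""
    def __init__(self, feat_dim=768):
        super().__init__()
        self.proj = nn.Linear(128, feat_dim)   # e.g. mel-spectrogram -> features

    def forward(self, mel):                    # mel: (batch, frames, 128)
        return self.proj(mel)                  # (batch, frames, feat_dim)

class CTCASRModel(nn.Module):
    def __init__(self, encoder, feat_dim=768, vocab_size=32):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(feat_dim, vocab_size)   # blank + character labels

    def forward(self, mel):
        feats = self.encoder(mel)
        return self.head(feats).log_softmax(dim=-1)   # (batch, frames, vocab)

model = CTCASRModel(SpeechEncoder())
ctc_loss = nn.CTCLoss(blank=0)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One dummy training step with random data, just to show the expected shapes.
mel = torch.randn(2, 200, 128)                 # 2 utterances, 200 frames each
targets = torch.randint(1, 32, (2, 20))        # character-level label ids
input_lengths = torch.full((2,), 200)
target_lengths = torch.full((2,), 20)

log_probs = model(mel).transpose(0, 1)         # CTC expects (frames, batch, vocab)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
optimizer.step()
```

In practice you would freeze or partially freeze the encoder at first, train on a real speech corpus, and decode with greedy or beam-search CTC decoding; the snippet only illustrates how a CTC head could sit on top of the encoder's frame-level outputs.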