YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License
1.06k stars, 203 forks

Can AST be used for audio representation towards solving the frame-level classification tasks? #90

Open SylviaZiyaZhou opened 1 year ago

SylviaZiyaZhou commented 1 year ago

Hi Yuan,

I am currently reading your wonderful papers about the AST and SSAST. I wonder if the AST can be used to extract frame-level representation of audio (like music) to solve the frame-level classification tasks? Thanks.

YuanGongND commented 1 year ago

Hi there,

I wonder if the AST can be used to extract frame-level representation of audio ...

Yes, technically both AST and SSAST can, but some pretraining is needed for good performance. Since AST only supports patch-level pretraining, please try SSAST; see this issue for how to do it.

(like music) to solve the frame-level classification tasks? Thanks.

I am not sure about this. In our clip-level classification results (shown in the SSAST paper), patch-level SSAST is better than frame-level SSAST for general audio. But I haven't tested specifically on music; it might work, as music also has discrete frequency patterns like speech.

-Yuan
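
For readers following the patch-level versus frame-level distinction above, here is a minimal sketch of how a sequence of patch tokens could be turned into one vector per time step. It is not code from the AST or SSAST repositories. It uses NumPy rather than PyTorch so it runs standalone, and the names `patch_grid` and `tokens_to_frames` are hypothetical. It assumes the tokens are ordered frequency-major (all time positions for the first frequency row, then the next row), that any CLS or distillation tokens have already been removed, and that 16x16 patches with stride 10 (patch-level) or 128x2 patches with stride 2 in time (frame-level) are used.

```python
import numpy as np


def patch_grid(n_mels, n_frames, fshape=16, tshape=16, fstride=10, tstride=10):
    """Count patches along frequency and time for an unpadded strided split.

    Uses the usual convolution output-size formula: (n - kernel) // stride + 1.
    """
    f_dim = (n_mels - fshape) // fstride + 1
    t_dim = (n_frames - tshape) // tstride + 1
    return f_dim, t_dim


def tokens_to_frames(tokens, f_dim, t_dim):
    """Average patch tokens over frequency to get one vector per time step.

    tokens: array of shape (batch, f_dim * t_dim, dim).
    Assumption: tokens are frequency-major and contain no CLS/distillation tokens.
    Returns an array of shape (batch, t_dim, dim).
    """
    batch, n_tokens, dim = tokens.shape
    if n_tokens != f_dim * t_dim:
        raise ValueError(f"expected {f_dim * t_dim} tokens, got {n_tokens}")
    grid = tokens.reshape(batch, f_dim, t_dim, dim)
    return grid.mean(axis=1)  # pool over the frequency axis


# Patch-level split of a 128-bin, 1024-frame spectrogram.
f_dim, t_dim = patch_grid(128, 1024)

# Random stand-in for encoder output; a real model would supply these tokens.
tokens = np.random.default_rng(0).standard_normal((2, f_dim * t_dim, 768))
frames = tokens_to_frames(tokens, f_dim, t_dim)
print(f_dim, t_dim, frames.shape)  # 12 101 (2, 101, 768)

# Frame-level split: each token already spans the full frequency axis.
print(patch_grid(128, 1024, fshape=128, tshape=2, fstride=128, tstride=2))  # (1, 512)
```

With the patch-level settings, each of the 101 time steps covers 16 spectrogram frames with a hop of 10 frames, so frame-level labels would need to be resampled to that rate. With the frame-level settings there is a single frequency row, so the pooling step is a no-op and the tokens are already per-time-step representations.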

SylviaZiyaZhou commented 1 year ago

Hi Yuan, thanks for your reply. I am trying to fine-tune SSAST on custom data and it works. I wonder if there are AST models pretrained on ImageNet? I just want to compare their performance with ViT pretrained on ImageNet on my own tasks.

SylviaZiyaZhou commented 1 month ago

Hi Yuan,

Glad to help. I check another email address @.*** more often. If I do not reply immediately here, perhaps you can contact me via the ust email.

Bests, Ziya


From: Waseem Randhawa, Thursday, May 23, 2024 19:46

@SylviaZiyaZhou can you please contact me at my email (@.***), or please share your email? I want to train AST for music chord recognition and need a little guidance.

SylviaZiyaZhou commented 1 month ago

Sorry for the email mistakenly sent to you. Please just ignore it. Thanks!

Bests, Ziya

