SylviaZiyaZhou opened this issue 1 year ago.
Hi there,
> I wonder if the AST can be used to extract frame-level representation of audio ...
Yes, technically both AST and SSAST can, but some pretraining is needed for good performance. Since AST only supports patch-level pretraining, please try SSAST; see this issue for how to do it.
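To make the patch-level vs. frame-level distinction concrete, here is a small sketch that counts the tokens a convolutional patch embedding produces. The input size (128 mel bins x 1024 frames), the 16x16 patches with stride 10 for AST, and the non-overlapping 16x16 or 128x2 patches for SSAST pretraining are assumed default settings; check the `fshape`/`tshape`/`fstride`/`tstride` arguments in the repositories for your own setup. The helper `token_grid` is hypothetical, not part of either codebase.

```python
def token_grid(input_fdim, input_tdim, fshape, tshape, fstride, tstride):
    """Return the (frequency, time) token grid of a conv patch embedding.

    Uses the no-padding convolution output size, (size - kernel) // stride + 1,
    along each axis of a (mel bins x frames) spectrogram.
    """
    f_dim = (input_fdim - fshape) // fstride + 1
    t_dim = (input_tdim - tshape) // tstride + 1
    return f_dim, t_dim


# Assumed settings: 128 mel bins x 1024 frames input.
configs = {
    # AST default: 16x16 patches, stride 10 (6-bin overlap on both axes).
    "AST patch (16x16, stride 10)": (16, 16, 10, 10),
    # SSAST patch-based pretraining: non-overlapping 16x16 patches.
    "SSAST patch (16x16, stride 16)": (16, 16, 16, 16),
    # SSAST frame-based pretraining: each token spans all 128 bins and 2 frames.
    "SSAST frame (128x2, stride 128x2)": (128, 2, 128, 2),
}

for name, (fshape, tshape, fstride, tstride) in configs.items():
    f_dim, t_dim = token_grid(128, 1024, fshape, tshape, fstride, tstride)
    print(f"{name}: {f_dim} x {t_dim} = {f_dim * t_dim} tokens")
# AST patch (16x16, stride 10): 12 x 101 = 1212 tokens
# SSAST patch (16x16, stride 16): 8 x 64 = 512 tokens
# SSAST frame (128x2, stride 128x2): 1 x 512 = 512 tokens
```

With the frame-level grid the frequency axis collapses to 1, so each output token already corresponds to one 2-frame time step and a frame-level classifier can read the tokens directly. The patch-level grids have several tokens per time step, which must be pooled over frequency first.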
> (like music) to solve the frame-level classification tasks? Thanks.
I am not sure about this. In our clip-level classification results (shown in the SSAST paper), patch-level SSAST outperforms frame-level SSAST on general audio. But I haven't tested it specifically on music; frame-level SSAST might work there, since music, like speech, has discrete frequency patterns.
-Yuan
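If you stay with patch-level tokens (which, per the reply above, did better on general audio), one simple way to get frame-level outputs is to average the patch tokens over frequency and spread each time step back onto the spectrogram frames its patch covers. The helper below is a hypothetical post-processing sketch, not code from the AST or SSAST repositories. It assumes the tokens are ordered frequency-major (the order you get by flattening a `(D, f, t)` convolution output) and that any [CLS]/distillation tokens have already been dropped; the arrays in the usage lines are random stand-ins, not real model outputs.

```python
import numpy as np


def patch_tokens_to_frames(tokens, f_dim, t_dim, tshape, tstride, n_frames):
    """Hypothetical helper: patch-level encoder outputs -> per-frame embeddings.

    tokens: (f_dim * t_dim, D) patch tokens with any [CLS]/distillation tokens
            removed, assumed frequency-major (token index = f * t_dim + t).
    Returns (n_frames, D): each frame is the mean of the frequency-pooled
    embeddings of every time step whose patch covers that frame.
    """
    n_tokens, dim = tokens.shape
    if n_tokens != f_dim * t_dim:
        raise ValueError("token count does not match the f_dim x t_dim grid")

    # (f_dim, t_dim, D) -> average over the frequency axis -> (t_dim, D)
    per_step = tokens.reshape(f_dim, t_dim, dim).mean(axis=0)

    frames = np.zeros((n_frames, dim))
    counts = np.zeros(n_frames)
    for t in range(t_dim):
        start = t * tstride
        end = min(start + tshape, n_frames)
        frames[start:end] += per_step[t]  # overlapping patches accumulate
        counts[start:end] += 1

    covered = np.flatnonzero(counts)
    frames[covered] /= counts[covered][:, None]  # average over overlapping patches
    # Trailing frames no patch reaches (e.g. 1016-1023 with 16-frame patches
    # at stride 10 on a 1024-frame input) copy the nearest covered frame.
    for i in np.flatnonzero(counts == 0):
        frames[i] = frames[covered[np.argmin(np.abs(covered - i))]]
    return frames


# Usage with random stand-ins for a 12 x 101 patch grid (D = 768).
rng = np.random.default_rng(0)
tokens = rng.standard_normal((12 * 101, 768))
frames = patch_tokens_to_frames(tokens, f_dim=12, t_dim=101, tshape=16, tstride=10, n_frames=1024)
print(frames.shape)  # (1024, 768)
```

Averaging overlapping patches is the simplest choice; linear interpolation between time steps or a small learned upsampling layer trained with the frame-level head are reasonable alternatives.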
Hi Yuan, thanks for your reply. I am fine-tuning SSAST on custom data, and it works. Are there AST models pretrained on ImageNet? I want to compare their performance with an ImageNet-pretrained ViT on my own tasks.
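For context on the ImageNet question: the AST paper describes initializing AST from an ImageNet-pretrained DeiT by collapsing its 3-channel patch-embedding weights into one channel (averaging them) and resizing the positional embeddings to the spectrogram patch grid. The repository's `ASTModel` exposes `imagenet_pretrain` and `audioset_pretrain` flags, so `imagenet_pretrain=True, audioset_pretrain=False` should give an ImageNet-only initialization; please verify this against `ast_models.py`. The numpy sketch below uses random stand-in weights, not the repository's code, and sums the channels instead of averaging them (the two differ only by a constant factor of 3). It checks why collapsing the channels is reasonable: the collapsed kernel gives the same patch embedding as feeding the ImageNet model the spectrogram copied into all three RGB channels.

```python
import numpy as np


def rgb_kernel_to_mono(weight):
    """Collapse a (D, 3, kh, kw) RGB patch-embedding kernel to (D, 1, kh, kw) by summing channels."""
    return weight.sum(axis=1, keepdims=True)


def embed_patch(weight, patch):
    """Linear patch embedding: apply a (D, C, kh, kw) kernel to one (C, kh, kw) patch."""
    return np.einsum("dchw,chw->d", weight, patch)


rng = np.random.default_rng(0)
rgb_weight = rng.standard_normal((768, 3, 16, 16))  # stand-in for ImageNet ViT weights
spec_patch = rng.standard_normal((1, 16, 16))       # one 16x16 spectrogram patch

mono = embed_patch(rgb_kernel_to_mono(rgb_weight), spec_patch)
# Same result as showing the RGB kernel the spectrogram copied into 3 channels.
replicated = embed_patch(rgb_weight, np.repeat(spec_patch, 3, axis=0))
print(np.allclose(mono, replicated))  # True
```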
Hi Yuan,
Glad to help. I check another email address @.*** more often. If I do not reply immediately here, perhaps you can contact me via the ust email.
Best, Ziya
From: Waseem Randhawa Sent: Thursday, May 23, 2024 19:46 To: YuanGongND/ast Cc: SylviaZiyaZhou; Mention Subject: Re: [YuanGongND/ast] Can AST be used for audio representation towards solving the frame-level classification tasks? (Issue #90)
@SylviaZiyaZhou, could you please contact me at my email (@.***) or share yours? I want to train AST for music chord recognition and need a little guidance.
Sorry, that email was sent to you by mistake. Please just ignore it. Thanks!
Best, Ziya
From: Ziya Zhou Sent: Wednesday, May 29, 2024 18:50 To: YuanGongND/ast Subject: Re: [YuanGongND/ast] Can AST be used for audio representation towards solving the frame-level classification tasks? (Issue #90)
Hi Yuan,
I am currently reading your wonderful papers on AST and SSAST. Can AST be used to extract frame-level representations of audio (such as music) for frame-level classification tasks? Thanks.