YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License
1.06k stars 203 forks

Ask for help #130

Open Ingram-lin opened 1 month ago

Ingram-lin commented 1 month ago

Hello, I followed the example for extracting features from speech with the AST model and adapted it to extract features from new speech using my own model. Every output I get has shape [1, 1214, 768], but I want a single feature vector per clip, with shape [1, 768]. Is the output of AST's final layer always [1, 1214, 768], or have I made a mistake somewhere? Thank you for your assistance. I look forward to your reply.

Ingram-lin commented 1 month ago

[Image attachment: 64d2da6249509b91613b033fc408437]
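
For reference, a minimal sketch of how a [1, 1214, 768] output can be reduced to [1, 768]. It assumes the AST defaults (128 mel bins, 1024 input frames, 16x16 patches with stride 10, and a DeiT-style backbone with two prefix tokens); under those assumptions 1214 is simply the number of tokens, so the shape in the question would be expected rather than a mistake. The array below is random stand-in data, not real model output, and the helper names `pool_prefix_tokens` and `pool_patch_mean` are hypothetical. NumPy is used so the snippet runs without a checkpoint; the same indexing works on PyTorch tensors (with `dim=1` instead of `axis=1`).

```python
import numpy as np

# Stand-in for the final-layer output: (batch, tokens, hidden). Random values,
# used only to demonstrate the shapes; replace with the real model output.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((1, 1214, 768)).astype(np.float32)

# Where 1214 comes from, ASSUMING the AST defaults named above.
f_patches = (128 - 16) // 10 + 1    # patches along the frequency axis -> 12
t_patches = (1024 - 16) // 10 + 1   # patches along the time axis -> 101
num_prefix = 2                      # [CLS] token + distillation token (assumed)
assert f_patches * t_patches + num_prefix == tokens.shape[1]


def pool_prefix_tokens(x):
    """Average the two prefix tokens into one vector per clip."""
    return (x[:, 0] + x[:, 1]) / 2


def pool_patch_mean(x, num_prefix=2):
    """Mean-pool over the patch tokens, skipping the prefix tokens."""
    return x[:, num_prefix:].mean(axis=1)


clip_from_prefix = pool_prefix_tokens(tokens)
clip_from_patches = pool_patch_mean(tokens)
print(clip_from_prefix.shape, clip_from_patches.shape)  # (1, 768) (1, 768)
```

Averaging the two prefix tokens mirrors what the repository's classification path appears to do before its MLP head, so it is probably the closer match to the pretrained behaviour; mean-pooling the patch tokens is a common alternative when the prefix tokens were not trained for the target task. Which one works better for a given downstream task is an empirical question.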