Status: Open. Opened by YSLCoat 1 year ago.
Can you take a look at Figure 1 of this paper? https://arxiv.org/pdf/2305.10790.pdf It shows an example of mean pooling over the frequency dimension to get a representation in temporal order. Code implementation is here:
However, that code is for a "no-overlap" patch split; applying it to the "overlapped" patch split (used in this repo) requires some changes.
You can also check out SSAST, which supports a naive temporal-order representation: https://github.com/YuanGongND/ssast
Once you have a temporal-order representation, you can do seq2seq tasks, e.g., add CTC on top of the temporal representations.
-Yuan
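For anyone reading along, here is a rough sketch of the idea above: mean pool the patch tokens over the frequency dimension so one vector per time step remains, then put a CTC head on top. This is not code from the AST or SSAST repos. The names `temporal_pool` and `demo`, the toy sizes, and the frequency-major token layout (index = f * t_dim + t, with any cls/distillation tokens already dropped) are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


def temporal_pool(patch_tokens: torch.Tensor, f_dim: int, t_dim: int) -> torch.Tensor:
    """Mean-pool patch tokens over frequency, keeping temporal order.

    patch_tokens: (batch, f_dim * t_dim, embed_dim), cls/distillation tokens removed.
    ASSUMPTION: tokens are flattened frequency-major, i.e. index = f * t_dim + t.
    Returns: (batch, t_dim, embed_dim)
    """
    batch, num_tokens, embed_dim = patch_tokens.shape
    if num_tokens != f_dim * t_dim:
        raise ValueError(f"expected {f_dim * t_dim} tokens, got {num_tokens}")
    grid = patch_tokens.reshape(batch, f_dim, t_dim, embed_dim)
    return grid.mean(dim=1)  # average out the frequency axis


def demo():
    """Toy end-to-end run with random tensors standing in for AST outputs."""
    torch.manual_seed(0)
    # Hypothetical sizes, chosen only to keep the example small.
    batch, f_dim, t_dim, embed_dim, num_classes = 2, 12, 50, 16, 10

    tokens = torch.randn(batch, f_dim * t_dim, embed_dim)
    frames = temporal_pool(tokens, f_dim, t_dim)  # (batch, t_dim, embed_dim)

    # Per-frame classifier; the extra class at index 0 is the CTC blank.
    head = nn.Linear(embed_dim, num_classes + 1)
    # nn.CTCLoss expects log-probabilities shaped (time, batch, classes).
    log_probs = head(frames).log_softmax(dim=-1).transpose(0, 1)

    target_len = 5
    targets = torch.randint(1, num_classes + 1, (batch, target_len))
    input_lengths = torch.full((batch,), t_dim, dtype=torch.long)
    target_lengths = torch.full((batch,), target_len, dtype=torch.long)

    loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
    return frames, loss
```

If you have one label per time step rather than an unaligned label sequence, a per-frame cross-entropy loss on `head(frames)` may be a simpler fit than CTC. With the overlapped patch split, `f_dim` and `t_dim` would need to match whatever grid the model actually produces.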
I got the results I wanted by removing line 184 in https://github.com/YuanGongND/ast/blob/master/src/models/ast_models.py and setting t_stride = 1. I think that should give me working seq2seq classification. I will take a look at the links you provided as well!
Hi!
Is it trivial to adapt the AST architecture to do sequence to sequence classification? My input data has a label for each audio sample and my goal is to classify each sample in the data.