Open SteveTanggithub opened 1 year ago
Hey there, just train a classifier on the standard 10s scale on Audioset. You might have noticed that I generally avoid publishing code that uses 10s training on Audioset, mainly because of the issues with downloading the dataset, which would lead to a lot of questions.
There are just some caveats that are not really discussed in the paper, but are vital to the success of a good teacher:
The traditional positional embedding of a transformer: in a spectrogram, the position of each frequency is always fixed, so you only need to encode that position a single time across all time frames rather than give each position a unique representation.
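To make the positional-embedding caveat concrete, here is a minimal, dependency-free sketch of one way to read it: keep one embedding per frequency position and reuse it for every time frame. This is not the code from the repository. The separate time table, the grid size, and the names `make_table` and `factorized_positions` are assumptions made only for illustration, and the random values stand in for learnable parameters.

```python
import random


def make_table(rows, dim, rng):
    """Stand-in for a learnable embedding table of shape (rows, dim)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


def factorized_positions(freq_embed, time_embed):
    """Return pos[f][t] = freq_embed[f] + time_embed[t].

    The frequency vector is stored once per frequency position and shared
    by all time frames, instead of one vector per (frequency, time) cell.
    """
    return [
        [[fe + te for fe, te in zip(f_vec, t_vec)] for t_vec in time_embed]
        for f_vec in freq_embed
    ]


rng = random.Random(0)
n_freq, n_time, dim = 4, 6, 8  # hypothetical patch grid and embedding width

freq_embed = make_table(n_freq, dim, rng)  # one row per frequency position
time_embed = make_table(n_time, dim, rng)  # one row per time position (assumed)
pos = factorized_positions(freq_embed, time_embed)

# Parameter count of the shared tables vs. a unique vector per grid cell.
factorized_params = (n_freq + n_time) * dim
per_position_params = n_freq * n_time * dim
print(factorized_params, per_position_params)
```

In a real model the two tables would be trainable tensors added to the patch embeddings by broadcasting, but the sharing pattern is the same: the frequency term does not depend on the time index.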
How can I get the pretrained_teacher model by myself instead of using the ones you provided?