YuanGongND / ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

Loss curves #5

Open treasan opened 2 years ago

treasan commented 2 years ago

Hey, great work! I just wanted to ask whether you might have the loss curves of your runs so that I can compare with my experiments a little bit?

YuanGongND commented 2 years ago

Hi Tom,

Thanks for your interest.

I didn't plot the loss curve, but I think I still have (pre)-training logs on my server. Which experiment are you looking for?

-Yuan

treasan commented 2 years ago

Thanks for the reply! I am looking for the pre-training run(s), i.e. the one trained on Librispeech + AudioSet.

treasan commented 2 years ago

Bump! Sorry to come back to this question. I am currently trying to implement SSAST (well, its masked autoencoder counterpart, see https://arxiv.org/pdf/2203.16691.pdf) for music. For efficiency reasons, I tried to keep the computed spectrograms in FP16 format, but the reconstruction (generative) loss curve seems a bit weird. It first goes down really quickly, rises again to a given point and drops slowly again afterwards.

I just wanted to have some comparison in order to know what I should expect. Thanks in advance!
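One thing worth checking with FP16 spectrograms is whether the reconstruction loss itself is also being accumulated in half precision. A toy NumPy sketch (shapes and values are made up, and this is not the repo's actual loss code) comparing an MSE computed entirely in FP16 against one with an FP32 accumulator:

```python
import numpy as np

# Made-up spectrogram batch stored in FP16, as in the setup described above.
rng = np.random.default_rng(0)
spec = rng.random((16, 128, 100)).astype(np.float16)
recon = spec + np.float16(0.01)  # pretend model reconstruction

# MSE computed entirely in FP16 vs. cast to FP32 before squaring/averaging.
mse_fp16 = np.mean((recon - spec) ** 2)
mse_fp32 = np.mean((recon - spec).astype(np.float32) ** 2)
```

If the two values diverge noticeably, half-precision rounding in the loss (rather than the model) may be part of the odd curve shape.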

YuanGongND commented 2 years ago

Hi there,

I think this is our log (gen & dis objective, 400 masked patches, full AudioSet + Librispeech). Unfortunately, we only logged the discrimination loss, not the generation loss. The columns are defined at https://github.com/YuanGongND/ssast/blob/35ae7abbdd2870c008feed4ece8b7c6457421b17/src/traintest_mask.py#L147
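For anyone comparing curves, a minimal sketch for pulling one loss column out of a result.csv-style log with only the standard library. The column index here is an assumption for illustration; match it against the column definitions linked above before trusting a plot.

```python
import csv
import io

# Assumed column index of the loss in each log row -- verify against
# the definitions in src/traintest_mask.py before use.
LOSS_COL = 1

def read_loss_column(csv_text, col=LOSS_COL):
    """Return the chosen column of each CSV row as a list of floats."""
    return [float(row[col]) for row in csv.reader(io.StringIO(csv_text))]

# Made-up log rows (epoch, loss, ...) just to exercise the parser.
fake_log = "1,4.20,0.11\n2,3.87,0.15\n3,3.51,0.19\n"
losses = read_loss_column(fake_log)
```

The resulting list can be passed straight to any plotting library to eyeball the curve.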

For your question:

> reconstruction (generative) loss curve seems a bit weird. It first goes down really quickly, rises again to a given point and drops slowly again afterwards.

Could it be because you are adding L_g and L_d together? Otherwise, L_g on the training set should always drop.
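For intuition, a toy sketch with purely synthetic curves (not the actual training log): a monotonically dropping L_g summed with a discrimination-style loss that rises early can reproduce exactly the "drops fast, rises, then slowly drops" shape.

```python
import math

# Synthetic toy curves -- not real training data.
steps = [0.5 * i for i in range(21)]  # fake "epochs" 0 .. 10

# Generative loss: strictly decreasing the whole time.
l_g = [3.0 * math.exp(-2.0 * t) for t in steps]

# Discrimination loss: rises early, peaks, then decays slowly.
l_d = [t * math.exp(-0.3 * t) for t in steps]

# The summed curve is non-monotonic even though l_g alone keeps dropping:
# it falls quickly, climbs to a local peak, then declines slowly.
total = [g + d for g, d in zip(l_g, l_d)]
```

This is why logging L_g and L_d separately makes the curves much easier to interpret than plotting only their sum.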

-Yuan