reproduce inference results

Hello! Thank you for the great work.

I am trying to reproduce the inference results. Simply running "inf.sh" with your released checkpoints, I get the following results. Inference with pretrained.pth gives 0.006102 mAP, which makes sense to me. Inference with finetuned.pth gives 0.3388 mAP, which is much lower than your reported mAP of 0.4729.

The only difference I can think of is the AudioSet data itself.
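
One way to check whether the data is the cause would be to compare the clip IDs in my local evaluation set against the official evaluation list. Below is a minimal sketch of that comparison. The function name `coverage` and the example clip IDs are hypothetical, not taken from this repo, and the real ID lists would have to be loaded from whatever files each side actually uses.

```python
def coverage(official_ids, local_ids):
    """Return (fraction of official clips present locally, sorted missing IDs)."""
    official = set(official_ids)
    local = set(local_ids)
    missing = sorted(official - local)
    if not official:
        return 0.0, missing
    return 1 - len(missing) / len(official), missing


# Hypothetical clip IDs for illustration only.
official = ["clipA", "clipB", "clipC", "clipD"]
local = ["clipA", "clipC", "clipE"]

frac, missing = coverage(official, local)
print(f"{frac:.2%} of official eval clips available; {len(missing)} missing")
```

If a large share of the official clips is missing on my side, that could plausibly account for part of the mAP gap, though it would not prove it.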