Open Moadab-AI opened 2 years ago
Hi there,
Can you point me to the code that is related to the "forbidden" classes? Thanks!
-Yuan
Sorry, my bad. That was not in your code but in the original FSD50K release: FSD50k/FSD50K.ground_truth/analyze_dataset.py
That's fine. As for your other questions:
I couldn't find in any of your recent publications on AudioSet how you split the unbalanced (or even balanced) train segments into train and val for hyperparameter tuning.
No, we don't use a validation set for AudioSet experiments. I think that is the common setting for most papers using AudioSet (e.g., this paper, see the footnote on page 1237; you can also check the code of other papers to verify this point). In practice, it is non-trivial to sample a meaningful validation set because of label co-occurrence. For this reason, we did not report the best model but instead reported the average performance of the last few model checkpoints during training. The model performance is not sensitive to most hyperparameters, and we didn't tune most of the hyperparameters in the paper. On FSD50K, which does have a validation split, the PSLA methods work equally well.
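To illustrate the reporting idea only (this is a minimal sketch, not the actual PSLA code): assuming you have logged one evaluation score per checkpoint, averaging the last few could look like the following. The function name `average_last_checkpoints`, the parameter `k`, and the score values are all hypothetical.

```python
from statistics import mean


def average_last_checkpoints(scores, k=5):
    """Return the mean of the last k per-checkpoint scores.

    `scores` is assumed to be ordered by training time (oldest first).
    If fewer than k scores exist, all of them are averaged.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not scores:
        raise ValueError("scores must not be empty")
    return mean(scores[-k:])


# Hypothetical per-epoch mAP values, made up for illustration only.
epoch_map = [0.30, 0.35, 0.38, 0.40, 0.41, 0.42]
print(average_last_checkpoints(epoch_map, k=3))
```

Reporting this average instead of the single best checkpoint avoids selecting a model on the evaluation set when no validation split is available.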
As for the missing link, it seems to be a blank link; I cannot remember whether I have anything on Dropbox. I will try to fix that when I have some time.
As for the "# only apply to the vocal sound data" comment, that might be a mistake. Did you see anything wrong with the prepared JSON file for the eval set? FYI, we collected a VocalSound dataset and will release it soon. We ran some experiments combining the FSD50K and VocalSound datasets, which is why you see the comment there; I probably forgot to remove it when I cleaned up and uploaded the code. If you don't see an issue with the output JSON file, you can safely ignore the comment.
-Yuan
Thanks for the explanation
Hi,
I couldn't find in any of your recent publications on AudioSet how you split the unbalanced (or even balanced) train segments into train and val for hyperparameter tuning. I'm just trying to replicate your results. Also, the Dropbox link for the PSLA experiments you have listed is down. On another note, regarding FSD50K, could you elaborate on what those "forbidden" classes are and why? Also, could you explain the purpose of this comment in prep_fsd50k.py when generating the JSON files: "# only apply to the vocal sound data"?
Thanks