Official PyTorch implementation of the TIP paper "Generating Visually Aligned Sound from Videos" and the corresponding Visually Aligned Sound (VAS) dataset.
Hi! Thanks for releasing the dataset!
May I ask whether the videos in VAS come from the training set or the test set of AudioSet? Also, do you have a record of the mapping between the two datasets? Thanks!