Closed annahung31 closed 5 years ago
I found that the error means I have to change the argument 'normalize' from unit_tri to unit_sum or unit_max. I solved this but ran into another error:
RuntimeError: Error while configuring MelBands: TriangularBands: the number of spectrum bins is insufficient for the specified number of triangular bands. Use zero padding to increase the number of FFT bins.
I wonder whether I am using scripts/melspectrograms.py the wrong way?
I found that I should change the parameter zeroPadding=0 to 512, which is equal to frameSize. That solved the problem and gave me a Mel-spectrogram with shape (96, 1366), which is the shape indicated in the paper. Is that how you did it for the baseline experiment?
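For anyone wondering why zero padding helps here, a rough sketch follows. It assumes the spectrum is the one-sided FFT of the frame plus the appended zeros, so it has (frameSize + zeroPadding) / 2 + 1 bins. The helper name spectrum_bins is made up for illustration and is not part of Essentia, and the exact number of bins MelBands requires for 96 bands is not reproduced here.

```python
def spectrum_bins(frame_size, zero_padding=0):
    """Bins in the one-sided spectrum of a real frame after zero padding."""
    fft_size = frame_size + zero_padding  # zeros are appended to the frame
    return fft_size // 2 + 1


print(spectrum_bins(512))       # 257 bins without padding
print(spectrum_bins(512, 512))  # 513 bins with zeroPadding=512
```

Padding to twice the frame length roughly doubles the number of bins available to the triangular filters, which seems to be what the error message is asking for.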
Thanks!
To anyone who might be interested in this problem: in the end I used the original code from keunwoochoi. https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/audio_processor.py
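As a sanity check on where the 1366 frames could come from, here is a small sketch. The sample rate (12000 Hz), hop size (256) and duration (29.12 s) are assumptions that should be verified against the linked audio_processor.py, and the frame count assumes centered framing, i.e. 1 + n_samples // hop. The function name num_frames is hypothetical.

```python
def num_frames(duration_s, sample_rate, hop_size):
    """Frame count for centered framing: one frame per hop, plus one."""
    n_samples = int(round(duration_s * sample_rate))
    return 1 + n_samples // hop_size


# Assumed settings, not confirmed in this thread.
print(num_frames(29.12, 12000, 256))  # 1366
```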
To avoid recomputing all spectrograms, I made a small change to the dataset, so that every file is now cropped to the desired shape of (96, 1366). This is probably not the best way to do it, but it works. The __getitem__ method looks like this:
def __getitem__(self, index):
    fn = os.path.join(self.root, 'data/raw_30s_specs/', self.dictionary[index]['path'][:-3]+'npy')
    audio = np.array(np.load(fn)).astype('float32')
    tags = self.dictionary[index]['tags']
    # Transforms
    self.transform = transforms.Compose([
        transforms.ToPILImage(),
        transforms.CenterCrop((96, 1366)),
        transforms.ToTensor(),
    ])
    if self.transform:
        audio = self.transform(audio)
    return audio, tags.astype('float32')
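If you would rather not go through PIL, the same crop can be sketched in plain NumPy. This is only an illustration of center cropping, not the code used above: center_crop is a hypothetical helper, the (96, 2000) input is a made-up stand-in for a full-length spectrogram, and the offsets may differ by one from torchvision's CenterCrop when the size difference is odd.

```python
import numpy as np


def center_crop(spec, out_shape=(96, 1366)):
    """Crop a 2-D array to out_shape around its center."""
    rows, cols = spec.shape
    out_rows, out_cols = out_shape
    if rows < out_rows or cols < out_cols:
        raise ValueError("input is smaller than the requested crop")
    top = (rows - out_rows) // 2
    left = (cols - out_cols) // 2
    return spec[top:top + out_rows, left:left + out_cols]


# Hypothetical full-length spectrogram: 96 mel bands, 2000 frames.
full = np.zeros((96, 2000), dtype="float32")
cropped = center_crop(full)
print(cropped.shape)  # (96, 1366)
```

Unlike this sketch, which raises an error for inputs shorter than 1366 frames, the torchvision transform pads them, so short tracks need separate handling here.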
There is another change needed in the model: the batches now already have shape (batch, channels, height, width), so there is no need to unsqueeze.
def forward(self, x):
    #x = x.unsqueeze(1)
    # init bn
    x = self.bn_init(x)
    # layer 1
    x = self.mp_1(nn.ELU()(self.bn_1(self.conv_1(x))))
    # layer 2
    x = self.mp_2(nn.ELU()(self.bn_2(self.conv_2(x))))
    # layer 3
    x = self.mp_3(nn.ELU()(self.bn_3(self.conv_3(x))))
    # layer 4
    x = self.mp_4(nn.ELU()(self.bn_4(self.conv_4(x))))
    # layer 5
    x = self.mp_5(nn.ELU()(self.bn_5(self.conv_5(x))))
    # classifier
    x = x.view(x.size(0), -1)
    x = self.dropout(x)
    logit = nn.Sigmoid()(self.dense(x))
    return logit
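To see why the unsqueeze has to go, here is a shape-only sketch that uses NumPy arrays as stand-ins for the tensors. It assumes each cropped item comes out as (1, 96, 1366) and that batching simply stacks items along a new first axis; the batch size of 4 is arbitrary.

```python
import numpy as np

# Stand-ins for four cropped items, each shaped (channels, height, width).
items = [np.zeros((1, 96, 1366), dtype="float32") for _ in range(4)]

batch = np.stack(items)  # stacking adds the batch axis
print(batch.shape)       # (4, 1, 96, 1366)

# Analogous to x.unsqueeze(1): adds a fifth axis the conv layers do not expect.
extra = np.expand_dims(batch, axis=1)
print(extra.shape)       # (4, 1, 1, 96, 1366)
```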
@annahung31 We have updated our PyPI wheels with the newest version of Essentia. Install or upgrade to the latest Essentia from pip and you should be able to run the spectrogram extraction code without a problem.
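After upgrading, a quick way to confirm which Essentia version pip actually installed is to query the package metadata. This is a minimal sketch that assumes the distribution is named essentia on PyPI; installed_version is a hypothetical helper and returns None when the package is not found.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("essentia"))
```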
Hi, thanks for the effort. I tried to use the Mel-spectrograms downloaded from gdrive to run the baseline, but found that the downloaded files cover the full songs. As a result, I tried to run scripts/melspectrograms.py to get the Mel-spectrogram of a 29.1s segment. However, I kept getting the error below:
RuntimeError: Error while configuring MelBands: Parameter normalize = "unit_tri" is not within specified range: {unit_sum,unit_max}
May I ask what I missed? Thanks for the help.