MTG / mtg-jamendo-dataset

Metadata, scripts and baselines for the MTG-Jamendo dataset
Apache License 2.0

"Reproduce experiments" don't work #58

Open FelipeMarra opened 1 week ago

The error

The data loader raises a stack error because the tensors in a batch have different shapes.

How to reproduce:

  1. Run the preprocessing step, python3 scripts/baseline/get_npy.py run 'your_path_to_spectrogram_npy', on the mood/theme subset, since baseline.pth outputs 56 classes.
  2. Run the training command: python3 scripts/baseline/main.py --mode 'TRAIN'

Trying to solve:

The article says:

we only used a centered 29.1s audio segment

I believe this is equivalent to computing the mel with melspectrograms.py with full_audio set to False. That yields a [96, 1366] tensor, which is the shape the baseline model needs to run inference.

Since the mels in the dataset were computed over the full duration of each track, the data loader may need to take a centered [96, 1366] segment from each of the dataset's mels.
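A minimal sketch of that idea, using NumPy and random stand-in arrays rather than the real spectrograms. The center_crop helper and the TARGET_FRAMES constant are hypothetical names, not part of the repository, and whether this crop matches the mel of a centered 29.1s audio segment exactly is not verified here.

```python
import numpy as np

TARGET_FRAMES = 1366  # time frames expected by the baseline, per the shape quoted above


def center_crop(mel, target_frames=TARGET_FRAMES):
    """Return the centered [n_bands, target_frames] slice of a [n_bands, n_frames] mel."""
    n_frames = mel.shape[1]
    if n_frames < target_frames:
        raise ValueError(f"mel has {n_frames} frames, need at least {target_frames}")
    start = (n_frames - target_frames) // 2
    return mel[:, start:start + target_frames]


# Two stand-in mels of different lengths (random data, not real spectrograms).
rng = np.random.default_rng(0)
mels = [rng.standard_normal((96, 9602)), rng.standard_normal((96, 4000))]

# Stacking the raw mels fails, which mirrors the shape mismatch in the loader.
try:
    np.stack(mels)
    stack_failed = False
except ValueError:
    stack_failed = True

# After cropping, every item has the same shape and the batch stacks cleanly.
batch = np.stack([center_crop(m) for m in mels])
print(stack_failed, batch.shape)
```

Tracks shorter than 1366 frames would need padding or skipping instead. The sketch simply raises in that case.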

To check whether extracting the 29.1s segment is equivalent to centering a [96, 1366] segment in the dataset's mels, I computed a mel from the audio myself. I obtained the same dimensions but different values. For example, for the 00/13400.mp3 audio, both the precomputed mel and the mel calculated with melspectrograms.py have the dimensions [96, 9602]. But if you print both at [:,0], the precomputed one from the dataset contains the following numbers:

[-69.5358, -64.7463, -61.8604, -59.8808, -58.1119, -58.2752, -58.9025, -60.2660, -62.0527, -64.3706, -68.4771, -72.2208, -75.7047, -79.4953, -85.4376, -85.6893, -81.9504, -80.0834, -79.7122, -82.1272, -89.4751, -90.0000, -90.0000, -90.0000, -90.0000, -88.8482, -86.1220, -84.0110, -81.6328, -81.6245, -82.9754, -83.6547, -85.0630, -88.5137, -90.0000, -87.7471, -85.0853, -82.7995, -84.5712, -88.1776, -88.0879, -86.8838, -89.5533, -90.0000, -84.0632, -81.3411, -83.6548, -87.9001, -90.0000, -90.0000, -88.2064, -84.8365, -85.5288, -87.3742, -88.8410, -90.0000, -90.0000, -85.1121, -83.0755, -86.6247, -90.0000, -89.6840, -87.7929, -84.6036, -86.9026, -90.0000, -90.0000, -87.8175, -83.3707, -84.7766, -90.0000, -90.0000, -90.0000, -90.0000, -90.0000, -88.1323, -90.0000, -88.8589, -90.0000, -90.0000, -90.0000, -88.7473, -90.0000, -89.0149, -90.0000, -90.0000, -90.0000, -90.0000, -90.0000, -90.0000, -88.6646, -90.0000, -90.0000, -90.0000, -90.0000, -90.0000]

While the one calculated with melspectrograms.py contains:

[-139.0715, -129.4926, -123.7208, -119.7616, -116.2238, -116.5503, -117.8051, -120.5321, -124.1054, -128.7413, -136.9542, -144.4415, -151.4094, -158.9905, -170.8752, -171.3786, -163.9008, -160.1669, -159.4244, -164.2545, -178.9503, -193.2552, -186.2103, -188.0788, -188.0027, -177.6964, -172.2440, -168.0220, -163.2655, -163.2491, -165.9508, -167.3093, -170.1259, -177.0275, -180.8245, -175.4943, -170.1705, -165.5989, -169.1423, -176.3552, -176.1757, -173.7676, -179.1065, -182.1857, -168.1263, -162.6822, -167.3096, -175.8002, -185.6764, -189.3085, -176.4127, -169.6730, -171.0577, -174.7484, -177.6820, -192.4283, -181.8572, -170.2243, -166.1510, -173.2494, -181.5207, -179.3679, -175.5858, -169.2072, -173.8052, -189.5120, -199.9228, -175.6349, -166.7414, -169.5531, -190.6465, -191.5059, -186.6069, -193.5956, -188.5288, -176.2646, -181.7400, -177.7178, -189.9011, -180.9200, -181.7761, -177.4945, -183.4301, -178.0298, -189.3605, -186.7196, -189.7235, -185.6219, -188.4031, -185.2255, -177.3292, -184.3699, -185.4904, -200.0000, -200.0000, -200.0000]
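A quick check on the two listings, as a sketch only: dividing the recomputed values by the precomputed ones gives a ratio of about 2 wherever the precomputed value is above -90. The precomputed mel also bottoms out at -90.0000, while the recomputed one goes down to -200.0000. A factor of 2 in dB could point to a different log scaling (for example 10*log10 versus 20*log10) plus a different floor, but that is an assumption and not something confirmed from the scripts.

```python
# First six values of column 0 from each listing above.
precomputed = [-69.5358, -64.7463, -61.8604, -59.8808, -58.1119, -58.2752]
recomputed = [-139.0715, -129.4926, -123.7208, -119.7616, -116.2238, -116.5503]

# Element-wise ratio between the recomputed and the precomputed values.
ratios = [r / p for r, p in zip(recomputed, precomputed)]
print([round(x, 4) for x in ratios])

# Index 21 of the listings: the precomputed value sits at the -90 floor,
# so the ratio is no longer 2 there.
clipped_ratio = -193.2552 / -90.0000
print(round(clipped_ratio, 4))
```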

Also, there is a bug in the validation function: it expects 2 values from the data loader, but the loader returns 3.
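A minimal sketch of the mismatch and one possible fix, in plain Python. fake_loader and validate are hypothetical stand-ins, not the repository's code, and what the third value holds is an assumption made only for illustration.

```python
def fake_loader():
    """Hypothetical stand-in for the data loader: yields 3-item batches."""
    for i in range(2):
        yield [float(i)], [i % 2], f"track_{i}"  # (input, target, extra)


def validate(loader):
    """Consume (input, target, ...) batches, ignoring anything after the first two."""
    seen = 0
    for x, y, *_ in loader:
        seen += len(x)
    return seen


# Unpacking each batch into only two names fails when the loader yields three items.
try:
    for x, y in fake_loader():
        pass
    two_name_unpack_failed = False
except ValueError:
    two_name_unpack_failed = True

n_items = validate(fake_loader())
print(two_name_unpack_failed, n_items)
```

Unpacking with a starred name keeps the loop working whether the loader returns two values or three. Naming the third value explicitly would be clearer if it is actually needed during validation.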