Closed: pabogdan closed this issue 2 months ago
@pabogdan Checking on the dataset from here: https://github.com/NeuroBench/system_benchmarks/tree/audio_scene_classification/utils
I ran the download script and wasn't able to reproduce the train/val/test numbers. Where are the samples for the test set coming from?
See the code block below; the test set only has 1320 samples:
from dcase_audio_scene_classification import SceneData

dataset = SceneData(
    raw_data_dir="/Users/jasonyik/Desktop/system_benchmarks/utils/data/raw",
    meta_files_dir="/Users/jasonyik/Desktop/system_benchmarks/utils/data",
    valid_scenes=["airport", "street_traffic", "bus", "park"],
    valid_devices=["a"],
    target_sample_rate=8000,
    max_samples_per_scene=10,
    resize_time=1,
)

for key, value in dataset.data.items():
    print(key, value.shape)
'''
x_train torch.Size([41360, 8000])
y_train torch.Size([41360, 4])
x_test torch.Size([1320, 8000])
y_test torch.Size([1320, 4])
x_eval torch.Size([1320, 8000])
y_eval torch.Size([1320, 4])
'''
Also tagging @MinaKh
@jasonlyik I'm reviewing what's happening there. I'll update you and Mina as soon as I understand what went wrong.
@jasonlyik This was a mistake on our end. The dataloader I uploaded did not contain the logic that also packages the additional samples we agreed upon for the test set. The accuracy and power numbers we've generated are with the extended test dataset (as seen below); however, when creating the NeuroBench dataloader, the extra test samples were accidentally omitted.
My output now:
Loading train data:
4136it [00:25, 165.37it/s]
Loading eval data:
1624it [00:07, 209.40it/s]
Loading test data:
132it [00:00, 275.02it/s]
x_train torch.Size([41360, 8000])
y_train torch.Size([41360, 4])
x_test torch.Size([1320, 8000])
y_test torch.Size([1320, 4])
x_eval torch.Size([16240, 8000])
y_eval torch.Size([16240, 4])
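For what it's worth, the loader's iteration counts line up with the sample counts above if each loaded clip contributes 10 one-second windows of 8000 samples (my assumption, inferred from target_sample_rate=8000 and resize_time=1; the split names here are the ones the loader prints):

```python
# Sanity arithmetic (assumption: each clip yields 10 x 1 s windows at 8 kHz).
clips_per_split = {"train": 4136, "eval": 1624, "test": 132}  # "Nit [...]" counts above
windows_per_clip = 10   # assumed segmentation of each clip
window_len = 8000       # 1 s at target_sample_rate=8000

samples_per_split = {s: n * windows_per_clip for s, n in clips_per_split.items()}
print(samples_per_split)  # {'train': 41360, 'eval': 16240, 'test': 1320}
```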
@pabogdan Verified, thanks Peter!
Thanks @pabogdan for fixing this. I pulled the latest scripts and now get exactly the same output as you:
Loading train data:
4136it [01:32, 44.68it/s]
Loading eval data:
1624it [00:45, 35.56it/s]
Loading test data:
132it [00:00, 326.45it/s]
x_train torch.Size([41360, 8000])
y_train torch.Size([41360, 4])
x_test torch.Size([1320, 8000])
y_test torch.Size([1320, 4])
x_eval torch.Size([16240, 8000])
y_eval torch.Size([16240, 4])
So we will have 1320 test samples, right? But the test and val sample numbers are still swapped here: https://github.com/NeuroBench/system_benchmarks/blob/ASC_rules/ASC.md Can you please fix that too?
@MinaKh My understanding is that the existing train/test/eval names correspond to the more standard train/val/test naming convention. The dataset is uploaded with the splits named train/test/eval, where:
- train (41360) == standard train: the set used to train the model
- test (1320) == standard val: used to tune the training settings
- eval (16240) == standard test: official accuracy is reported on this set, and it should not be used while training or tuning the model
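In case it helps anyone working from the older split names, a small sketch (hypothetical helper, not part of the repo) that re-keys the loader's data dict to the standard convention:

```python
def rename_splits(data):
    """Re-key x_*/y_* entries from the uploaded train/test/eval
    names to the standard train/val/test convention."""
    name_map = {"train": "train", "test": "val", "eval": "test"}
    renamed = {}
    for key, tensor in data.items():
        prefix, old_name = key.split("_", 1)  # e.g. "x", "eval"
        renamed[f"{prefix}_{name_map[old_name]}"] = tensor
    return renamed
```

e.g. rename_splits(dataset.data)["x_val"] would then be the 1320-sample tuning split.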
I made edits to the naming convention in the dataset file on the audio_scene_classification branch: https://github.com/NeuroBench/system_benchmarks/commit/ae768e1fd020cbdd8774156b6001c847d6e0d37d
So now the naming of the splits is consistent with what is standard:
$ python dcase_audio_scene_classification.py
Loading train data:
4136it [00:06, 682.10it/s]
Loading val data:
132it [00:00, 850.34it/s]
Loading test data:
1624it [00:01, 846.04it/s]
x_train torch.Size([41360, 8000])
y_train torch.Size([41360, 4])
x_val torch.Size([1320, 8000])
y_val torch.Size([1320, 4])
x_test torch.Size([16240, 8000])
y_test torch.Size([16240, 4])
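To guard against this regressing, a quick shape check could be added (a sketch; check_split_shapes is a hypothetical helper, and the expected shapes are taken from the output above):

```python
# Expected shapes after the renaming, per the loader output above.
expected_shapes = {
    "x_train": (41360, 8000), "y_train": (41360, 4),
    "x_val":   (1320, 8000),  "y_val":   (1320, 4),
    "x_test":  (16240, 8000), "y_test":  (16240, 4),
}

def check_split_shapes(data, expected=expected_shapes):
    """Raise if any split tensor's shape differs from the expected shape."""
    for key, shape in expected.items():
        actual = tuple(data[key].shape)
        if actual != shape:
            raise ValueError(f"{key}: expected {shape}, got {actual}")
    return True
```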
@jasonlyik Thanks for the clarification and for updating the script. It was indeed confusing.
Covered in #6
Confirming val and test set sizes.