Closed: pabogdan closed this issue 2 months ago
@pabogdan Checking on the dataset from here: https://github.com/NeuroBench/system_benchmarks/tree/audio_scene_classification/utils
I ran the download script and wasn't able to reproduce the train/val/test numbers. Where are the samples for the test set coming from?
See the code block below; the test set only has 1320 samples:
from dcase_audio_scene_classification import SceneData

dataset = SceneData(
    raw_data_dir="/Users/jasonyik/Desktop/system_benchmarks/utils/data/raw",
    meta_files_dir="/Users/jasonyik/Desktop/system_benchmarks/utils/data",
    valid_scenes=["airport", "street_traffic", "bus", "park"],
    valid_devices=["a"],
    target_sample_rate=8000,
    max_samples_per_scene=10,
    resize_time=1,
)

for key, value in dataset.data.items():
    print(key, value.shape)
'''
x_train torch.Size([41360, 8000])
y_train torch.Size([41360, 4])
x_test torch.Size([1320, 8000])
y_test torch.Size([1320, 4])
x_eval torch.Size([1320, 8000])
y_eval torch.Size([1320, 4])
'''
Also tagging @MinaKh
@jasonlyik I'm reviewing what's happening there. I'll update you and Mina as soon as I understand what went wrong.
@jasonlyik This was a mistake on our end. The dataloader I uploaded did not contain the logic that also packages the additional samples we agreed upon for the test set. The accuracy and power numbers we've generated are with the extended test dataset (as seen below); however, when creating the NeuroBench dataloader, the extra test samples were accidentally omitted.
My output now:
Loading train data:
4136it [00:25, 165.37it/s]
Loading eval data:
1624it [00:07, 209.40it/s]
Loading test data:
132it [00:00, 275.02it/s]
x_train torch.Size([41360, 8000])
y_train torch.Size([41360, 4])
x_test torch.Size([1320, 8000])
y_test torch.Size([1320, 4])
x_eval torch.Size([16240, 8000])
y_eval torch.Size([16240, 4])
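For what it's worth, the loader's iteration counts line up with the sample counts above if each loaded clip contributes 10 one-second windows of 8000 samples (my assumption, inferred from target_sample_rate=8000 and resize_time=1; the split names here are the ones the loader prints):

```python
# Sanity arithmetic (assumption: each clip yields 10 x 1 s windows at 8 kHz).
clips_per_split = {"train": 4136, "eval": 1624, "test": 132}  # "Nit [...]" counts above
windows_per_clip = 10   # assumed segmentation of each clip
window_len = 8000       # 1 s at target_sample_rate=8000

samples_per_split = {s: n * windows_per_clip for s, n in clips_per_split.items()}
print(samples_per_split)  # {'train': 41360, 'eval': 16240, 'test': 1320}
```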
@pabogdan Verified, thanks Peter!
Thanks @pabogdan for fixing this. I pulled the latest scripts and now get exactly the same output as you:
Loading train data:
4136it [01:32, 44.68it/s]
Loading eval data:
1624it [00:45, 35.56it/s]
Loading test data:
132it [00:00, 326.45it/s]
x_train torch.Size([41360, 8000])
y_train torch.Size([41360, 4])
x_test torch.Size([1320, 8000])
y_test torch.Size([1320, 4])
x_eval torch.Size([16240, 8000])
y_eval torch.Size([16240, 4])
So we will have 1320 test samples, right? But the test and val sample numbers are still swapped here: https://github.com/NeuroBench/system_benchmarks/blob/ASC_rules/ASC.md Can you please fix that too?
@MinaKh My understanding is that the existing train/test/eval names correspond to the more standard train/val/test naming convention. The dataset is uploaded with the splits named train/test/eval, where:
- train (41360) == standard train: the set used to train the model
- test (1320) == standard val: used to tune the training settings
- eval (16240) == standard test: official accuracy is reported on this set, and it should not be used while training or tuning the model
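In case it helps anyone working from the older split names, a small sketch (hypothetical helper, not part of the repo) that re-keys the loader's data dict to the standard convention:

```python
def rename_splits(data):
    """Re-key x_*/y_* entries from the uploaded train/test/eval
    names to the standard train/val/test convention."""
    name_map = {"train": "train", "test": "val", "eval": "test"}
    renamed = {}
    for key, tensor in data.items():
        prefix, old_name = key.split("_", 1)  # e.g. "x", "eval"
        renamed[f"{prefix}_{name_map[old_name]}"] = tensor
    return renamed
```

e.g. rename_splits(dataset.data)["x_val"] would then be the 1320-sample tuning split.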
I made edits to the naming convention in the dataset file on the audio_scene_classification branch: https://github.com/NeuroBench/system_benchmarks/commit/ae768e1fd020cbdd8774156b6001c847d6e0d37d
So now the naming of the splits is consistent with what is standard:
$ python dcase_audio_scene_classification.py
Loading train data:
4136it [00:06, 682.10it/s]
Loading val data:
132it [00:00, 850.34it/s]
Loading test data:
1624it [00:01, 846.04it/s]
x_train torch.Size([41360, 8000])
y_train torch.Size([41360, 4])
x_val torch.Size([1320, 8000])
y_val torch.Size([1320, 4])
x_test torch.Size([16240, 8000])
y_test torch.Size([16240, 4])
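To guard against this regressing, a quick shape check could be added (a sketch; check_split_shapes is a hypothetical helper, and the expected shapes are taken from the output above):

```python
# Expected shapes after the renaming, per the loader output above.
expected_shapes = {
    "x_train": (41360, 8000), "y_train": (41360, 4),
    "x_val":   (1320, 8000),  "y_val":   (1320, 4),
    "x_test":  (16240, 8000), "y_test":  (16240, 4),
}

def check_split_shapes(data, expected=expected_shapes):
    """Raise if any split tensor's shape differs from the expected shape."""
    for key, shape in expected.items():
        actual = tuple(data[key].shape)
        if actual != shape:
            raise ValueError(f"{key}: expected {shape}, got {actual}")
    return True
```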
@jasonlyik Thanks for the clarification and for updating the script. It was indeed confusing.
Covered in #6
Confirming val and test set sizes.