fschmid56 / EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
MIT License

The size of tensor a (17248) must match the size of tensor b (64000) at non-singleton dimension 1 #25

Closed · herbiel closed this issue 6 months ago

herbiel commented 8 months ago

I run `python ex_dcase20.py --cuda --pretrained --model_name=dymn04_as --cache_path=cache` on a custom dataset (an UrbanSound-style CSV), but it still fails:

```
  File "/root/miniconda3/envs/EfficientAT/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/miniconda3/envs/EfficientAT/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/envs/EfficientAT/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/EfficientAT/datasets/dcase20.py", line 129, in __getitem__
    x = (x1 * l + x2 * (1. - l))
RuntimeError: The size of tensor a (17248) must match the size of tensor b (64000) at non-singleton dimension 1
```

Can you help me?

fschmid56 commented 8 months ago

Hi,

this problem occurs when you try to apply Mixup to the waveforms of two audio clips of different lengths. If you run the command with the argument `--no_wavmix`, mixing is turned off.
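The failing line in `dcase20.py` is a weighted sum of two waveforms, which is only defined element-wise when both clips have the same number of samples. A minimal sketch of the idea (hypothetical helper name; the repository does this inside the dataset's `__getitem__`):

```python
import random

def wavmix(x1, x2, alpha=0.3):
    """Mixup on raw waveforms: a random convex combination of two clips.

    Both clips must have the same number of samples, otherwise the
    element-wise sum fails -- this is exactly the shape mismatch shown
    in the traceback above (17248 vs 64000 samples).
    """
    if len(x1) != len(x2):
        raise ValueError(
            f"wavmix needs equal-length clips: {len(x1)} vs {len(x2)}"
        )
    l = random.betavariate(alpha, alpha)  # mixing coefficient in (0, 1)
    return [a * l + b * (1.0 - l) for a, b in zip(x1, x2)]
```
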

herbiel commented 7 months ago

Even with `--no_wavmix` I get:

```
  ...43, in
    return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
  File "/root/miniconda3/envs/EfficientAT/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
    return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
  File "/root/miniconda3/envs/EfficientAT/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 162, in collate_tensor_fn
    out = elem.new(storage).resize_(len(batch), *list(elem.size()))
RuntimeError: Trying to resize storage that is not resizable
```

I have passed `--no_wavmix`; do the clips still need to be the same length?

fschmid56 commented 7 months ago

I guess at some point you will need to make all waveforms the same length. Since the model is pre-trained on 10-second audio clips, I suggest you first try zero-padding shorter clips to 10 seconds and randomly selecting a 10-second snippet from longer ones.

You can check out how it is done for FSD50K:

https://github.com/fschmid56/EfficientAT/blob/main/datasets/fsd50k.py#L50

In particular, check out the function `pad_or_truncate` in line 50.
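The suggested pad-or-truncate step can be sketched in plain Python (a hypothetical, framework-free version for illustration; the repository's actual helper in `datasets/fsd50k.py` operates on tensors):

```python
import random

def pad_or_truncate(waveform, target_len):
    """Zero-pad shorter clips; take a random snippet from longer ones.

    For the AudioSet pre-trained models, target_len would be
    10 s * 32000 Hz = 320000 samples.
    """
    n = len(waveform)
    if n < target_len:
        # zero-pad at the end up to the target length
        return waveform + [0.0] * (target_len - n)
    if n > target_len:
        # randomly crop a target_len-sample snippet
        start = random.randint(0, n - target_len)
        return waveform[start:start + target_len]
    return waveform
```
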

herbiel commented 7 months ago

Can I train on my own custom data?

fschmid56 commented 7 months ago

Yes, of course. For this you need to create two files: an experiment script and a dataset module.

I would suggest starting from the two files `ex_fsd50k.py` and `datasets/fsd50k.py` and adapting them to your needs.

herbiel commented 7 months ago

Can it support 8 kHz audio?

fschmid56 commented 7 months ago

All models are pre-trained on 32 kHz audio. If you want to fine-tune on a dataset with a different sampling rate, I think that resampling is the best option.
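Resampling 8 kHz audio to 32 kHz before feeding the model can be done with e.g. `torchaudio.transforms.Resample`. Conceptually it is re-indexing the waveform at the new rate; a minimal linear-interpolation sketch (illustration only, and an assumption of mine, not the repository's code; a proper sinc/polyphase resampler should be used in practice):

```python
def resample_linear(wave, sr_in, sr_out):
    """Naive linear-interpolation resampler from sr_in to sr_out.

    For real use, prefer a band-limited resampler such as
    torchaudio.transforms.Resample to avoid aliasing artifacts.
    """
    n_in = len(wave)
    n_out = int(round(n_in * sr_out / sr_in))
    out = []
    for i in range(n_out):
        # position of output sample i on the input time axis
        pos = i * (n_in - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        # linearly interpolate between the two neighboring samples
        out.append(wave[lo] * (1.0 - frac) + wave[hi] * frac)
    return out
```
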

herbiel commented 7 months ago

```
Epoch 80/80: mAP: 0.5194, val_loss: 0.6143: 100% 1/1 [00:01<00:00, 1.46s/it]
Validating: 100% 1/1 [00:00<00:00, 1.30it/s]
wandb: Waiting for W&B process to finish... (success).
wandb:
wandb: Run history:
wandb:           ROC ▃▂▂▂▁▁▁▁▁▁▁▁▁▂▃▃▄▅▅▅▆▆▆▆▇▇██████████████
wandb: learning_rate ▁▂▃▆████▇▇▇▇▆▆▆▆▆▅▅▅▅▄▄▄▄▄▃▃▃▃▃▂▂▂▂▁▁▁▁▁
wandb:           mAP ▃▂▂▂▁▁▁▁▁▁▁▁▁▂▂▃▃▄▅▅▆▆▆▆▇▇▇█████████████
wandb:    train_loss █████▇▇▆▆▆▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▃▃▂▂▂▂▂▁▂▃▁▁▂
wandb:      val_loss ████▇▇▆▆▅▅▅▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:
wandb: Run summary:
wandb:           ROC 0.54167
wandb: learning_rate 0.0
wandb:           mAP 0.51939
wandb:    train_loss 0.53485
wandb:      val_loss 0.61425
wandb:
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /content/EfficientAT/wandb/offline-run-20240125_080342-2e8g9g04
wandb: Find
```

And how can I test my trained model?

fschmid56 commented 7 months ago

You can also use the file `ex_fsd50k.py` as an example.

If you don't use the `--train` argument, execution goes into the `evaluate(args)` function (line 181) and evaluates the model on the evaluation set. You just need to take care of loading your own trained model.

herbiel commented 7 months ago

And how do I load my trained model?

herbiel commented 7 months ago

I want to know how to save the model after training, because I need to use it like `mn04_as`.

fschmid56 commented 7 months ago

While training, models are saved to the wandb directory.

Loading a pre-trained model is only implemented for the models in the GitHub release, but loading your own pre-trained model should be easy: just modify the function `_mobilenet_v3` in the `model.py` file.