Alibaba-MIIL / AudioClassfication

MIT License
75 stars 13 forks

I'm trying to train with Docker #13

Closed Seoung-wook closed 2 years ago

Seoung-wook commented 2 years ago

root@21c2344a24ac:/benchmarks/benchmarks/AudioClassfication# python trainer.py --max_lr 3e-4 --run_name r1 --emb_dim 128 --dataset urban8k --seq_len 90112 --mix_ratio 1 --epoch_mix 12 --mix_loss bce --batch_size 1 --n_epochs 3500 --ds_factors 4 4 4 4 --amp --save_path outputs
/usr/local/lib/python3.8/dist-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  warnings.warn(
Namespace(amp=True, augs_mix=['mixup', 'timemix', 'freqmix', 'phmix'], augs_noise=['awgn', 'abgn', 'apgn', 'argn', 'avgn', 'aun', 'phn', 'sine'], augs_signal=['amp', 'neg', 'tshift', 'tmask', 'ampsegment', 'cycshift'], batch_size=1, data_path='../data/UrbanSound8K', data_subtype='balanced', dataset='urban8k', dim_feedforward=512, ds_factors=[4, 4, 4, 4], ema=0.995, emb_dim=128, epoch_mix=12, ext_pretrained=None, filter_bias_and_bn=True, fold_id=None, gpu_ids=[0], kd_model=None, load_path=None, local_rank=0, log_interval=100, loss_type='label_smooth', max_lr=0.0003, mix_loss='bce', mix_ratio=1.0, model_type='SoundNetRaw', multilabel=False, n_classes=10, n_epochs=3500, n_head=8, n_layers=4, nf=16, num_workers=8, resume_training=False, run_name=PosixPath('r1'), sampling_rate=22050, save_interval=100, save_path=PosixPath('outputs'), scheduler=None, seq_len=90112, use_balanced_sampler=False, use_bg=False, use_ddp=False, use_dp=False, wd=1e-05)
-1
***Dummy Run****
dummy succededd, avg_time_batch:15.636324882507324ms
Traceback (most recent call last):
  File "trainer.py", line 699, in <module>
    main()
  File "trainer.py", line 695, in main
    train(args)
  File "trainer.py", line 549, in train
    for iterno, (x, y) in enumerate(train_loader):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.8/dist-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/benchmarks/benchmarks/AudioClassfication/datasets/urban8K_dataset.py", line 65, in __getitem__
    audio = AudioAugs(self.transforms, sampling_rate, p=0.5)(audio)
  File "/benchmarks/benchmarks/AudioClassfication/datasets/audio_augs.py", line 464, in __init__
    augs['phn'] = RandomPhNoise(p=p, fs=fs, sgm=0.01)
  File "/benchmarks/benchmarks/AudioClassfication/datasets/audio_augs.py", line 387, in __init__
    super().__init__(fs=fs)
  File "/benchmarks/benchmarks/AudioClassfication/utils/helper_funcs.py", line 28, in __init__
    raise ValueError
ValueError

My environment: torch==1.7.1+cu110, torchvision==0.8.2+cu110, torchaudio==0.7.2, and tensorflow==2.7.0.

I also tried setting the DataLoader workers to 0. In my Docker environment, the sampling rate is not adjusted before the noise is added. So I set fs = 20050, and it ran as far as the forward pass, but then an error occurred with a padding issue. I guess it comes from the 4D conv.

Please answer my question. P.S. Could you provide a requirements.txt?

avi33 commented 2 years ago

Hi. The traceback you copied comes from a sampling rate that is not supported. Since some of the augmentations operate in the frequency domain, I have only handled the 8000, 16000, and 22050 Hz cases, each with corresponding STFT parameters. As for the padding, it occurs if the loaded audio signal is shorter than the provided sequence length ("--seq_len 90112"). Can you tell me the size of the sample that causes this crash? Regarding requirements.txt: torch 1.7.1 is fine, and I have also tested with the latest torch in case you want to upgrade the Docker image. I can create the txt file, but it will take a few days; presumably the issue is one of configuration rather than packages.
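
To make the two conditions above concrete, here is a minimal sketch in plain Python. It is not code from this repository: the names SUPPORTED_RATES, check_sampling_rate, and fit_to_seq_len are hypothetical, and padding with trailing zeros (and truncating longer clips) is an assumption about one reasonable way to reach the requested sequence length.

```python
# Hypothetical helpers for illustration; not taken from the AudioClassfication repo.
SUPPORTED_RATES = (8000, 16000, 22050)  # the rates named in the reply above


def check_sampling_rate(fs):
    """Raise a descriptive error when fs is outside the supported set."""
    if fs not in SUPPORTED_RATES:
        raise ValueError(
            f"unsupported sampling rate {fs} Hz; expected one of {SUPPORTED_RATES}"
        )
    return fs


def fit_to_seq_len(samples, seq_len):
    """Return exactly seq_len samples: truncate long clips, zero-pad short ones.

    Trailing zero padding is an assumption made for this sketch.
    """
    samples = list(samples)
    if len(samples) >= seq_len:
        return samples[:seq_len]
    return samples + [0.0] * (seq_len - len(samples))


if __name__ == "__main__":
    check_sampling_rate(22050)            # accepted
    clip = [0.1] * 88200                  # 4 s at 22050 Hz, shorter than seq_len
    padded = fit_to_seq_len(clip, 90112)  # 1912 zeros are appended
    print(len(padded))
    try:
        check_sampling_rate(20050)        # the value tried earlier in this thread
    except ValueError as err:
        print(err)
```

A check like this, run before the augmentations are built, would presumably turn the bare ValueError in the traceback into a message that names the offending rate.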

Seoung-wook commented 2 years ago

Thank you, I will try again with 22050 Hz :)