macso-vincent-russell opened this issue 7 months ago
Same here. Could you find the cause?
In my case it is related to the RIRs: with p_reverb=0.0 it trains normally, but with p_reverb=1.0 it gets stuck and is killed with the error message above. The trace seems normal:
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1275 | Sampled RIR .._.._guso_in24_rirs_train_recsourcedirectivityHA_right_recsourcedirectivityHA_right_07966.wav with shape [1, 20305]
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataloader:279 | Worker: Getting sample 270566 with seed 270566
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1219 | get_sample() idx 270566 with seed 270566, snr 5, gain -6
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_Zy0goYEHPHU.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1275 | Sampled RIR .._.._guso_in24_rirs_train_recsourcedirectivityHA_right_recsourcedirectivityHA_right_31149.wav with shape [1, 24812]
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_clean_fullband_read_speech_book_02509_chp_0002_reader_03315_40_seg_1.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_ay2X87w6Dxw.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1275 | Sampled RIR .._.._guso_in24_rirs_train_recsourcedirectivityHA_right_recsourcedirectivityHA_right_11673.wav with shape [1, 98547]
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1275 | Sampled RIR .._.._guso_in24_rirs_train_recsourcedirectivityHA_right_recsourcedirectivityHA_right_55381.wav with shape [1, 6963]
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_GAc5dEFDkac.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_lt7jAlr_Er0.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_SbJmk_6PVWg.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataloader:279 | Worker: Getting sample 272373 with seed 272373
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1219 | get_sample() idx 272373 with seed 272373, snr 5, gain 0
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataloader:279 | Worker: Getting sample 365896 with seed 365896
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1219 | get_sample() idx 365896 with seed 365896, snr 5, gain 6
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_VX2czCvwQG0.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_lZW6oaScJPc.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:augmentations:555 | Augmentation RandClipping (c: 0.3069719)
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_squeak_squeakyChair_Freesound_validated_379901_0.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataloader:279 | Worker: Getting sample 28207 with seed 28207
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1219 | get_sample() idx 28207 with seed 28207, snr 40, gain 6
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_clean_fullband_german_speech_CC_BY_SA_4.0_249hrs_339spk_German_Wikipedia_16k_German_Wikipedia_Schlosspark_Nymphenburg_audio_48kHz_seg_7.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_door_Freesound_validated_406193_0.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1275 | Sampled RIR .._.._guso_in24_rirs_train_recsourcedirectivityHA_right_recsourcedirectivityHA_right_24170.wav with shape [1, 49323]
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1275 | Sampled RIR .._.._guso_in24_rirs_train_recsourcedirectivityHA_right_recsourcedirectivityHA_right_31962.wav with shape [1, 35320]
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_F0IYjZN8ojA.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_TG7zqe3C7yw.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_noise_fullband_RG2sjK0Zsng.wav with codec PCM
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataloader:279 | Worker: Getting sample 17304 with seed 17304
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1219 | get_sample() idx 17304 with seed 17304, snr 0, gain 0
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataloader:279 | Worker: Getting sample 112016 with seed 112016
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1219 | get_sample() idx 112016 with seed 112016, snr 20, gain 0
2024-02-12 13:33:55 | TRACE | libdfdata.torch_dataloader:df:dataset:1071 | Loaded sample .._.._DNS-Challenge_datasets_fullband_clean_fullband_read_speech_book_04432_chp_0002_reader_10614_109_seg_2.wav with codec PCM
I cannot find anything unusual, so I assume the problem comes from the next RIR the dataloader tries to load.
I have also checked all my RIRs one by one in Python, loading them with soundfile and running the following tests in numpy:
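The numpy tests referred to here are not included in the thread. A minimal sketch of comparable per-file checks might look like this; the helper names `check_rir` and `check_rir_dir` and the specific checks (empty, non-finite, all-zero) are assumptions, not the tests that were actually run.

```python
from __future__ import annotations

from pathlib import Path

import numpy as np


def check_rir(rir: np.ndarray) -> list[str]:
    """Return problems found in one RIR array; an empty list means nothing was flagged."""
    problems: list[str] = []
    if rir.size == 0:
        problems.append("empty signal")
        return problems
    if not np.all(np.isfinite(rir)):
        problems.append("contains NaN or Inf")
    elif np.max(np.abs(rir)) == 0.0:
        # Only checked for finite data, so NaN cannot mask a silent file
        problems.append("all zeros")
    return problems


def check_rir_dir(rir_dir: str) -> dict[str, list[str]]:
    """Load every .wav file in rir_dir with soundfile and collect problems per file."""
    import soundfile as sf  # imported lazily so check_rir() works without soundfile installed

    report: dict[str, list[str]] = {}
    for path in sorted(Path(rir_dir).glob("*.wav")):
        data, _sr = sf.read(str(path), dtype="float32", always_2d=True)  # shape: (frames, channels)
        problems = check_rir(data)
        if problems:
            report[path.name] = problems
    return report
```

A file passing these checks can still trigger a length-dependent bug, so they only rule out obviously broken RIRs.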
Any ideas on what could be causing this?
Also, as the OP pointed out, the bug may first appear in a later epoch (epoch > 0), so it must be related to particular combinations of speech and RIRs.
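One way to narrow this down is to enumerate (speech, RIR) combinations and flag those that violate a length constraint. The sketch below assumes, purely as a hypothesis, that the trouble starts when an RIR is longer than the speech clip it is applied to. `find_suspicious_pairs` is a hypothetical helper, lengths are in samples, and the speech lengths in the example are made up (the two RIR lengths are taken from the trace above).

```python
from __future__ import annotations

from itertools import product


def find_suspicious_pairs(
    speech_lens: dict[str, int], rir_lens: dict[str, int]
) -> list[tuple[str, str]]:
    """Return (speech, rir) name pairs where the RIR has more samples than the speech clip.

    Hypothesis only: this does not reproduce what libDF does internally.
    """
    return [
        (speech, rir)
        for (speech, n_speech), (rir, n_rir) in product(speech_lens.items(), rir_lens.items())
        if n_rir > n_speech
    ]


# Made-up speech lengths; RIR lengths from the "Sampled RIR ... with shape [1, N]" lines
speech = {"speech_a.wav": 48000, "speech_b.wav": 144000}
rirs = {"rir_1.wav": 20305, "rir_2.wav": 98547}
print(find_suspicious_pairs(speech, rirs))  # [('speech_a.wav', 'rir_2.wav')]
```

For a large dataset the full product of N speech clips and M RIRs grows quickly; comparing each RIR against the shortest speech clip yields the same set of suspicious RIRs far more cheaply.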
@macso-vincent-russell could you find the cause?
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Small update: the issue still persists when training with the Meta SoundSpaces RIR dataset, using the default DFN3 recipe with p_reverb changed to 1.0:
2024-07-03 11:52:51 | INFO | DF | Start train epoch 0 with batch size 16
2024-07-03 11:53:39 | INFO | DF | [0] [ 0/27346] | loss: 10.75046 | t_sample: 4.41516 | t_batch: 4.42887 | lr: 1.000E-04 | wd: 1.000E-12
thread 'DataLoader Worker 11' panicked at 'assertion failed: k <= self.len()', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/slice/mod.rs:3420:9
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted (core dumped)
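The panic comes from a precondition check in Rust's standard library rather than from a DeepFilterNet error message: an assertion of the form `k <= self.len()` in core/src/slice/mod.rs appears to match `slice::rotate_right(k)`, which requires the rotation amount not to exceed the slice length. The Python stand-in below only illustrates that precondition; it is not libDF code, and which libDF call passes an oversized `k` is not known from these logs.

```python
def rotate_right(buf: list, k: int) -> list:
    """Python stand-in for Rust's `<[T]>::rotate_right(k)`.

    Rust asserts `k <= self.len()` and panics otherwise; here the same
    precondition raises ValueError with the message seen in the log.
    """
    if k > len(buf):
        raise ValueError("assertion failed: k <= self.len()")
    mid = len(buf) - k  # the last k elements move to the front
    return buf[mid:] + buf[:mid]


print(rotate_right([1, 2, 3, 4, 5], 2))  # [4, 5, 1, 2, 3]
try:
    rotate_right([1, 2, 3], 4)  # k > len: the condition that kills the DataLoader worker
except ValueError as err:
    print(err)  # assertion failed: k <= self.len()
```

Because the failing value depends on lengths computed at runtime, a crash that only shows up for some speech/RIR pairs is consistent with this kind of check.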
$ python df/train.py data-hdf5/dataset.cfg data-hdf5/ base_dir/
...
2023-12-06 02:40:53 | INFO | DF | Start train epoch 2 with batch size 1
thread 'DataLoader Worker 1' panicked at 'assertion failed: k <= self.len()', /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/slice/mod.rs:3420:9
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted (core dumped)
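Following the note in the panic message, rerunning with RUST_BACKTRACE set should print the Rust call stack, which may reveal which libDF function triggers the assertion. A minimal sketch of a wrapper that does this from Python; `run_with_rust_backtrace` is a hypothetical helper, and prefixing the shell command with `RUST_BACKTRACE=1` is equivalent.

```python
from __future__ import annotations

import os
import subprocess
import sys


def run_with_rust_backtrace(cmd: list[str], full: bool = False) -> int:
    """Run cmd with RUST_BACKTRACE set and return its exit code.

    RUST_BACKTRACE=1 prints a short backtrace on panic; "full" prints every frame.
    """
    env = dict(os.environ, RUST_BACKTRACE="full" if full else "1")
    return subprocess.run(cmd, env=env, check=False).returncode
```

For example, `run_with_rust_backtrace([sys.executable, "df/train.py", "data-hdf5/dataset.cfg", "data-hdf5/", "base_dir/"])` reruns the training command from the report with backtraces enabled. How informative the frames are depends on how much symbol information the libdfdata build contains.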