Very long audio files : sub_iter.strides(0)[0] == 0 INTERNAL ASSERT FAILED

MarvinLvn commented 3 years ago

Hi there!

Thanks for your work! I've been applying your model to short audio files with success, and the results are very impressive. I'd like to go one step further and enhance 16-hour-long audio files.

When I launch:

python -m denoiser.enhance $PRETRAINED_MODEL --noisy_dir=${DATA_DIR} --out_dir=${DATA_DIR}_enhanced_by_${SUFFIX} --verbose --device cuda

I get:

Traceback (most recent call last):
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/enhance.py", line 138, in <module>
    enhance(args, local_out_dir=args.out_dir)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/enhance.py", line 130, in enhance
    estimate = get_estimate(model, noisy_signals, args)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/enhance.py", line 67, in get_estimate
    estimate = model(noisy)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/denoiser/demucs.py", line 161, in forward
    mono = mix.mean(dim=1, keepdim=True)
RuntimeError: sub_iter.strides(0)[0] == 0 INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/Reduce.cuh":928, please report a bug to PyTorch.

I also tried running the model on CPU, with and without the --streaming flag, but without success. According to this thread, the error seems to occur when a sum reduction is called on very large tensors.

Here's the error I get on CPU:

/var/spool/slurmd/job1202815/slurm_script: line 40: 10526 Floating point exception(core dumped) python -m denoiser.enhance $PRETRAINED_MODEL --noisy_dir=${DATA_DIR} --out_dir=${DATA_DIR}_enhanced_by_${SUFFIX} --num_workers 10 --verbose

Does it seem unrealistic to you to enhance such long audio files? Can you think of any workaround? I could cut my long audio files into multiple smaller chunks, but that would create artifacts, and I'd prefer to avoid this pain :)
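For reference, the chunking workaround mentioned above could be sketched as follows. This is a minimal illustration, not part of denoiser: `enhance_in_chunks`, `chunk_len`, `overlap`, and the `enhance_fn` callback are hypothetical names, and `enhance_fn` stands in for whatever call actually enhances one chunk. Chunks overlap, and each overlap is linearly crossfaded to soften discontinuities at the boundaries. A model with recurrent state may still behave differently near chunk edges, so this reduces artifacts rather than guaranteeing their absence.

```python
from typing import Callable, List, Sequence


def enhance_in_chunks(
    samples: Sequence[float],
    enhance_fn: Callable[[List[float]], List[float]],  # hypothetical per-chunk enhancer
    chunk_len: int,
    overlap: int,
) -> List[float]:
    """Run enhance_fn on overlapping chunks and crossfade the overlaps."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")
    hop = chunk_len - overlap
    out = [0.0] * len(samples)
    start = 0
    while start < len(samples):
        chunk = list(samples[start:start + chunk_len])
        enhanced = enhance_fn(chunk)  # assumed to return len(chunk) samples
        for i, value in enumerate(enhanced):
            if start > 0 and i < overlap:
                # Fade the new chunk in; the previous chunk's samples already
                # stored in `out` are faded out by the complementary weight.
                w = (i + 1) / (overlap + 1)
                out[start + i] = (1.0 - w) * out[start + i] + w * value
            else:
                out[start + i] = value
        if start + chunk_len >= len(samples):
            break  # this chunk reached the end of the signal
        start += hop
    return out
```

With an identity `enhance_fn` the output reproduces the input up to floating-point rounding, which is a quick sanity check that the crossfade weights sum to one.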

Thanks a lot :)

adefossez commented 3 years ago

Can you post the stacktrace with the --streaming flag? I am surprised it does not work naturally in that case.

MarvinLvn commented 3 years ago

Sure!

In this case, the process is killed because of a memory issue before the first 16-hour-long audio file has been enhanced (no output is generated):

python -m denoiser.enhance --dns64 --noisy_dir=/gpfsscratch/rech/xdz/uow84uh/DATA/ACLEW10K_daylongs_subset --out_dir=/gpfsscratch/rech/xdz/uow84uh/DATA/ACLEW10K_daylongs_subset_enhanced_by_dns64_cuda --num_workers 10 --verbose --device cuda --streaming
/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  '"sox" backend is being deprecated. '
DEBUG:__main__:Namespace(batch_size=1, device='cuda', dns48=False, dns64=True, dry=0, master64=False, model_path=None, noisy_dir='/gpfsscratch/rech/xdz/uow84uh/DATA/ACLEW10K_daylongs_subset', noisy_json=None, num_workers=10, out_dir='/gpfsscratch/rech/xdz/uow84uh/DATA/ACLEW10K_daylongs_subset_enhanced_by_dns64_cuda', sample_rate=16000, streaming=True, verbose=10)
INFO:denoiser.pretrained:Loading pre-trained real time H=64 model trained on DNS.
DEBUG:denoiser.pretrained:Demucs(
  (encoder): ModuleList(
    (0): Sequential(
      (0): Conv1d(1, 64, kernel_size=(8,), stride=(4,))
      (1): ReLU()
      (2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
      (3): GLU(dim=1)
    )
    (1): Sequential(
      (0): Conv1d(64, 128, kernel_size=(8,), stride=(4,))
      (1): ReLU()
      (2): Conv1d(128, 256, kernel_size=(1,), stride=(1,))
      (3): GLU(dim=1)
    )
    (2): Sequential(
      (0): Conv1d(128, 256, kernel_size=(8,), stride=(4,))
      (1): ReLU()
      (2): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      (3): GLU(dim=1)
    )
    (3): Sequential(
      (0): Conv1d(256, 512, kernel_size=(8,), stride=(4,))
      (1): ReLU()
      (2): Conv1d(512, 1024, kernel_size=(1,), stride=(1,))
      (3): GLU(dim=1)
    )
    (4): Sequential(
      (0): Conv1d(512, 1024, kernel_size=(8,), stride=(4,))
      (1): ReLU()
      (2): Conv1d(1024, 2048, kernel_size=(1,), stride=(1,))
      (3): GLU(dim=1)
    )
  )
  (decoder): ModuleList(
    (0): Sequential(
      (0): Conv1d(1024, 2048, kernel_size=(1,), stride=(1,))
      (1): GLU(dim=1)
      (2): ConvTranspose1d(1024, 512, kernel_size=(8,), stride=(4,))
      (3): ReLU()
    )
    (1): Sequential(
      (0): Conv1d(512, 1024, kernel_size=(1,), stride=(1,))
      (1): GLU(dim=1)
      (2): ConvTranspose1d(512, 256, kernel_size=(8,), stride=(4,))
      (3): ReLU()
    )
    (2): Sequential(
      (0): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      (1): GLU(dim=1)
      (2): ConvTranspose1d(256, 128, kernel_size=(8,), stride=(4,))
      (3): ReLU()
    )
    (3): Sequential(
      (0): Conv1d(128, 256, kernel_size=(1,), stride=(1,))
      (1): GLU(dim=1)
      (2): ConvTranspose1d(128, 64, kernel_size=(8,), stride=(4,))
      (3): ReLU()
    )
    (4): Sequential(
      (0): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
      (1): GLU(dim=1)
      (2): ConvTranspose1d(64, 1, kernel_size=(8,), stride=(4,))
    )
  )
  (lstm): BLSTM(
    (lstm): LSTM(1024, 1024, num_layers=2)
  )
)
/var/spool/slurmd/job1228906/slurm_script: line 42:  8335 Killed                  python -m denoiser.enhance $PRETRAINED_MODEL --noisy_dir=${DATA_DIR} --out_dir=${DATA_DIR}_enhanced_by_${SUFFIX}_cuda --num_workers 10 --verbose --device cuda --streaming
slurmstepd: error: Detected 1 oom-kill event(s) in step 1228906.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

MarvinLvn commented 3 years ago

Hi there!

Update on my problem :) I managed to get the enhanced 16-hour-long audio file by using the --streaming flag and requesting more memory. Of course, this makes the whole thing very slow. The thing is, I have 120 of them to process T_T I think I'll just run the denoiser separately on each file.

If you agree, I think we can close this issue. Thanks a lot for your help on that!
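A rough sketch of what "one run per file" could look like, so that each process only ever deals with a single recording. It reuses only the flags that appear in the commands earlier in this thread. The helper names (`build_command`, `plan_jobs`), the staging layout, and the use of symlinks are assumptions made for illustration. In particular, it assumes --noisy_dir picks up every audio file in the given directory and follows symlinks; if it does not, copy the file instead. The script only prints the commands and does not launch anything.

```python
import os
import shlex
import tempfile
from pathlib import Path
from typing import List, Tuple


def build_command(noisy_dir: Path, out_dir: Path) -> List[str]:
    """Command line for one run, using only flags shown in this thread."""
    return [
        "python", "-m", "denoiser.enhance", "--dns64",
        f"--noisy_dir={noisy_dir}", f"--out_dir={out_dir}",
        "--device", "cuda", "--streaming",
    ]


def plan_jobs(
    files: List[Path], staging_root: Path, out_root: Path
) -> List[Tuple[Path, List[str]]]:
    """Stage each file in its own directory and return one command per file."""
    jobs = []
    for wav in files:
        stage = staging_root / wav.stem  # hypothetical layout: one dir per file
        stage.mkdir(parents=True, exist_ok=True)
        link = stage / wav.name
        if not os.path.lexists(link):
            os.symlink(wav.resolve(), link)  # avoids copying multi-GB files
        jobs.append((stage, build_command(stage, out_root / wav.stem)))
    return jobs


if __name__ == "__main__":
    # Dry run on two empty placeholder files: print the commands only.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        wavs = [root / "rec_a.wav", root / "rec_b.wav"]
        for w in wavs:
            w.touch()
        for _, cmd in plan_jobs(wavs, root / "staging", root / "out"):
            print(shlex.join(cmd))
```

Each printed command can then be submitted as its own job, with the memory request sized for a single recording.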

adefossez commented 3 years ago

Hey @MarvinLvn. The amount of memory required by the streaming processor shouldn't be more than one or two times the size of the input audio (so three times in total if you count the input audio itself). 16 hours of uncompressed audio is quite large, but this is very specific to your use case and we won't add extra support for it.
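As a back-of-the-envelope check of that estimate, the arithmetic can be written out as below. The 16 kHz sample rate comes from `sample_rate=16000` in the debug log above. Mono audio held as 32-bit floats (4 bytes per sample) is an assumption about the in-memory representation, and `audio_bytes` is a hypothetical helper, not a denoiser function.

```python
def audio_bytes(
    hours: float,
    sample_rate: int = 16_000,   # from the Namespace(...) debug line
    channels: int = 1,           # assumption: mono
    bytes_per_sample: int = 4,   # assumption: float32 samples in memory
) -> int:
    """Size in bytes of uncompressed audio held in memory."""
    return int(hours * 3600 * sample_rate) * channels * bytes_per_sample


raw = audio_bytes(16)   # the input recording alone
peak = 3 * raw          # input plus up to twice its size for the processor

print(f"input: {raw / 1e9:.2f} GB, estimated peak: {peak / 1e9:.2f} GB")
# prints "input: 3.69 GB, estimated peak: 11.06 GB"
```

Under these assumptions a single 16-hour file needs on the order of 11 GB of host memory, which is consistent with the job succeeding only after more memory was requested.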

Glad you managed to find a workaround, closing the issue then :)

facebookresearch / denoiser, issue #42