MarvinLvn closed this issue 3 years ago
Can you post the stacktrace with the --streaming flag? I am surprised it does not work naturally in that case.
Sure !
In this case, the process is killed for running out of memory before it finishes enhancing the first 16-hour audio file (no output is generated):
python -m denoiser.enhance --dns64 --noisy_dir=/gpfsscratch/rech/xdz/uow84uh/DATA/ACLEW10K_daylongs_subset --out_dir=/gpfsscratch/rech/xdz/uow84uh/DATA/ACLEW10K_daylongs_subset_enhanced_by_dns64_cuda --num_workers 10 --verbose --device cuda --streaming
/gpfswork/rech/xdz/uow84uh/.conda/envs/denoiser/lib/python3.7/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
'"sox" backend is being deprecated. '
DEBUG:__main__:Namespace(batch_size=1, device='cuda', dns48=False, dns64=True, dry=0, master64=False, model_path=None, noisy_dir='/gpfsscratch/rech/xdz/uow84uh/DATA/ACLEW10K_daylongs_subset', noisy_json=None, num_workers=10, out_dir='/gpfsscratch/rech/xdz/uow84uh/DATA/ACLEW10K_daylongs_subset_enhanced_by_dns64_cuda', sample_rate=16000, streaming=True, verbose=10)
INFO:denoiser.pretrained:Loading pre-trained real time H=64 model trained on DNS.
DEBUG:denoiser.pretrained:Demucs(
(encoder): ModuleList(
(0): Sequential(
(0): Conv1d(1, 64, kernel_size=(8,), stride=(4,))
(1): ReLU()
(2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
(3): GLU(dim=1)
)
(1): Sequential(
(0): Conv1d(64, 128, kernel_size=(8,), stride=(4,))
(1): ReLU()
(2): Conv1d(128, 256, kernel_size=(1,), stride=(1,))
(3): GLU(dim=1)
)
(2): Sequential(
(0): Conv1d(128, 256, kernel_size=(8,), stride=(4,))
(1): ReLU()
(2): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
(3): GLU(dim=1)
)
(3): Sequential(
(0): Conv1d(256, 512, kernel_size=(8,), stride=(4,))
(1): ReLU()
(2): Conv1d(512, 1024, kernel_size=(1,), stride=(1,))
(3): GLU(dim=1)
)
(4): Sequential(
(0): Conv1d(512, 1024, kernel_size=(8,), stride=(4,))
(1): ReLU()
(2): Conv1d(1024, 2048, kernel_size=(1,), stride=(1,))
(3): GLU(dim=1)
)
)
(decoder): ModuleList(
(0): Sequential(
(0): Conv1d(1024, 2048, kernel_size=(1,), stride=(1,))
(1): GLU(dim=1)
(2): ConvTranspose1d(1024, 512, kernel_size=(8,), stride=(4,))
(3): ReLU()
)
(1): Sequential(
(0): Conv1d(512, 1024, kernel_size=(1,), stride=(1,))
(1): GLU(dim=1)
(2): ConvTranspose1d(512, 256, kernel_size=(8,), stride=(4,))
(3): ReLU()
)
(2): Sequential(
(0): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
(1): GLU(dim=1)
(2): ConvTranspose1d(256, 128, kernel_size=(8,), stride=(4,))
(3): ReLU()
)
(3): Sequential(
(0): Conv1d(128, 256, kernel_size=(1,), stride=(1,))
(1): GLU(dim=1)
(2): ConvTranspose1d(128, 64, kernel_size=(8,), stride=(4,))
(3): ReLU()
)
(4): Sequential(
(0): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
(1): GLU(dim=1)
(2): ConvTranspose1d(64, 1, kernel_size=(8,), stride=(4,))
)
)
(lstm): BLSTM(
(lstm): LSTM(1024, 1024, num_layers=2)
)
)
/var/spool/slurmd/job1228906/slurm_script: line 42: 8335 Killed python -m denoiser.enhance $PRETRAINED_MODEL --noisy_dir=${DATA_DIR} --out_dir=${DATA_DIR}_enhanced_by_${SUFFIX}_cuda --num_workers 10 --verbose --device cuda --streaming
slurmstepd: error: Detected 1 oom-kill event(s) in step 1228906.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
Hi there !
Update on my problem :) I managed to get the enhanced 16-hour audio file by using the --streaming flag and requesting more memory. Of course, this makes the whole thing very slow. The thing is, I have 120 of them to process T_T so I think I'll just run the denoiser separately on each file.
If you agree, I think we can close this issue. Thanks a lot for your help on that !
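For anyone wanting to do the same, here is a minimal sketch of the "one run per file" idea. The flags are the ones used in the command above; everything else (the `per_file_commands` and `run_all` helpers, the `staging_dir` layout, the `*.wav` pattern) is hypothetical. It assumes that `denoiser.enhance` only accepts a directory and that it will read a symlinked file; if it does not, copy the file instead of linking it.

```python
import subprocess
import sys
from pathlib import Path


def per_file_commands(noisy_dir, out_dir, staging_dir, pattern="*.wav"):
    """Yield (staging subdir, source file, command) for one denoiser run per file."""
    for wav in sorted(Path(noisy_dir).glob(pattern)):
        # One staging directory per recording, so each run sees a single file.
        stage = Path(staging_dir) / wav.stem
        cmd = [
            sys.executable, "-m", "denoiser.enhance", "--dns64",
            f"--noisy_dir={stage}", f"--out_dir={out_dir}",
            "--device", "cuda", "--streaming",
        ]
        yield stage, wav, cmd


def run_all(noisy_dir, out_dir, staging_dir):
    """Run the denoiser once per file, so memory is released between recordings."""
    for stage, wav, cmd in per_file_commands(noisy_dir, out_dir, staging_dir):
        stage.mkdir(parents=True, exist_ok=True)
        link = stage / wav.name
        if not link.exists():
            link.symlink_to(wav.resolve())  # assumption: symlinks are followed
        subprocess.run(cmd, check=True)     # stop at the first failing file
```

Calling `run_all(noisy_dir, out_dir, staging_dir)` processes the recordings one after another. On a cluster, each command could instead be submitted as its own job.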
Hey @MarvinLvn. The memory required by the streaming processor shouldn't be more than one or two times the input audio file size (so about 3 times in total if you count the input audio itself). 16h of uncompressed audio is quite large, but this is very specific to your use case and we won't add extra support for it.
Glad you managed to find a workaround, closing the issue then :)
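For a rough sense of scale, the estimate above can be worked out as below. The 16 kHz sample rate comes from the `sample_rate=16000` entry in the log. The mono float32 representation (4 bytes per sample) is an assumption, and the factor of 3 is just the upper bound quoted above, so treat the result as an order of magnitude rather than a measurement.

```python
# Back-of-the-envelope memory estimate for one recording held as a raw waveform.
SAMPLE_RATE = 16_000   # Hz, from sample_rate=16000 in the log
BYTES_PER_SAMPLE = 4   # assumption: mono float32 samples
HOURS = 16


def waveform_bytes(hours, sample_rate=SAMPLE_RATE, bytes_per_sample=BYTES_PER_SAMPLE):
    """Size in bytes of a mono waveform of the given duration."""
    return int(hours * 3600 * sample_rate * bytes_per_sample)


input_gb = waveform_bytes(HOURS) / 1e9
peak_gb = 3 * input_gb  # the input plus up to two more copies of the same size
print(f"input: {input_gb:.2f} GB, rough peak: {peak_gb:.2f} GB")
# prints "input: 3.69 GB, rough peak: 11.06 GB"
```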
Hi there !
Thanks for your work ! I've been applying your model on short audio files with success, and the result is very impressive ! I'd like to go one step further and enhance 16-hour long audio files.
When I launch :
I get :
I tried to launch the model on CPU, with and without the --streaming flag, but without success. According to this thread, it seems that the error occurs when calling the sum function on very large tensors.
Here's the error I get on CPU :
Does it seem unrealistic to you to enhance such long audio files ? Can you think of any workaround ? I could cut my long audio files into multiple smaller chunks, but that would create artifacts and I would prefer to avoid this pain :)
Thanks a lot :)
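On the chunking idea, one generic way to soften the seams is to process overlapping chunks and crossfade them. The sketch below is not something the denoiser provides or recommends. `enhance_in_chunks` and `enhance_fn` are hypothetical names, and `enhance_fn` is assumed to return a signal of the same length as its input. Since the model contains an LSTM, each chunk still starts without the context of the previous one, so this only smooths the boundaries and will not match a single pass over the whole file.

```python
import numpy as np


def enhance_in_chunks(wave, enhance_fn, chunk_len, overlap):
    """Apply enhance_fn to overlapping chunks of a 1-D signal and crossfade the seams."""
    # Each sample may be covered by at most two chunks, hence overlap <= chunk_len / 2.
    assert 0 < overlap and 2 * overlap <= chunk_len
    hop = chunk_len - overlap
    out = np.zeros(len(wave), dtype=np.float32)
    fade_in = np.linspace(0.0, 1.0, overlap, dtype=np.float32)
    for start in range(0, len(wave), hop):
        chunk = np.asarray(enhance_fn(wave[start:start + chunk_len]), dtype=np.float32)
        weights = np.ones(len(chunk), dtype=np.float32)
        if start > 0:
            weights[:overlap] = fade_in          # ramp up over the shared region
        is_last = start + chunk_len >= len(wave)
        if not is_last:
            weights[-overlap:] = 1.0 - fade_in   # ramp down; pairs with the next ramp up
        out[start:start + len(chunk)] += weights * chunk
        if is_last:
            break
    return out
```

The fade-in and fade-out weights sum to one in every shared region, so with an identity `enhance_fn` the output reproduces the input.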