chrisdonahue / sheetsage

Transcribe music into lead sheets!
https://chrisdonahue.com/sheetsage

Got `RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR` when enabling the `Jukebox` feature #13

Open · tanchihpin0517 opened this issue 1 year ago

tanchihpin0517 commented 1 year ago

I ran Sheet Sage with Jukebox on a machine with a 24 GB GPU, but it failed.

Here is the log:

➜ ./sheetsage.sh -j ../../test.mp3
Copying input file ../../test.mp3 to container as ./output/input
Running Sheet Sage via Docker with args: -j /sheetsage/output/input
INFO:root:Loading audio from /sheetsage/output/input
INFO:root:DETECTING_BEATS
INFO:root:EXTRACTING_FEATURES
INFO:root:Feature extraction w/ Jukebox could take several minutes.
--------------------------------------------------------------------------
[[17558,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: b3174d0e0d40

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
  0%|          | 0/21 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/sheetsage/sheetsage/infer.py", line 851, in <module>
    tqdm=tqdm,
  File "/sheetsage/sheetsage/infer.py", line 681, in sheetsage
    audio_path_or_bytes, input_feats, tertiaries_times, chunks_tertiaries, tqdm
  File "/sheetsage/sheetsage/infer.py", line 367, in _extract_features
    fr, feats = extractor(audio_path, offset=offset, duration=duration)
  File "/sheetsage/sheetsage/representations/jukebox.py", line 233, in __call__
    codified_audio = self.codify_audio(audio)
  File "/sheetsage/sheetsage/representations/jukebox.py", line 132, in codify_audio
    return self._codify_audio(audio, tqdm=tqdm)
  File "/sheetsage/sheetsage/representations/jukebox.py", line 126, in _codify_audio
    context_codified = self.vqvae.encode(context)[-1].view(-1).cpu().numpy()
  File "/usr/local/lib/python3.6/dist-packages/jukebox/vqvae/vqvae.py", line 141, in encode
    zs_i = self._encode(x_i, start_level=start_level, end_level=end_level)
  File "/usr/local/lib/python3.6/dist-packages/jukebox/vqvae/vqvae.py", line 132, in _encode
    x_out = encoder(x_in)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/jukebox/vqvae/encdec.py", line 80, in forward
    x = level_block(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/jukebox/vqvae/encdec.py", line 26, in forward
    return self.model(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 202, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
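The last frame of the traceback is a convolution forward pass in `torch/nn/modules/conv.py`, reached from the Jukebox VQ-VAE encoder (which is built from 1-D convolutions). A minimal sketch for checking whether such a convolution fails in this environment independently of Sheet Sage, assuming it is run with the same Python and PyTorch as the container. `try_conv1d` is a hypothetical helper, and the layer sizes are arbitrary, not Jukebox's real configuration. Running it once with `use_cudnn=True` and once with `use_cudnn=False` can help tell a cuDNN-specific failure apart from a general CUDA problem.

```python
try:
    import torch
    import torch.nn as nn
except ImportError:  # PyTorch missing: nothing to test
    torch = None


def try_conv1d(use_cuda: bool = True, use_cudnn: bool = True) -> str:
    """Run one small Conv1d forward pass and report the device it ran on.

    Falls back to the CPU when CUDA is unavailable or use_cuda is False.
    """
    device = torch.device("cuda" if use_cuda and torch.cuda.is_available() else "cpu")
    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn  # False makes PyTorch skip cuDNN kernels
    try:
        conv = nn.Conv1d(1, 32, kernel_size=4, stride=2, padding=1).to(device)
        x = torch.randn(1, 1, 44100, device=device)  # arbitrary: ~1 s of mono 44.1 kHz audio
        with torch.no_grad():
            y = conv(x)
        if device.type == "cuda":
            torch.cuda.synchronize()  # surface any deferred CUDA errors inside this call
    finally:
        torch.backends.cudnn.enabled = previous  # restore the global setting
    return f"ok on {device.type}: output shape {tuple(y.shape)}"


if __name__ == "__main__":
    if torch is None:
        print("PyTorch is not installed")
    else:
        print(try_conv1d(use_cudnn=True))
        print(try_conv1d(use_cudnn=False))
```

If the first call raises the same `CUDNN_STATUS_MAPPING_ERROR` while the second succeeds, the problem lies in the container's cuDNN/GPU combination rather than in Sheet Sage itself.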

How can I solve this problem?
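Separately, the Open MPI block near the top of the log is a warning rather than the cause of the crash: it states that another transport will be used instead. Open MPI also reads MCA parameters from environment variables named `OMPI_MCA_<parameter>`, so one way to silence it is to set `OMPI_MCA_btl_base_warn_component_unused=0` for the process. A minimal sketch, assuming you launch the Python process yourself; `env_without_openib_warning` is a hypothetical helper, and how `sheetsage.sh` forwards environment variables into Docker is not covered here.

```python
import os
from typing import Dict, Mapping, Optional


def env_without_openib_warning(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the MCA warning disabled.

    The MCA parameter btl_base_warn_component_unused maps to the
    environment variable OMPI_MCA_btl_base_warn_component_unused.
    """
    env = dict(os.environ if base is None else base)
    env["OMPI_MCA_btl_base_warn_component_unused"] = "0"
    return env
```

The returned dict can be passed as `env=` to `subprocess.run`. With Docker directly, the equivalent would be `docker run -e OMPI_MCA_btl_base_warn_component_unused=0 ...`.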

elloza commented 1 year ago

Same issue here. Any pointers on how to solve it? Could it be because the container uses CUDA 10.1 while the host driver supports a newer version, such as CUDA 11.7?
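On the driver question: NVIDIA drivers are designed to be backward compatible, so a host driver that supports CUDA 11.7 should normally still run a container built against CUDA 10.1. A possibility worth checking, not confirmed in this thread, is whether the GPU's architecture is among those the container's PyTorch build targets. CUDA 10.x predates Ampere GPUs (compute capability 8.x), which include 24 GB cards such as the RTX 3090. A minimal sketch of that check, assuming PyTorch is importable inside the container; `arch_support` is a hypothetical helper, and `torch.cuda.get_arch_list()` only exists in newer PyTorch releases, so on older builds the arch list may have to be supplied by hand.

```python
from typing import Iterable, List, Tuple

try:
    import torch
except ImportError:  # the pure helper below still works without PyTorch
    torch = None


def _parse(arch: str) -> Tuple[str, int, int]:
    """Split e.g. 'sm_86' into ('sm', 8, 6); raises ValueError on other forms."""
    kind, num = arch.split("_", 1)
    n = int(num)
    return kind, n // 10, n % 10


def arch_support(capability: Tuple[int, int], arch_list: Iterable[str]) -> str:
    """Classify how kernels built for `arch_list` can run on a GPU of `capability`.

    'native'      : an sm_XY binary with the same major and an equal or lower minor.
    'ptx-jit'     : only PTX (compute_XY) at or below the GPU's capability; the
                    driver may JIT-compile it. cuDNN ships its own binaries and
                    is not covered by this check.
    'unsupported' : nothing in `arch_list` can run on this GPU.
    """
    major, minor = capability
    parsed: List[Tuple[str, int, int]] = []
    for arch in arch_list:
        try:
            parsed.append(_parse(arch))
        except ValueError:
            continue  # skip entries such as 'sm_90a' that this sketch does not handle
    if any(k == "sm" and M == major and m <= minor for k, M, m in parsed):
        return "native"
    if any(k == "compute" and (M, m) <= (major, minor) for k, M, m in parsed):
        return "ptx-jit"
    return "unsupported"


if __name__ == "__main__" and torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability(0)
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)  # absent in older PyTorch
    archs = get_arch_list() if get_arch_list is not None else []
    print("torch", torch.__version__, "| CUDA", torch.version.cuda,
          "| cuDNN", torch.backends.cudnn.version())
    print(torch.cuda.get_device_name(0), "capability", cap, "->",
          arch_support(cap, archs) if archs else "arch list unavailable")
```

An `unsupported` result, or a capability of 8.x reported by a CUDA 10.1 build, would suggest the container's PyTorch/cuDNN stack needs a newer CUDA version rather than the host driver being at fault.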

Thank you in advance!