NTT123 / vietTTS

Vietnamese Text to Speech library
MIT License

could not synchronize on CUDA context #16

Closed lethanhson9901 closed 2 years ago

lethanhson9901 commented 2 years ago

Today I ran your acoustic model on Colab and got this error:


training: 0% 0/1900001 [00:00<?, ?it/s]2021-12-07 03:51:13.659473: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2085] Execution of replica 0 failed: INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED
training: 0% 0/1900001 [00:16<?, ?it/s]
Traceback (most recent call last):
  File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 139, in <module>
    train()
  File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 101, in train
    loss, (params, aux, rng, optim_state) = update(params, aux, rng, optim_state, batch)
  File "/usr/local/lib/python3.7/dist-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/jax/_src/api.py", line 419, in cache_miss
    donated_invars=donated_invars, inline=inline)
  File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1632, in bind
    return call_bind(self, fun, *args, **params)
  File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1623, in call_bind
    outs = primitive.process(top_trace, fun, tracers, params)
  File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1635, in process
    return trace.process_call(self, fun, tracers, params)
  File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 627, in process_call
    return primitive.impl(f, *tracers, **params)
  File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py", line 690, in _xla_call_impl
    out = compiled_fun(*args)
  File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py", line 1100, in _execute_compiled
    out_bufs = compiled.execute(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED

The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 139, in <module>
    train()
  File "/content/drive/MyDrive/vietTTS/vietTTS/nat/acoustic_trainer.py", line 101, in train
    loss, (params, aux, rng, optim_state) = update(params, aux, rng, optim_state, batch)
  File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py", line 1100, in _execute_compiled
    out_bufs = compiled.execute(input_bufs)
RuntimeError: INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED
2021-12-07 03:51:14.389335: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1047] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: Begin stack trace

_PyModule_ClearDict
PyImport_Cleanup
Py_FinalizeEx

_Py_UnixMain
__libc_start_main
_start

End stack trace

2021-12-07 03:51:14.389456: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_executable.cc:124] Check failed: pair.first->SynchronizeAllActivity()

I guess this issue comes from a version mismatch among the requirements. Could you please specify the exact versions of your dependencies, or update the requirements?
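For a version-mismatch report like this, it usually helps to attach the exact versions installed in the failing environment. Below is a minimal sketch of one way to collect them. The helper name `installed_versions` is hypothetical, and the default package list (`jax`, `jaxlib`) is only what appears in the traceback; the project may depend on other packages that are worth adding.

```python
import sys

try:
    # Available on Python 3.8 and later.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Python 3.7 (the version in the traceback) has no importlib.metadata,
    # so fall back to pkg_resources, which ships with setuptools.
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(name):
        return pkg_resources.get_distribution(name).version


def installed_versions(packages=("jax", "jaxlib")):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, ver in installed_versions().items():
        print(name, ver if ver is not None else "not installed")
```

Pasting this output into the issue gives the maintainer something concrete to compare against a working setup.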

hoangnv172566 commented 2 years ago

I have faced the same issue. My workaround was to move my whole workspace to another Colab account (I train my model on Google Colab), but I think that is only a temporary fix.
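One possible reason a different account behaves differently is that Colab sessions are not always assigned the same GPU model. This is a guess and is not confirmed anywhere in this thread. A minimal sketch for recording which GPU and driver a session has, so that a failing and a working session can be compared; the helper name `gpu_summary` is hypothetical, and it assumes `nvidia-smi` is on the `PATH`:

```python
import shutil
import subprocess


def gpu_summary():
    """Return nvidia-smi's GPU name, driver version and total memory as text.

    Returns None when nvidia-smi is missing (e.g. a CPU-only runtime)
    or when the query itself fails.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,driver_version,memory.total",
            "--format=csv,noheader",
        ],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.7)
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    summary = gpu_summary()
    print(summary if summary is not None else "no NVIDIA GPU visible")
```

If the two sessions report different hardware or drivers, that points to an environment difference rather than a bug in the training code.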