NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0
1.54k stars 369 forks

DS2 training in train_eval mode always failing #426

Closed. flassTer closed this issue 5 years ago

flassTer commented 5 years ago

For wave2letter+ I get:

Building graph on GPU:0
Building graph on GPU:1
Building graph on GPU:2
Building graph on GPU:3
2019-05-08 16:26:44.483961: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-08 16:26:44.822827: W tensorflow/compiler/xla/service/platform_util.cc:240] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported: 11996954624
2019-05-08 16:26:44.823221: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56138d973220 executing computations on platform CUDA. Devices:
2019-05-08 16:26:44.823270: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Invalid argument: device CUDA:0 not supported by XLA service
Aborted (core dumped)

flassTer commented 5 years ago

Additionally, when I try to compile ds2 in train_eval mode, I get "Encountered unknown variable shape, can't compute total number of parameters".

flassTer commented 5 years ago

The problem was solved by rebooting. Too much data was cached.
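
For anyone hitting the same CUDA_ERROR_OUT_OF_MEMORY at startup: before rebooting, it may be worth checking whether the memory on GPU 0 is already occupied. The sketch below is only an illustration, not part of OpenSeq2Seq. It assumes `nvidia-smi` is on the PATH, and the helper names (`parse_gpu_memory`, `busy_gpus`, `query_gpu_memory`), the 10% threshold and the sample numbers are all made up for the example.

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn the CSV text into a list of (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


def busy_gpus(rows, max_used_fraction=0.1):
    """Return the indices of GPUs whose used memory exceeds the given fraction."""
    return [index for index, used, total in rows if used > max_used_fraction * total]


def query_gpu_memory():
    """Run nvidia-smi and parse its output (needs an NVIDIA driver installed)."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_gpu_memory(result.stdout)


# Hypothetical sample output (not taken from this issue): GPU 0 is almost full.
SAMPLE = "0, 11309, 11441\n1, 2, 11441\n2, 2, 11441\n3, 2, 11441\n"
print(busy_gpus(parse_gpu_memory(SAMPLE)))
```

On a real machine, `busy_gpus(query_gpu_memory())` would list the GPUs that already have memory in use before training starts. Running plain `nvidia-smi` then shows which processes hold that memory, so they can be stopped instead of rebooting the whole machine.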