Open zhaotao1987 opened 3 years ago
Hi Tao,
This looks like a software/hardware problem rather than a problem with the package.
I suggest trying some Keras test code to make sure that your GPU, Python version, and Keras version work properly together for a simple NN implementation. Here is one example you can try:
https://keras.io/examples/vision/mnist_convnet/
Song
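For an even smaller check than the MNIST example, the following sketch only verifies that TensorFlow imports, lists the GPUs it can see, and runs one small matrix multiplication. The function name `check_tensorflow_gpu` and the report keys are made up for illustration; this is not part of DeepTE or Keras.

```python
def check_tensorflow_gpu():
    """Return a small report on whether TensorFlow can start and run one op."""
    report = {"installed": False, "gpus": [], "matmul_ok": False, "error": None}
    try:
        import tensorflow as tf
    except ImportError as exc:
        # TensorFlow is not available in this environment at all.
        report["error"] = str(exc)
        return report
    report["installed"] = True
    try:
        # Devices TensorFlow can see; an empty list means it will use the CPU.
        report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
        # Running a real op forces the runtime (and CUDA, if used) to initialize.
        a = tf.random.uniform((64, 64))
        product = tf.matmul(a, a)
        report["matmul_ok"] = product.shape.as_list() == [64, 64]
    except Exception as exc:
        # CUDA initialization problems typically surface here.
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    print(check_tensorflow_gpu())
```

If `error` is set while `installed` is true, the problem most likely sits in the TensorFlow/CUDA setup rather than in DeepTE itself.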
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/LiLabAtVT/DeepTE/issues/11, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACEEENUTHDNO6J7DA7RRESTULTOMPANCNFSM5H4NJ2NA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
-- Associate Professor in Plant Genomics and Bioinformatics School of Plant and Environmental Sciences Virginia Polytechnic Institute and State University
Thanks very much for your reply, Song. It helped. There is still something wrong with my GPU version of TensorFlow, but the code also runs on the CPU. It's actually pretty fast, and I've got the results. I will still try to fix the GPU version. Thanks.
Hi @zhaotao1987 @songliVT ,
Do you mean DeepTE can also run through normally if there is no GPU hardware?
Best, Kun
We have not tried this before without a GPU.
Song
Hi @zhaotao1987 @songliVT ,
Do you mean DeepTE can also run through normally if there is no GPU hardware?
Best, Kun
Yes, indeed. It worked for me and it's fast as well actually...
Hi @zhaotao1987 and @songliVT
I'm trying to run DeepTE on the CPU, so I'm using tensorflow instead of tensorflow-gpu. However, I'm getting this error:
Traceback (most recent call last):
  File "../DeepTE.py", line 274, in <module>
    main()
  File "../DeepTE.py", line 249, in main
    pipeline_no_m.classify_pipeline(model_dir, input_CNN_data_file, temp_store_opt_dir, sp_type,te_fam,prop_thr)
  File "/home/oliveirads/softwares/DeepTE/scripts/DeepTE_pipeline_no_modification.py", line 441, in classify_pipeline
    predict_te(model_file_dic[model_name], model_name, x_LTR_ipt_test_list, y_LTR_ipt_test_nm_list,input_spe_type,y_LTR_ipt_test_nm_list,prop_thr)
  File "/home/oliveirads/softwares/DeepTE/scripts/DeepTE_pipeline_no_modification.py", line 258, in predict_te
    store_results_dic[str(i)] = str(y_test_nm_list[i]) + '\t' + name_number_dic[model_nm][str(new_predicted_classes_list[i])]
KeyError: '3'
Could you please share how you ran DeepTE on the CPU? I'm currently running it with 32 cores and 160 GB of RAM.
This does not look like a CPU or GPU issue. Can you modify the code to print the "keys" and check which key is missing? From the error message, it is not clear whether "i" is the missing key or "model_nm" is the missing key.
Alternatively, if you can provide some input data, Haidong might have time to test the data for you.
Song
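One way to do the key check suggested above is sketched here. The failing line in the traceback performs a nested lookup, `name_number_dic[model_nm][str(...)]`, so the helper reports which of the two levels is missing. The helper name `describe_missing_key` and the toy dictionary are hypothetical; the real contents of `name_number_dic` are not shown in this thread.

```python
def describe_missing_key(name_number_dic, model_nm, predicted_class):
    """Explain which level of name_number_dic[model_nm][str(cls)] would fail.

    Returns None when the lookup would succeed.
    """
    if model_nm not in name_number_dic:
        return f"model_nm {model_nm!r} not in {sorted(name_number_dic)}"
    inner = name_number_dic[model_nm]
    key = str(predicted_class)
    if key not in inner:
        return f"class key {key!r} not in {sorted(inner)} for model {model_nm!r}"
    return None


# Toy data with made-up labels, only to show the output format.
toy_dic = {"LTR": {"0": "class_a", "1": "class_b", "2": "class_c"}}
print(describe_missing_key(toy_dic, "LTR", 3))
print(describe_missing_key(toy_dic, "LTR", 1))
```

Calling something like this just before line 258 of `DeepTE_pipeline_no_modification.py`, with `model_nm` and `new_predicted_classes_list[i]`, should show whether the model name or the predicted class index is the unknown key.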
Sorry for the late reply, @oliveirads-bioinfo @xiekunwhy. I regret not having posted my solution right away. I think the key was that I installed a CPU version of TensorFlow afterwards. So if you have a hardware problem with your GPU, maybe try the CPU version of TensorFlow: https://anaconda.org/conda-forge/tensorflow-cpu
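A possible alternative to reinstalling, which was not tested in this thread, is to hide the GPUs from an existing TensorFlow installation by setting the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow is imported. The sketch below assumes that approach; the helper name `hide_gpus_from_tensorflow` is made up for illustration.

```python
import os


def hide_gpus_from_tensorflow():
    """Make CUDA report no usable devices to libraries loaded afterwards."""
    # Must be set before TensorFlow initializes CUDA, so call this before the import.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


hide_gpus_from_tensorflow()

try:
    import tensorflow as tf
    # With the variable set, this is expected to be an empty list.
    print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow is not installed in this environment")
```

The same effect can be had from the shell by exporting `CUDA_VISIBLE_DEVICES=-1` before launching `DeepTE.py`, which avoids editing any code.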
Hi, it's my first time using a NN framework. I got the following output when trying to run it. Thanks a lot in advance for your help.
2021-11-12 17:33:49.258190: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
start time is 1636709630.8717337
Step1: transfer fasta data to CNN input data
Step2: classify TEs
Step2: 2) domain information is not exist
2021-11-12 17:33:50.937189: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-11-12 17:33:52.459451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:6d:00.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 15.72GiB deviceMemoryBandwidth: 298.08GiB/s
2021-11-12 17:33:52.459602: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-11-12 17:33:52.464361: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-11-12 17:33:52.464474: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-11-12 17:33:52.465899: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-11-12 17:33:52.466284: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-11-12 17:33:52.470599: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-11-12 17:33:52.471428: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-11-12 17:33:52.471598: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-11-12 17:33:52.473906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-11-12 17:33:52.475185: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-12 17:33:52.489192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:6d:00.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 15.72GiB deviceMemoryBandwidth: 298.08GiB/s
2021-11-12 17:33:52.491713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-11-12 17:33:52.491826: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "/ngsproject/zhaot/tools/DeepTE/DeepTE.py", line 274, in <module>
    main()
  File "/ngsproject/zhaot/tools/DeepTE/DeepTE.py", line 249, in main
    pipeline_no_m.classify_pipeline(model_dir, input_CNN_data_file, temp_store_opt_dir, sp_type,te_fam,prop_thr)
  File "/ngsproject/zhaot/tools/DeepTE/scripts/DeepTE_pipeline_no_modification.py", line 339, in classify_pipeline
    predict_te(model_file_dic[model_name], model_name, x_all_test_list, y_all_test_nm_list,input_spe_type,y_all_test_nm_list,prop_thr)
  File "/ngsproject/zhaot/tools/DeepTE/scripts/DeepTE_pipeline_no_modification.py", line 210, in predict_te
    model = load_model(model)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/keras/saving/save.py", line 202, in load_model
    compile)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/keras/saving/hdf5_format.py", line 181, in load_model_from_hdf5
    custom_objects=custom_objects)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/keras/saving/model_config.py", line 59, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/keras/layers/serialization.py", line 163, in deserialize
    printable_module_name='layer')
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/keras/utils/generic_utils.py", line 672, in deserialize_keras_object
    list(custom_objects.items())))
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/keras/engine/sequential.py", line 490, in from_config
    model = cls(name=name)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/keras/engine/sequential.py", line 110, in __init__
    name=name, autocast=False)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/keras/engine/training.py", line 293, in __init__
    self._init_batch_counters()
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/keras/engine/training.py", line 301, in _init_batch_counters
    self._train_counter = tf.Variable(0, dtype='int64', aggregation=agg)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 262, in __call__
    return cls._variable_v2_call(*args, **kwargs)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 256, in _variable_v2_call
    shape=shape)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 237, in <lambda>
    previous_getter = lambda **kws: default_variable_creator_v2(None, **kws)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 2675, in default_variable_creator_v2
    shape=shape)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py", line 264, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1595, in __init__
    distribute_strategy=distribute_strategy)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1729, in _init_from_args
    dtype=dtype)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1566, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 265, in constant
    allow_broadcast=True)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 276, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 301, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 97, in convert_to_eager_tensor
    ctx.ensure_initialized()
  File "/ngsproject/zhaot/miniconda3/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 525, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: initialization error
It seems like something is wrong with my GPU. Here's what I have.
`(base) [zhaot@alice scripts]$ nvidia-smi
Fri Nov 12 17:39:48 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:6D:00.0 Off |                  Off |
| N/A   48C    P0    28W /  70W |      0MiB / 16097MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
`
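One detail in the output above may be worth checking: the TensorFlow log loads `libcudart.so.11.0`, while `nvidia-smi` reports `CUDA Version: 10.1` for driver 418.67. A driver that only supports CUDA 10.x generally cannot run a CUDA 11 runtime, so this could explain the initialization error, although nobody in the thread confirmed it. The sketch below extracts both versions from pasted text and compares the major numbers; the function names are made up, and the comparison is a rough heuristic, not an official compatibility check.

```python
import re


def cuda_versions(tf_log, smi_output):
    """Return ((runtime_major, runtime_minor), (driver_major, driver_minor)) or None."""
    # CUDA runtime version as it appears in the shared library name in the TF log.
    runtime = re.search(r"libcudart\.so\.(\d+)\.(\d+)", tf_log)
    # Highest CUDA version the installed driver supports, per the nvidia-smi header.
    driver = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if runtime is None or driver is None:
        return None
    return (tuple(map(int, runtime.groups())), tuple(map(int, driver.groups())))


def runtime_major_exceeds_driver(tf_log, smi_output):
    """True when the runtime's major CUDA version is newer than the driver's."""
    versions = cuda_versions(tf_log, smi_output)
    if versions is None:
        return None
    runtime, driver = versions
    return runtime[0] > driver[0]


# Excerpts copied from the output above.
log_excerpt = "Successfully opened dynamic library libcudart.so.11.0"
smi_excerpt = "| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |"

print(cuda_versions(log_excerpt, smi_excerpt))
print(runtime_major_exceeds_driver(log_excerpt, smi_excerpt))
```

If the check flags a mismatch, the usual options are updating the NVIDIA driver or installing a TensorFlow build that targets the CUDA version the driver supports.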