Closed Suvi-dha closed 1 year ago
Please update your environment installation instructions, as they are not compatible with the latest NVIDIA drivers. I had to build this repository using the NVIDIA-tensorflow container to get it working smoothly.
Try these instructions to install tensorflow:
pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]==1.15.5+nv21.6 pillow requests tqdm
pip install nvidia-tensorboard==1.15 protobuf==3.20.1
pip install vtk numpy==1.18.5 menpo menpo3d keras==2.3.0 keras-applications==1.0.8 keras-preprocessing==1.0.5 opencv-python scikit-image
Tested on Ubuntu 20.04, CUDA 11.3, cuDNN 8. It worked.
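Since several replies here ask about version mismatches, here is a minimal sketch for checking whether the active environment matches the pins from the pip commands above. The helper names (`PINS`, `installed_version`, `check_pins`) are my own and not part of OSTeC; the pinned versions are copied from the commands in this comment.

```python
# Sketch: compare installed package versions against the pins quoted above.
# PINS, installed_version and check_pins are hypothetical helper names.
try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:
    # Python 3.6/3.7 (the conda env in this thread uses 3.6)
    import pkg_resources
    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(name):
        return pkg_resources.get_distribution(name).version

PINS = {
    "numpy": "1.18.5",
    "keras": "2.3.0",
    "keras-applications": "1.0.8",
    "keras-preprocessing": "1.0.5",
    "protobuf": "3.20.1",
}

def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches

if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

Run it inside the activated conda environment. An empty output means the listed pins match; it does not check the nvidia-tensorflow build itself.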
Thanks for sharing. Could you also suggest the torch version?
I followed the instructions above to install tensorflow on CUDA 11.1, cuDNN 8, but I get the error below. Can anyone give me some suggestions?
In file included from ~/.conda/envs/ostecP/lib/python3.6/site-packages/tensorflow_core/include/third_party/eigen3/Eigen/Core:1,
from ~/.conda/envs/ostecP/lib/python3.6/site-packages/tensorflow_core/include/tensorflow/core/lib/strings/strcat.h:29,
from ~/.conda/envs/ostecP/lib/python3.6/site-packages/tensorflow_core/include/tensorflow/core/lib/core/errors.h:24,
from /OSTeC/external/stylegan2/dnnlib/tflib/ops/tensorflow/core/framework/op.h:26,
from external/stylegan2/dnnlib/tflib/ops/fused_bias_act.cu:9:
~/.conda/envs/ostecP/lib/python3.6/site-packages/tensorflow_core/include/Eigen/Core:28:12: fatal error: cuda/std/complex: No such file or directory
28 | #include <cuda/std/complex>
| ^~~~~~
compilation terminated.
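A fatal error like this means the compiler could not find `cuda/std/complex` on any include path. As far as I know, that header belongs to the libcu++ headers shipped in the CUDA toolkit's include directory, and older toolkits may not have it, so it is worth checking which toolkit nvcc is actually using. The sketch below searches a few candidate CUDA roots for the header. The candidate roots (`CUDA_HOME`, `CONDA_PREFIX`, `/usr/local/cuda`) and the function name `find_header` are assumptions for illustration, not something the repository defines.

```python
# Sketch: look for a header under the include directories of candidate CUDA roots.
# find_header and the candidate roots are assumptions, adjust them to your setup.
import os

def find_header(roots, rel_path=os.path.join("cuda", "std", "complex")):
    """Return the include directories under `roots` that contain `rel_path`."""
    subdirs = ("include", os.path.join("targets", "x86_64-linux", "include"))
    hits = []
    for root in roots:
        if not root:
            continue  # skip unset environment variables
        for sub in subdirs:
            include_dir = os.path.join(root, sub)
            if os.path.isfile(os.path.join(include_dir, rel_path)):
                hits.append(include_dir)
    return hits

if __name__ == "__main__":
    candidates = [
        os.environ.get("CUDA_HOME"),
        os.environ.get("CONDA_PREFIX"),
        "/usr/local/cuda",
    ]
    found = find_header(candidates)
    print(found if found else "cuda/std/complex not found in the candidate roots")
```

If nothing is found, the toolkit on the include path probably does not ship that header, and pointing the build at a toolkit that does may resolve the compile error.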
Have you run into the stylegan2 error "No GPU found"? It seems to happen because the device is listed as XLA GPU instead of GPU.
Update: I solved the error by upgrading to a pytorch version built with cuda 11. But now I get errors whenever the preprocess_input function is called in Projection_handler.py:
2022-08-23 17:08:43.201210: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "run_ostec.py", line 96, in <module>
    main(args)
  File "run_ostec.py", line 35, in main
    operator = Operator(args)
  File "/home/suvidha/av-project/OSTeC/core/operator.py", line 115, in __init__
    self.projector = Projection_Handler(args)
  File "/home/suvidha/av-project/OSTeC/core/projection_handler.py", line 69, in __init__
    self.ff_model.predict(preprocess_input(dummy_im))
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/engine/training.py", line 1462, in predict
    callbacks=callbacks)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 324, in predict_loop
    batch_outs = f(ins_batch)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas SGEMM launch failed : m=3136, n=64, k=64
    [[{{node resnet50/res2a_branch2a/convolution}}]]
    [[reshape_1/Reshape/_5085]]
  (1) Internal: Blas SGEMM launch failed : m=3136, n=64, k=64
    [[{{node resnet50/res2a_branch2a/convolution}}]]
0 successful operations.
0 derived errors ignored.
Also, if I comment out this line (since it only runs a dummy prediction), I get another error on line 114 of Projection_handler.py. The stack trace is below:
Traceback (most recent call last):
  File "run_ostec.py", line 96, in <module>
    main(args)
  File "run_ostec.py", line 71, in main
    final_uv, results_dict = operator.run(img, fitting, face_mask)
  File "/home/suvidha/av-project/OSTeC/core/operator.py", line 416, in run
    face, results_dict[key] = self.run_iteration(face, key, trg_angle)
  File "/home/suvidha/av-project/OSTeC/core/operator.py", line 312, in run_iteration
    face.id_features)
  File "/home/suvidha/av-project/OSTeC/core/projection_handler.py", line 114, in run_projection
    preprocess_input(load_images(images_batch, image_size=self.args.resnet_image_size)))
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/engine/training.py", line 1456, in predict
    self._make_predict_function()
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/engine/training.py", line 378, in _make_predict_function
    **kwargs)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 3006, in function
    v1_variable_initialization()
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 431, in v1_variable_initialization
    [tf.is_variable_initialized(v) for v in candidate_vars])
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 431, in <listcomp>
    [tf.is_variable_initialized(v) for v in candidate_vars])
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 193, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 3083, in is_variable_initialized
    return state_ops.is_variable_initialized(variable)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 131, in is_variable_initialized
    return gen_state_ops.is_variable_initialized(ref=ref, name=name)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 284, in is_variable_initialized
    "IsVariableInitialized", ref=ref, name=name)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3588, in create_op
    self._check_not_finalized()
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3225, in _check_not_finalized
    raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.
Please help with the issue. Is it because of a keras-applications version mismatch?
Hi, how did you solve it? I've got the same issue. I use Windows and I am unable to install NVIDIA-tensorflow.
The server reports NVIDIA-SMI 510.47.03, Driver Version: 510.47.03, CUDA Version: 11.6. I followed the instructions to install the repository by installing cuda 10.0, cudnn 7.4 and tensorflow 1.14 inside the ostec environment. But after all the models are loaded, I get this error while processing images.
Started: ./test_imgs/0000_0099_00.png
Traceback (most recent call last):
  File "run_ostec.py", line 96, in <module>
    main(args)
  File "run_ostec.py", line 66, in main
    fitting = deep3dmodel.recontruct(im_menpo2PIL(img), lms)
  File "/home/suvidha/av-project/OSTeC/external/deep3dfacerecon/ostec_api.py", line 68, in recontruct
    coeffs = self.model.net_recon(self.model.input_img)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "external/deep3dfacerecon/models/networks.py", line 98, in forward
    x = self.backbone(x)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "external/deep3dfacerecon/models/networks.py", line 375, in forward
    return self._forward_impl(x)
  File "external/deep3dfacerecon/models/networks.py", line 358, in _forward_impl
    x = self.conv1(x)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: no kernel image is available for execution on the device
Please help me find a solution. I have been stuck and unable to make this repository work for 2 days now. I also can't find the cuda path (installed using conda) that I could export in bashrc.
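For anyone hitting the same "no kernel image is available" error: it commonly means the installed PyTorch binary was not built for the compute capability of the GPU, which would be consistent with the fix reported in the update (moving to a PyTorch build for cuda 11). The sketch below is one way to check this. The function `arch_supported` is a hypothetical helper of mine; it assumes the usual CUDA rules that an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, and that `compute_XY` PTX can be JIT-compiled for any equal or newer device. `torch.cuda.get_arch_list()` may be missing on older torch builds, so the script guards for that.

```python
# Sketch: check whether a PyTorch build ships kernels usable on the local GPU.
# arch_supported is a hypothetical helper, not a torch API.

def arch_supported(capability, arch_list):
    """capability: (major, minor) tuple; arch_list: e.g. ['sm_70', 'compute_70']."""
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if not digits.isdigit():
            continue  # ignore entries this sketch does not understand
        a_major, a_minor = divmod(int(digits), 10)
        # Binary (cubin) compatibility: same major version, minor not newer than the device.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX can be JIT-compiled for any device of equal or higher capability.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False

if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is None or not torch.cuda.is_available():
        print("torch with a visible CUDA device is required for the live check")
    else:
        capability = torch.cuda.get_device_capability(0)
        get_arch_list = getattr(torch.cuda, "get_arch_list", None)
        if get_arch_list is None:
            print("this torch build does not expose get_arch_list; device capability:", capability)
        else:
            archs = get_arch_list()
            print("device capability:", capability)
            print("architectures in this torch build:", archs)
            print("supported:", arch_supported(capability, archs))
```

If the last line prints False, installing a PyTorch build compiled against a newer CUDA version is the usual remedy.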