barisgecer / OSTeC

TF implementation of our CVPR 2021 paper: OSTeC: One-Shot Texture Completion
https://openaccess.thecvf.com/content/CVPR2021/html/Gecer_OSTeC_One-Shot_Texture_Completion_CVPR_2021_paper.html

Will the repository work with CUDA 11? #24

Closed. Suvi-dha closed this issue 1 year ago

Suvi-dha commented 1 year ago

The server reports NVIDIA-SMI 510.47.03, driver version 510.47.03, and CUDA version 11.6. I followed the repository's installation instructions and installed CUDA 10.0, cuDNN 7.4, and TensorFlow 1.14 inside the ostec environment. After all the models are loaded, I get this error while processing images:

Started: ./test_imgs/0000_0099_00.png
Traceback (most recent call last):
  File "run_ostec.py", line 96, in <module>
    main(args)
  File "run_ostec.py", line 66, in main
    fitting = deep3dmodel.recontruct(im_menpo2PIL(img), lms)
  File "/home/suvidha/av-project/OSTeC/external/deep3dfacerecon/ostec_api.py", line 68, in recontruct
    coeffs = self.model.net_recon(self.model.input_img)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "external/deep3dfacerecon/models/networks.py", line 98, in forward
    x = self.backbone(x)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "external/deep3dfacerecon/models/networks.py", line 375, in forward
    return self._forward_impl(x)
  File "external/deep3dfacerecon/models/networks.py", line 358, in _forward_impl
    x = self.conv1(x)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: no kernel image is available for execution on the device
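A common cause of "no kernel image is available" is a PyTorch build that was not compiled for the GPU's compute capability, for example a CUDA 10 build running on a newer GPU. The sketch below compares the two. The helper name `is_arch_supported` is hypothetical, the compatibility rule is a simplification, and `torch.cuda.get_arch_list()` is assumed to exist in the installed PyTorch (it may be missing from older releases).

```python
import re


def is_arch_supported(capability, arch_list):
    """Return True if a GPU with `capability` (major, minor) can run code
    built for any entry of `arch_list`, e.g. ["sm_70", "compute_75"]."""
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"(sm|compute)_(\d+)", arch)
        if match is None:
            continue  # skip entries this sketch does not understand
        kind, digits = match.group(1), match.group(2)
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm":
            # Binary code runs on the same major version with an equal
            # or higher minor version.
            if a_major == major and a_minor <= minor:
                return True
        else:
            # PTX can be JIT-compiled for an equal or newer capability.
            if (a_major, a_minor) <= (major, minor):
                return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("device capability:", cap)
            print("architectures in this PyTorch build:", archs)
            print("supported:", is_arch_supported(cap, archs))
        else:
            print("torch.cuda.is_available() returned False")
```

If this prints `supported: False`, installing a PyTorch build made for a newer CUDA version is the usual fix.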

Please help me find a solution. I have been stuck for 2 days and cannot get this repository to work. I also can't find the CUDA path (installed using conda) that I could export in my bashrc.
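On the CUDA path question: when CUDA comes from conda's cudatoolkit package on Linux, the runtime libraries typically sit in the environment's `lib` directory rather than under `/usr/local/cuda`. The sketch below looks for them there. The function name `find_cuda_runtime_libs` is hypothetical, and the layout is an assumption about a typical Linux conda environment.

```python
import glob
import os
import sys


def find_cuda_runtime_libs(prefix):
    """Return sorted paths of libcudart files found in `prefix`/lib."""
    pattern = os.path.join(prefix, "lib", "libcudart*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # CONDA_PREFIX is set by `conda activate`; otherwise fall back to the
    # prefix of the running interpreter.
    prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    libs = find_cuda_runtime_libs(prefix)
    if libs:
        print("CUDA runtime libraries found:")
        for path in libs:
            print("  " + path)
        print("Directory to consider for LD_LIBRARY_PATH:",
              os.path.dirname(libs[0]))
    else:
        print("No libcudart found under", os.path.join(prefix, "lib"))
```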

Suvi-dha commented 1 year ago

Update: I solved that error by upgrading PyTorch to a version built for CUDA 11. However, I now get errors whenever the preprocess_input function is called in Projection_handler.py:

2022-08-23 17:08:43.201210: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "run_ostec.py", line 96, in <module>
    main(args)
  File "run_ostec.py", line 35, in main
    operator = Operator(args)
  File "/home/suvidha/av-project/OSTeC/core/operator.py", line 115, in __init__
    self.projector = Projection_Handler(args)
  File "/home/suvidha/av-project/OSTeC/core/projection_handler.py", line 69, in __init__
    self.ff_model.predict(preprocess_input(dummy_im))
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/engine/training.py", line 1462, in predict
    callbacks=callbacks)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 324, in predict_loop
    batch_outs = f(ins_batch)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas SGEMM launch failed : m=3136, n=64, k=64
     [[{{node resnet50/res2a_branch2a/convolution}}]]
     [[reshape_1/Reshape/_5085]]
  (1) Internal: Blas SGEMM launch failed : m=3136, n=64, k=64
     [[{{node resnet50/res2a_branch2a/convolution}}]]
0 successful operations.
0 derived errors ignored.

Also, if I comment out this line (since it only runs a dummy prediction), I get another error at line 114 of Projection_handler.py. The stack trace is below:

Traceback (most recent call last):
  File "run_ostec.py", line 96, in <module>
    main(args)
  File "run_ostec.py", line 71, in main
    final_uv, results_dict = operator.run(img, fitting, face_mask)
  File "/home/suvidha/av-project/OSTeC/core/operator.py", line 416, in run
    face, results_dict[key] = self.run_iteration(face, key, trg_angle)
  File "/home/suvidha/av-project/OSTeC/core/operator.py", line 312, in run_iteration
    face.id_features)
  File "/home/suvidha/av-project/OSTeC/core/projection_handler.py", line 114, in run_projection
    preprocess_input(load_images(images_batch, image_size=self.args.resnet_image_size)))
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/engine/training.py", line 1456, in predict
    self._make_predict_function()
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/engine/training.py", line 378, in _make_predict_function
    **kwargs)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 3006, in function
    v1_variable_initialization()
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 431, in v1_variable_initialization
    [tf.is_variable_initialized(v) for v in candidate_vars])
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 431, in <listcomp>
    [tf.is_variable_initialized(v) for v in candidate_vars])
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 193, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 3083, in is_variable_initialized
    return state_ops.is_variable_initialized(variable)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 131, in is_variable_initialized
    return gen_state_ops.is_variable_initialized(ref=ref, name=name)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 284, in is_variable_initialized
    "IsVariableInitialized", ref=ref, name=name)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3588, in create_op
    self._check_not_finalized()
  File "/home/suvidha/anaconda3/envs/ostec/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3225, in _check_not_finalized
    raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.

Please help with this issue. Is it caused by a keras-applications version mismatch?

Suvi-dha commented 1 year ago

Please update the environment installation instructions, as they are not compatible with the latest NVIDIA drivers. I had to build this repository inside an NVIDIA-tensorflow container to get it working smoothly.

BbChip0103 commented 1 year ago

Try these commands to install TensorFlow.


pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]==1.15.5+nv21.6 pillow requests tqdm
pip install nvidia-tensorboard==1.15 protobuf==3.20.1
pip install vtk numpy==1.18.5 menpo menpo3d keras==2.3.0 keras-applications==1.0.8 keras-preprocessing==1.0.5 opencv-python scikit-image


This worked for me on Ubuntu 20.04 with CUDA 11.3 and cuDNN 8.
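To confirm that an environment actually ended up with the pinned versions from the commands above, a small check like the following may help. It is only a sketch: `PINS` copies the version pins from those pip commands, the helper names are hypothetical, and the comparison is an exact string match, which is stricter than pip's own matching (pip treats `==1.15` as also matching `1.15.0`). A reported mismatch is a prompt to look closer, not proof of a problem.

```python
# Version pins copied from the pip commands in this comment.
PINS = {
    "nvidia-tensorflow": "1.15.5+nv21.6",
    "nvidia-tensorboard": "1.15",
    "protobuf": "3.20.1",
    "numpy": "1.18.5",
    "keras": "2.3.0",
    "keras-applications": "1.0.8",
    "keras-preprocessing": "1.0.5",
}


def installed_version(name):
    """Return the installed version string of `name`, or None if missing."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # provided by setuptools on older Pythons
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (pinned, installed)} for each pin not met exactly."""
    mismatches = {}
    for name, pinned in pins.items():
        installed = lookup(name)
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in sorted(find_mismatches(PINS).items()):
        print("{}: pinned {}, installed {}".format(name, pinned, installed))
```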

ZhenyanSun commented 1 year ago

Thanks for sharing. Could you also suggest a torch version?

alicedingyueming commented 7 months ago

I followed the approach above to install TensorFlow with CUDA 11.1 and cuDNN 8, but I get this error. Can anyone give me some suggestions?

In file included from ~/.conda/envs/ostecP/lib/python3.6/site-packages/tensorflow_core/include/third_party/eigen3/Eigen/Core:1,
                 from ~/.conda/envs/ostecP/lib/python3.6/site-packages/tensorflow_core/include/tensorflow/core/lib/strings/strcat.h:29,
                 from ~/.conda/envs/ostecP/lib/python3.6/site-packages/tensorflow_core/include/tensorflow/core/lib/core/errors.h:24,
                 from /OSTeC/external/stylegan2/dnnlib/tflib/ops/tensorflow/core/framework/op.h:26,
                 from external/stylegan2/dnnlib/tflib/ops/fused_bias_act.cu:9:
~/.conda/envs/ostecP/lib/python3.6/site-packages/tensorflow_core/include/Eigen/Core:28:12: fatal error: cuda/std/complex: No such file or directory
   28 | #include <cuda/std/complex>
      |          ^~~~~~
compilation terminated.
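The missing `cuda/std/complex` header belongs to libcu++, which is distributed with the CUDA toolkit. One possible cause is that the toolkit used to compile the StyleGAN2 custom ops has an include directory without this header. The sketch below checks a few likely include directories for it. The helper names and the list of candidate locations are assumptions, not something taken from the repository.

```python
import os


def candidate_include_dirs(environ=os.environ):
    """Build a list of include directories that might belong to a CUDA toolkit."""
    roots = []
    for var in ("CUDA_HOME", "CUDA_PATH", "CONDA_PREFIX"):
        value = environ.get(var)
        if value:
            roots.append(value)
    roots.append("/usr/local/cuda")  # common default install location
    return [os.path.join(root, "include") for root in roots]


def dirs_with_header(include_dirs, header="cuda/std/complex"):
    """Return the directories from `include_dirs` that contain `header`."""
    parts = header.split("/")
    return [d for d in include_dirs
            if os.path.isfile(os.path.join(d, *parts))]


if __name__ == "__main__":
    candidates = candidate_include_dirs()
    hits = dirs_with_header(candidates)
    print("checked:", candidates)
    if hits:
        print("cuda/std/complex found in:", hits)
    else:
        print("cuda/std/complex not found in any checked directory")
```

If none of the checked directories contains the header, pointing the build at a toolkit that ships it may be worth trying.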

wengjincheng commented 7 months ago

Has anyone run into the stylegan2 "No GPU found" error? It seems to happen because the device is now named XLA GPU instead of GPU.
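To see which device types TensorFlow reports, something like the sketch below can be used. It assumes that StyleGAN2's custom-op loader only accepts devices whose type is exactly `GPU`, so an environment that lists only `XLA_GPU` devices would fail that check. That situation often suggests TensorFlow could not load its CUDA libraries, but that is an assumption to verify against the TensorFlow startup log. The function name `diagnose` and its return labels are hypothetical.

```python
def diagnose(device_types):
    """Classify a list of TensorFlow device type strings.

    Returns "ok" if a plain GPU device is present, "xla_only" if only
    XLA_GPU devices are present, and "no_gpu" otherwise.
    """
    if "GPU" in device_types:
        return "ok"
    if "XLA_GPU" in device_types:
        return "xla_only"
    return "no_gpu"


if __name__ == "__main__":
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        # Each entry describes one local device and has a device_type field.
        types = [d.device_type for d in device_lib.list_local_devices()]
        print("device types:", types)
        print("diagnosis:", diagnose(types))
```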

yug125lk commented 4 months ago

Hi @Suvi-dha, how did you solve the errors from your update? I have the same issue. I am on Windows and am unable to install NVIDIA-tensorflow.