dorarad / gansformer

Generative Adversarial Transformers
MIT License

Errors when running generate.py #8

Closed athena913 closed 3 years ago

athena913 commented 3 years ago

Hi,

Thank you very much for your interesting work and for making your code public.

I ran generate.py, but it failed with dnnlib-related errors, shown below. I am using TF 1.14, CUDA 10.0, cuDNN 7.6.1, and gcc 7.5.0. Here is what I have tried:
1) I looked at your response to issue #5 and verified that nvcc test_nvcc.cu produces the expected output (Hello CPU and Hello GPU).
2) I also ran the fused_bias_act command you mention in issue #5; it completes without errors and generates several fused_bias_act* files.
3) I have run the original TF stylegan2 code (training and generation) on the same machine, with the same versions of TF and cuDNN, without any issues.

I would appreciate any help in fixing the errors below.

Thank you

python generate.py --gpus 0 --model gdrive:bedrooms-snapshot.pkl --output-dir images --images-num 32

Loading networks...
Setting up TensorFlow plugin 'upfirdn_2d.cu': Preprocessing... Compiling... Loading... Failed!
Traceback (most recent call last):
  File "generate.py", line 49, in <module>
    main()
  File "generate.py", line 46, in main
    run(vars(args))
  File "generate.py", line 22, in run
    G, D, Gs = load_networks(model) # Load pre-trained network
  File "/external_code/gan/gansformer/pretrained_networks.py", line 30, in load_networks
    G, D, Gs = pickle.load(stream, encoding = "latin1")[:3]
  File "/external_code/gan/gansformer/dnnlib/tflib/network.py", line 306, in __setstate__
    self._init_graph()
  File "/external_code/gan/gansformer/dnnlib/tflib/network.py", line 159, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "<string>", line 2371, in G_synthesis_stylegan2
  File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 229, in downsample_2d
    return _simple_upfirdn_2d(x, k, down=factor, pad0=(p+1)//2, pad1=p//2, data_format=data_format, impl=impl)
  File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 358, in _simple_upfirdn_2d
    y = upfirdn_2d(y, k, upx=up, upy=up, downx=down, downy=down, padx0=pad0, padx1=pad1, pady0=pad0, pady1=pad1, impl=impl)
  File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 61, in upfirdn_2d
    return impl_dict[impl](x=x, k=k, upx=upx, upy=upy, downx=downx, downy=downy, padx0=padx0, padx1=padx1, pady0=pady0, pady1=pady1)
  File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 139, in _upfirdn_2d_cuda
    return func(x)
  File "/platforms/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 162, in decorated
    return _graph_mode_decorator(f, args, kwargs)
  File "/platforms/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 183, in _graph_mode_decorator
    result, grad_fn = f(*args)
  File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 131, in func
    y = _get_plugin().up_fir_dn2d(x=x, k=kc, upx=upx, upy=upy, downx=downx, downy=downy, padx0=padx0, padx1=padx1, pady0=pady0, pady1=pady1)
  File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 14, in _get_plugin
    return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
  File "/external_code/gan/gansformer/dnnlib/tflib/custom_ops.py", line 156, in get_plugin
    plugin = tf.load_op_library(bin_file)
  File "/platforms/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /external_code/gan/gansformer/dnnlib/tflib/_cudacache/upfirdn_2d1.14.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
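
A note on reading the undefined symbol in this traceback: it contains St7__cxx11, the mangled form of the std::__cxx11 namespace that libstdc++ (GCC 5 and later) uses for std::string under its newer C++11 ABI. That suggests, though the thread does not confirm it, that the compiled plugin expects the new ABI while the installed TensorFlow binary exports the function under the old one. A minimal Python sketch of that check follows; the helper name uses_cxx11_abi is hypothetical and is not part of gansformer or TensorFlow.

```python
# Hypothetical helper: detect whether a mangled C++ symbol refers to the
# libstdc++ C++11 ABI (types living in the std::__cxx11 namespace).
CXX11_NAMESPACE = "St7__cxx11"  # Itanium mangling of std::__cxx11


def uses_cxx11_abi(mangled_symbol: str) -> bool:
    """Return True if the symbol mentions a std::__cxx11 type."""
    return CXX11_NAMESPACE in mangled_symbol


# The undefined symbol reported in the traceback.
symbol = (
    "_ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_string"
    "IcSt11char_traitsIcESaIcEEE"
)
print(uses_cxx11_abi(symbol))  # True
```

A True result only says which ABI the plugin was compiled against; whether that differs from the TensorFlow build has to be checked separately.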

dorarad commented 3 years ago

Hi, thanks for reaching out! It seems like you have a problem similar to #7.

I recommend editing the following line: https://github.com/dorarad/gansformer/blob/main/dnnlib/tflib/custom_ops.py#L130 Try changing int(tf_ver < 1.15) to 0.

Then clean the previously built custom ops so that you can retry: rm -rf /external_code/gan/gansformer/dnnlib/tflib/_cudacache/ and then run the code again. Let me know if you still experience the issue after that!
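
For anyone who wants to check their setup before editing custom_ops.py, here is a rough Python sketch of the two steps above. It assumes, without confirmation from this thread, that the value on the linked line feeds the -D_GLIBCXX_USE_CXX11_ABI compiler flag. The helper names cxx11_abi_from_flags and clear_plugin_cache are hypothetical; tf.sysconfig.get_compile_flags() is an existing TensorFlow call that lists the flags TensorFlow itself was built with.

```python
import re
import shutil
from pathlib import Path


def cxx11_abi_from_flags(flags):
    """Return 0 or 1 from a -D_GLIBCXX_USE_CXX11_ABI=<n> flag, or None if absent."""
    for flag in flags:
        match = re.fullmatch(r"-D_GLIBCXX_USE_CXX11_ABI=([01])", flag)
        if match:
            return int(match.group(1))
    return None


def clear_plugin_cache(cache_dir):
    """Delete the compiled-plugin cache directory; return True if it existed."""
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        # Flags TensorFlow was compiled with; custom ops should use the same ABI.
        abi = cxx11_abi_from_flags(tf.sysconfig.get_compile_flags())
        print("TensorFlow was built with _GLIBCXX_USE_CXX11_ABI =", abi)
```

If the printed value is 0, that would be consistent with hard-coding 0 as suggested. After changing the line, calling clear_plugin_cache("/external_code/gan/gansformer/dnnlib/tflib/_cudacache") has the same effect as the rm -rf command and forces the plugins to be rebuilt on the next run.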

athena913 commented 3 years ago

Hi, your solution resolved the problem. (Sorry, I overlooked issue #7 since I was not using Docker. I should have checked the content instead of the subject line.) Thank you very much for your help.

dorarad commented 3 years ago

No worries at all, glad to hear it got resolved!