Closed DateBro closed 4 years ago
Hi DateBro, thanks for your attention.
If there are any other problems, please feel free to reopen this issue.
I tested the commands in https://github.com/csmliu/STGAN/blob/master/att_classification/README.md but ran into some trouble. With tensorflow-gpu 1.15, I got errors like
(0) Failed precondition: Attempting to use uninitialized value classifier/Conv_2/weights
[[node classifier/Conv_2/weights/read (defined at /home/zhiyong/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[Cast/_3]]
(1) Failed precondition: Attempting to use uninitialized value classifier/Conv_2/weights
[[node classifier/Conv_2/weights/read (defined at /home/zhiyong/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
With a new virtual environment of tensorflow-gpu 1.12 or 1.4, I always got
WARNING:tensorflow:From /home/zhiyong/RemoteServer/pycharm_projects/STGAN/att_classification/tflib/collection.py:62: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
Segmentation fault (core dumped)
I have no idea what to do, can you give me some advice?
TF 1.12 should work in my experience, and should not raise such warnings.
Please check whether you are using the right version with import tensorflow as tf; print(tf.__version__)
As I remember, the warning about TF 2.0 appears in TF 1.13 or 1.14, so maybe you are using the wrong version.
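The version check suggested above can be sketched as a small script. The helper name version_matches is hypothetical (it is not part of STGAN or TensorFlow), and the expected version "1.12" is taken from this thread. It compares version components rather than string prefixes, so that "1.12" does not accidentally match something like "1.120".

```python
def version_matches(installed, expected_prefix):
    """Return True if `installed` (e.g. '1.12.0') has the expected leading components."""
    installed_parts = installed.split(".")
    expected_parts = expected_prefix.split(".")
    return installed_parts[:len(expected_parts)] == expected_parts


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        # Print the version that the active environment actually imports.
        print(tf.__version__, version_matches(tf.__version__, "1.12"))
```

Running this inside the activated conda environment shows which TensorFlow the interpreter really picks up, which may differ from the one you intended to install.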
Sorry, I forgot some features of Anaconda and was actually using TF 1.15 instead of TF 1.12. But when I used TF 1.12 correctly, I still got the same error as with TF 1.15.
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value classifier/Conv/weights
[[Node: classifier/Conv/weights/read = Identity[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](classifier/Conv/weights)]]
[[Node: Cast/_3 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_228_Cast", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Pretty weird error. Have you modified the network architecture?
BTW, to make sure that the environment is correctly set, it's better to create the virtual environment by
conda create -n NAME tensorflow-gpu=1.12
That is, specify the TF version when creating the environment (this way, Anaconda will automatically install packages that do not conflict, e.g., an older version of Python).
I only modified basic.py to use
return imageio.imread(path, pilmode=mode) / 127.5 - 1
and changed test_tfrecord_path = './tfrecords/test'
to an absolute path, because with the relative path I got a "no such file or directory" error.
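A relative path such as './tfrecords/test' is resolved against the directory the command is launched from, not against the script's own folder, which is one plausible reason for the "no such file or directory" error. A minimal sketch of anchoring the path to a known base directory instead; resolve_from is a hypothetical helper, not something in the STGAN code.

```python
import os


def resolve_from(base_dir, relative_path):
    """Join relative_path onto base_dir and return a normalized absolute path."""
    return os.path.abspath(os.path.join(base_dir, relative_path))


# Hypothetical usage inside a script: anchor to the script's own folder
# instead of the current working directory.
# script_dir = os.path.dirname(os.path.abspath(__file__))
# test_tfrecord_path = resolve_from(script_dir, './tfrecords/test')
```

With this approach the script finds the same files regardless of which directory it is started from.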
I ran test.py on Ubuntu 18.04 with an RTX 2070. Should I switch to Windows? It seems that your train.py was run on Windows. I am still confused about the advice on StackOverflow: shouldn't test.py just read in the checkpoint file and predict?
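On the checkpoint question: in TF 1.x, an "uninitialized value" error generally means the variable was neither restored from a checkpoint nor initialized. Adding an initializer silences the error, but if no checkpoint is restored the weights stay random. A minimal sketch for checking whether a directory actually contains saved checkpoint files follows. It assumes the default checkpoint format, which writes a '.index' file next to each saved checkpoint; the helper name and any directory you pass in are hypothetical.

```python
import glob
import os


def find_checkpoint_indexes(checkpoint_dir):
    """Return the sorted '*.index' files found directly inside checkpoint_dir."""
    return sorted(glob.glob(os.path.join(checkpoint_dir, "*.index")))


# Hypothetical usage: pass the directory that test.py is expected to load from.
# if not find_checkpoint_indexes(some_checkpoint_dir):
#     print("No checkpoint files found; the classifier cannot be restored.")
```

An empty result suggests that the trained classifier weights are missing or in a different location than the script expects.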
Thanks for your detailed help. I can run the test after adding the code from StackOverflow. 👍
I got different accuracies from the commands
python test.py --experiment_name 128 --test_int 2 --dataroot mydataroot
python att_classification/test.py --img_dir ./output/128/sample_testing
First test:
Acc.
[0.64472498 0.84395351 0.64632802 0.13325318 0.17969141 0.12954614
0.93497646 0.39469993 0.50490933 0.90997896 0.16245867 0.93377417
0.35196874]
Second test:
Acc.
[0.02609959 0.40992886 0.27326921 0.13325318 0.18154494 0.20203386
0.92084961 0.48176535 0.49509067 0.39339746 0.76605551 0.04207995
0.75288047]
Is there something I forgot to do? The results are so weird.
Did you run both commands twice, or only the second command twice?
I only repeated the second command several times and found that the results differ from each other and from your quantitative.results in #18.
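If the classifier weights are restored from the same checkpoint, repeated evaluation on the same images would normally be expected to give near-identical accuracies. A small sketch that quantifies how far apart the two runs reported above are; the helper name is hypothetical and the numbers are copied from the two outputs in this thread.

```python
def max_abs_difference(run_a, run_b):
    """Largest per-attribute gap between two equally long accuracy lists."""
    return max(abs(a - b) for a, b in zip(run_a, run_b))


first_test = [0.64472498, 0.84395351, 0.64632802, 0.13325318, 0.17969141,
              0.12954614, 0.93497646, 0.39469993, 0.50490933, 0.90997896,
              0.16245867, 0.93377417, 0.35196874]
second_test = [0.02609959, 0.40992886, 0.27326921, 0.13325318, 0.18154494,
               0.20203386, 0.92084961, 0.48176535, 0.49509067, 0.39339746,
               0.76605551, 0.04207995, 0.75288047]

print(max_abs_difference(first_test, second_test))
```

A gap this large between identical commands points to the model itself changing between runs rather than to ordinary numerical noise.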
Well, maybe I have found the problem. Please use the original code, and test via
cd att_classification
python test.py --img_dir ../output/128/sample_testing
Hmm, the results still have the same problem.
Could you download the repo and try again using the original att_classification folder?
I've just tested the code with Ubuntu 18.04 and TensorFlow 1.12.
After cloning the repo and trying again, I got the correct accuracy, matching quantitative.results. In the previous copy of the repo, I had only added the init code following StackOverflow to make it run. Anyway, thanks for your help; I'll figure out what the problem is with my machine or the modified code.
Hi csmliu, thank you so much for your brilliant work! I want to use STGAN as a baseline in my paper and need its attribute classification accuracy on some attributes other than your predefined ones. However, I am confused about the required tfrecord data, which differs a little from LynnHo/TfrecordCreator. To avoid potential errors, could you give a more detailed tutorial for training the attribute classifier?