Using vector.py show_image causes uninitialized value Generator.input

peterkentish commented 5 years ago

Hi,

I am trying to run vector.py to do a whole host of latent space exploration for a dissertation project.

I have been trying to get my head around exactly how to pass the correct arguments, and have eventually reached this:

The last few lines of logo_wgan.py:

    with tf.Session(config=tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)) as session:
        if args.load_config is not None:
            print("creating WGAN")
            wgan = WGAN(session, load_config=args.load_config)
            vec = Vector(wgan)
            vec.show_random(save=True)
        else:
            wgan = WGAN(session, config_dict=arg_dict, train=args.train)
        if args.train:
            wgan.train()

The command being run:

python logo_wgan.py --load_config 'settings'

I've had to put the config.json file in runs/settings/config.json because logo_wgan.py prefixes runs/ to the load_config parameter - this is probably my own lack of understanding.

I also had to change line 63 in vector.py to:

    if self.cfg.N_LABELS > 0:
        # with h5py.File(self.cfg.DATA) as hf:
        #     probs = hf[self.cfg.LABELS].attrs['probs']
        # number = np.random.choice(range(self.cfg.N_LABELS), size=size, replace=True, p=probs)
        number = np.random.randint(0, self.cfg.N_LABELS, size)

Otherwise I got an error when it tried to find the HDF5 data:

Traceback (most recent call last):
  File "logo_wgan.py", line 657, in <module>
    vec.show_random(save=True)
  File "/data/aca15pk/College/LLD-icon-sharp_rc_128/vector.py", line 41, in show_random
    y = self.gen_y(size=size)
  File "/data/aca15pk/College/LLD-icon-sharp_rc_128/vector.py", line 62, in gen_y
    probs = hf[self.cfg.LABELS].attrs['probs']
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1496871545397/work/h5py/_objects.c:2846)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1496871545397/work/h5py/_objects.c:2804)
  File "/home/aca15pk/.conda/envs/py2-gpu/lib/python2.7/site-packages/h5py/_hl/group.py", line 169, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1496871545397/work/h5py/_objects.c:2846)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1496871545397/work/h5py/_objects.c:2804)
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/home/ilan/minonda/conda-bld/h5py_1496871545397/work/h5py/h5o.c:3740)
KeyError: 'Unable to open object (Component not found)'

With these changes I get the code to run, and after some time I get this:

  File "logo_wgan.py", line 657, in <module>
    vec.show_random(save=True)
  File "/data/aca15pk/College/LLD-icon-sharp_rc_128/vector.py", line 42, in show_random
    self.show_z(z, y, shape=shape, border=border, enum=enum, res=res, save=save)
  File "/data/aca15pk/College/LLD-icon-sharp_rc_128/vector.py", line 142, in show_z
    self.show(self.sample_z(z, y), shape=shape, enum=enum, border=border, res=res, save=save)
  File "/data/aca15pk/College/LLD-icon-sharp_rc_128/vector.py", line 100, in sample_z
    samples = self.wgan.sample(z_i, y_i)
  File "logo_wgan.py", line 265, in sample
    self._init_sampler()
  File "logo_wgan.py", line 250, in _init_sampler
    self.sampler = self.Generator(self.cfg, n_samples=0, labels=self.y, noise=self.z, is_training=self.t_train)
  File "/data/aca15pk/College/LLD-icon-sharp_rc_128/tflib/architectures.py", line 23, in Generator_Resnet_32
    output = lib.ops.linear.Linear('Generator.Input', 128 + add_dim, 4 * 4 * cfg.DIM_G, noise)
  File "/data/aca15pk/College/LLD-icon-sharp_rc_128/tflib/ops/linear.py", line 110, in Linear
    weight_values
  File "/data/aca15pk/College/LLD-icon-sharp_rc_128/tflib/__init__.py", line 25, in param
    param = tf.Variable(*args, **kwargs)
  File "/home/aca15pk/.conda/envs/py2-gpu/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 199, in __init__
    expected_shape=expected_shape)
  File "/home/aca15pk/.conda/envs/py2-gpu/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 330, in _init_from_args
    self._snapshot = array_ops.identity(self._variable, name="read")
  File "/home/aca15pk/.conda/envs/py2-gpu/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1400, in identity
    result = _op_def_lib.apply_op("Identity", input=input, name=name)
  File "/home/aca15pk/.conda/envs/py2-gpu/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/aca15pk/.conda/envs/py2-gpu/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/aca15pk/.conda/envs/py2-gpu/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value Generator.Input/Generator.Input.W
     [[Node: Generator.Input/Generator.Input.W/read = Identity[T=DT_FLOAT, _class=["loc:@Generator.Input/Generator.Input.W"], _device="/job:localhost/replica:0/task:0/gpu:0"](Generator.Input/Generator.Input.W)]]

This is the error I don't understand.

I appreciate I have put in a lot of information; I am just trying to be as clear as possible. If there is a more correct way (there probably is) to use vector.py I would love to know. If I work it out, I'd be happy to write a readme for it to help out.

alex-sage commented 5 years ago

Hi Peter, sorry for the late response. I'm currently very short on time due to some personal circumstances (just moved house). However, since I invested quite a bit of work into this, I'd be happy if someone else could use it of course and would like to work with you to resolve your issues. Are you still interested in pursuing this?

First off, the "load config" parameter is meant to load a previously run configuration, not to start a new run completely from scratch. That doesn't mean you can't, it's just not exactly what I intended. So when you start a training run, you give it a name, e.g. experiment_1, under which all the configs and checkpoints will be stored as runs/experiment_1. Then you can load it again, e.g. to continue training or to do some inference experiments, by using python logo_wgan.py --load_config 'experiment_1'.
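
As a rough illustration of that naming convention (a sketch only: the helper name config_path_for is hypothetical, and this is not the actual code in logo_wgan.py), the run name maps to a config file like this:

```python
import os


def config_path_for(run_name, runs_dir="runs"):
    """Return the config.json location for a named run.

    Hypothetical helper: it only mirrors the runs/<run name>/config.json
    layout described in this thread.
    """
    return os.path.join(runs_dir, run_name, "config.json")


print(config_path_for("experiment_1"))
```

So the value passed to --load_config is the run name itself, not a path to the config file.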

As for the line you commented out, it is meant to ensure that the probability distribution of randomly generated labels matches that of the training data. The reason you're getting this error is that the probabilities are missing in the training data set. I think what happened there was that I was planning to add them but in the end decided not to due to a lack of time, which is also why there are similar lines commented out in logo_wgan.py. If you need a matching distribution, which can be important for certain applications, all you need to do is count the labels in the training data and calculate the probability for each label, so it's actually really easy. Then insert these numbers into the 'probs' attribute.
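
A minimal sketch of that counting step, under some assumptions: the function names are made up for illustration, and the labels are assumed to be stored as a 1-D integer dataset under the key that cfg.LABELS points to. Only the 'probs' attribute name comes from vector.py.

```python
from collections import Counter


def label_probabilities(labels, n_labels):
    """Return the relative frequency of each label 0..n_labels-1 as a list."""
    counts = Counter(int(label) for label in labels)
    total = float(sum(counts.values()))
    if total == 0:
        raise ValueError("no labels given")
    return [counts.get(i, 0) / total for i in range(n_labels)]


def store_probs(hdf5_path, labels_key, n_labels):
    """Write the frequencies to the 'probs' attribute of the label dataset.

    Assumption: hf[labels_key] is a 1-D integer dataset of class labels.
    """
    import h5py  # imported lazily so label_probabilities works without h5py

    with h5py.File(hdf5_path, "r+") as hf:
        probs = label_probabilities(hf[labels_key][:], n_labels)
        hf[labels_key].attrs["probs"] = probs
```

The resulting list can be passed as the p argument of np.random.choice, which is how the commented-out line uses it.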

The last error you mention doesn't ring a bell right away, I'd have to look into that one a bit more closely. I'm happy to do that in case you're still interested.

peterkentish commented 5 years ago

Hi Alex!

I am really pleased you are still available. Congrats on the move! I am still working on this, however I closed this particular issue as I managed to figure out various aspects of it. For posterity, and in case anyone else stumbles across this, I will try to address my issues here.

The first issue, to do with the load config, was simply that the --load_config param should be equal to the RUN_NAME parameter, as this is where the config.json is written to.

The second problem is as you say, although I hadn't figured that out.

The final issue, from what I can tell, is this problem. The pre-trained models are missing data that the code expects to find there. When using a model I trained myself, this was no issue.

alex-sage commented 5 years ago

Glad to hear that you've got it to work!

If the last issue is indeed related to batchnorm, it might be a tensorflow version conflict. They changed the implementation at some point without being backwards compatible, and if you have a model which was trained using the old version and try to load it with the newer one, it fails with exactly the error mentioned on stackoverflow if I recall correctly. Batchnorm has been very painful to use for me overall, but I think in newer versions of tensorflow it should work better (probably not compatible again though). That being said, my pretrained models shouldn't use the old version of batchnorm and the error code you pasted above sounds like a different problem (?)

By the way, if you still feel at all like writing a short readme about how to use vector.py (and / or anything else in this repo) I'd be very grateful and will help you the best I can. This is something I have wanted to do for about a year now and never found the time to actually do. In fact, I had come to think it was probably not worth it anymore since my code had lost its relevance, but it seems this is not entirely true :)

peterkentish commented 5 years ago

Ah, I did try each of the tensorflow versions available in the conda environment I had created, but the older versions did indeed have a batchnorm issue, something to do with the fused parameter.

I would still love to help, when I get to a good level of understanding I'll write up some documentation.

With regard to relevance, fear not! My project is working in conjunction with a creative AI based company, who are interested in scoping out the possibility of using GANs to create brand imagery. Right up this repo's street!

My end goal is to create a GUI similar to the one mentioned in your paper, perhaps with a few extra features, that demonstrates the effectiveness of this technique of logo generation. I feel there are few people who could help me as much as you yourself can :)

alex-sage commented 5 years ago

OK, so I just successfully got the WGAN code to run in a virtualenv on my work PC. Here's how I use it.

First of all, these are the packages I used. Most importantly, use tensorflow-gpu 1.3.0 to be sure it works. Most probably the errors you encountered stem from an incompatible tensorflow version.

backports.functools-lru-cache==1.5
backports.shutil-get-terminal-size==1.0.0
backports.weakref==1.0rc1
bleach==1.5.0
cycler==0.10.0
decorator==4.3.2
enum34==1.1.6
funcsigs==1.0.2
h5py==2.9.0
html5lib==0.9999999
ipython==5.8.0
ipython-genutils==0.2.0
kiwisolver==1.0.1
Markdown==3.0.1
matplotlib==2.2.3
mock==2.0.0
numpy==1.16.1
pathlib2==2.3.3
pbr==5.1.2
pexpect==4.6.0
pickleshare==0.7.5
Pillow==5.4.1
pkg-resources==0.0.0
prompt-toolkit==1.0.15
protobuf==3.6.1
ptyprocess==0.6.0
Pygments==2.3.1
pyparsing==2.3.1
python-dateutil==2.8.0
pytz==2018.9
scandir==1.9.0
scipy==1.2.1
simplegeneric==0.8.1
six==1.12.0
subprocess32==3.5.3
tensorflow==1.3.0
tensorflow-gpu==1.3.0
tensorflow-tensorboard==0.1.8
tqdm==4.31.1
traitlets==4.3.2
wcwidth==0.1.7
Werkzeug==0.14.1
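
A quick way to confirm that the environment actually picked up the intended version is to compare version strings. This is only a sketch: the helper names are hypothetical, and 1.3.0 is simply the version reported to work in this thread.

```python
import re

REQUIRED_TF = "1.3.0"  # the version reported to work in this thread


def version_tuple(version):
    """Extract (major, minor, patch) from a version string such as '1.3.0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    return tuple(int(part) for part in match.groups())


def matches_required(installed, required=REQUIRED_TF):
    """True when the installed version has the same major.minor.patch."""
    return version_tuple(installed) == version_tuple(required)
```

In a session where tensorflow is importable, matches_required(tf.__version__) then tells you whether you are on the tested version.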

Once you have all the required packages, download the LLD-logo-sharp dataset from here and copy it to data/LLD-icon-sharp.hdf5. Then download WGAN - LLD-icon-sharp with 128 RC clusters and unzip the contained folder to wgan/runs/LLD-icon-sharp_rc_128.

There are 2 small changes you have to make in order for it to run:

Change line 19 in wgan/tflib/inception_score.py to a valid path on your system.
In wgan/runs/LLD-icon-sharp_rc_128/config.json, change "DATA": "LLD-icon-sharp.hdf5" to "DATA": "data/LLD-icon-sharp.hdf5". (This doesn't have to be changed if you keep the hdf5 file in the root directory, but I think this is cleaner.)
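
The config.json change can also be scripted. The following is a sketch with a hypothetical helper name; it assumes config.json is a flat JSON object with a "DATA" key, as quoted above. Note that it rewrites the whole file, so the formatting of the original may change.

```python
import json


def set_data_path(config_path, data_path):
    """Rewrite the "DATA" entry of a run's config.json and return the old value."""
    with open(config_path) as f:
        cfg = json.load(f)
    old = cfg.get("DATA")
    cfg["DATA"] = data_path
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return old
```

For the setup described here, the call would be set_data_path("wgan/runs/LLD-icon-sharp_rc_128/config.json", "data/LLD-icon-sharp.hdf5").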

I left the lines you mentioned untouched and it worked for me, so the probability values seem to be there as expected. Possibly this probability data was just missing for one specific dataset. Which one were you working with?

To use my vector class, I find it best to work in ipython. If you start an ipython session in the wgan directory, you should be able to run the following commands to generate a random batch of logos:

import tensorflow as tf
import numpy as np
import vector
from logo_wgan import WGAN

session = tf.Session()
wgan = WGAN(session, load_config='LLD-icon-sharp_rc_128')
vec = vector.Vector(wgan)
vec.show_random()

If this works, you can use the vec object to play around in the latent space of the GAN by generating latent vectors and labels, interpolating between them etc. Let me know if you need any help with that and/or don't understand the purpose of some function.

Hope this helps for a start. I really have to apologize for the complete lack of documentation, I can see how it's not at all clear how this stuff is supposed to be used.

Last but not least, I'm really excited about your project! I was thinking about creating a small GUI for my code too (which wouldn't have been a lot more work at this stage, since I basically have a functioning API already) but unfortunately never had the time to do it. Or even to write proper documentation for what I do have, for that matter. So again, happy to help wherever I can. I could also share some additional script files I used for image generation, if that would help you get an idea of how to use the classes. You can also email me directly at [removed], or we can keep the discussion here on github.

peterkentish commented 5 years ago

Ah perfect, that seems to have saved the day.

There is a small bug on line 138 of vector.py which causes this error:

Traceback (most recent call last):
  File "makeImages.py", line 9, in <module>
    vec.show_random(save=True)
  File "/data/aca15pk/College/take5/logo-gen-master/wgan/vector.py", line 42, in show_random
    self.show_z(z, y, shape=shape, border=border, enum=enum, res=res, save=save)
  File "/data/aca15pk/College/take5/logo-gen-master/wgan/vector.py", line 142, in show_z
    self.show(self.sample_z(z, y), shape=shape, enum=enum, border=border, res=res, save=save)
  File "/data/aca15pk/College/take5/logo-gen-master/wgan/vector.py", line 138, in show
    with open(save, 'wb') as f:
TypeError: coercing to Unicode: need string or buffer, bool found

Which is caused by the save boolean being used as a filename. Other than that, it works a treat! I discovered this because I sadly have to use a high performance computer with no GUI (so no IPython).

Already this issue thread explains exactly how to load up a model, which I'm sure many people will find useful.

The extra script files sound great, anything you can send to help me along is much appreciated.

From now on I will try and use the issue system more appropriately and close this thread, but I will raise questions as issues so more people can benefit from the discoveries.

alex-sage commented 5 years ago

Ah yes, this is definitely not the best coding style and again, the problem is a lack of documentation. vec.show_random(save=True) doesn't work. The save parameter, if not False, takes a string containing the file path where the image should be saved. The default should probably be changed to None. I just found it nicer to be able to say save=False when I don't want to save, but in retrospect that's pretty stupid...
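
One possible way to make that contract explicit is sketched below. This is a hypothetical helper, not the code in vector.py: None and False both mean "do not save", a string is taken as the file path, and anything else (including True) is rejected with a clear message instead of failing inside open().

```python
try:
    string_types = basestring  # Python 2
except NameError:
    string_types = str  # Python 3


def resolve_save_path(save=None):
    """Return a file path to save to, or None when nothing should be saved.

    Hypothetical helper illustrating the intended meaning of the save argument.
    """
    if save is None or save is False:
        return None
    if not isinstance(save, string_types):
        raise TypeError("save must be a file path string, got %r" % (save,))
    return save
```

With a check like this, save=True would raise a TypeError that names the actual problem rather than the "coercing to Unicode" message in the traceback above.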

peterkentish commented 5 years ago

vec.show_random(save=True) did seem to work with the change to a string. You're right though, it could do with a better overall reformat than my change.

alex-sage commented 5 years ago

The idea was to use it as vec.show_random(save='/home/me/folder/image.png'). This way you can also do things like

for i in range(128):
    z = vec.gen_z()
    y = vec.gen_y(number=i)
    vec.show_z(z=z, y=y, save='path/to/file/class_'+str(i).zfill(3)+'.png')

to save a batch for each of the dataset classes (labels).

alex-sage commented 5 years ago

I just pushed a small change and added documentation to this particular function.

alex-sage / logo-gen, issue #8