Some Errors On Training

ghost commented 3 years ago

Thank you for your great work. I appreciate it a lot.

I just tried to train a model with your code, but it uses a lot of undefined variables. For example:

https://github.com/dorarad/gansformer/blob/148f72964219f8ead2621204bc5cfa89200b6879/training/network.py#L795

It throws an undefined-variable error for 'maps_in'. When I fix that with a constant, I get another error from

https://github.com/dorarad/gansformer/blob/148f72964219f8ead2621204bc5cfa89200b6879/training/network.py#L811

Again, gen_mod and gen_cond are not defined. When I fix that with a constant as well, I get another error, which says:

gansformer-main/gansformer-main/training/network.py", line 1127, in G_synthesis
    grid_poses = get_positional_embeddings(resolution_log2, pos_dim or dlatent_size, pos_type, pos_directions_num, init = pos_init, **_kwargs)
TypeError: get_positional_embeddings() got an unexpected keyword argument 'label_size'

Am I missing something, or is there a problem?
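A TypeError of this kind typically appears when a caller forwards a catch-all keyword dictionary (here `**_kwargs`) to a function whose signature does not accept every key in it. The following is a minimal sketch of that pattern in plain Python, together with one possible workaround that filters the keywords against the callee's signature. `get_embeddings` and `call_with_supported_kwargs` are hypothetical stand-ins invented for illustration; they are not functions from the repository, and this is not necessarily how the code was eventually fixed.

```python
import inspect


def get_embeddings(resolution, dim, init="uniform"):
    # Hypothetical stand-in for a helper that accepts only a few keywords.
    return (resolution, dim, init)


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments its signature does not name."""
    params = inspect.signature(func).parameters
    # If func itself declares **kwargs, everything can be passed through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(*args, **kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **supported)


# A catch-all dictionary that carries an option the helper does not know.
extra = {"init": "normal", "label_size": 0}

try:
    get_embeddings(4, 32, **extra)  # forwards 'label_size' and fails
except TypeError as err:
    error_text = str(err)

# Filtering first lets the call succeed with only the supported keywords.
result = call_with_supported_kwargs(get_embeddings, 4, 32, **extra)
```

An alternative with the same effect is to give the callee its own `**_kwargs` parameter so that unknown options are silently ignored; which choice is preferable depends on whether silently dropped options could hide real mistakes.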

dorarad commented 3 years ago

Hi! The code is not fully ready yet. I started refactoring it a few days ago but still need to finish. I will complete it ASAP, stay tuned!

dimitri-voytan commented 3 years ago

Hi, will you include a requirements.txt?

dorarad commented 3 years ago

Of course, as well as full instructions for preparing the data files, training and everything. Actively working on it and will put it all online soon!

ghost commented 3 years ago

https://github.com/dorarad/gansformer/blob/327bbcd125e8d38622393e4533f309eae6799135/training/network.py#L296

Thanks for the cleanup, but there is still an issue at this line:

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Incompatible shapes: [4,32] vs. [4,16,32]
     [[{{node GPU0/G_loss/G/G_synthesis/4x4/Conv/add_2}}]]
     [[GPU0/Lossg_grad/global_norm/global_norm/_6897]]
  (1) Invalid argument: Incompatible shapes: [4,32] vs. [4,16,32]
     [[{{node GPU0/G_loss/G/G_synthesis/4x4/Conv/add_2}}]]

x_ and x have different shapes.
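The reported shapes cannot be added elementwise because TensorFlow follows NumPy-style broadcasting: shapes are aligned from the trailing dimension, and two dimensions are compatible only if they are equal or one of them is 1. The sketch below implements that rule in plain Python to show why [4,32] vs. [4,16,32] fails and why a shape with an explicit length-1 axis, [4,1,32], would broadcast. `broadcast_shape` is a hypothetical helper written for illustration, and the reshaping shown is an assumption about one way such a mismatch can be resolved, not the repository's actual fix.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or raise ValueError."""
    result = []
    # Walk both shapes from the last dimension; missing dimensions count as 1.
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"Incompatible shapes: {list(a)} vs. {list(b)}")
    return tuple(reversed(result))


# The two shapes from the error message: 32 matches 32, but 4 meets 16.
try:
    broadcast_shape((4, 32), (4, 16, 32))
except ValueError as err:
    message = str(err)

# With a length-1 middle axis, the per-sample vector repeats across all 16 slots.
fixed = broadcast_shape((4, 1, 32), (4, 16, 32))
```

In TensorFlow itself, a length-1 axis of this kind can be added with `tf.expand_dims(t, axis=1)`; whether that is the right semantic fix here depends on what the 16-element axis represents in the model.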

And one more:

https://github.com/dorarad/gansformer/blob/327bbcd125e8d38622393e4533f309eae6799135/training/network.py#L1642

I think 'aggregators' itself needs to be included as a parameter.

dorarad commented 3 years ago

Hi @yilmazkorkmz thanks for pointing that out! I will make sure to look into the errors that you get. I tested the code locally and it worked for me with the settings that I used for the new model experiments. I'm working now on making the instructions for training and will announce it as soon as all things are ready and complete!

(I would also like to add that the original code worked with all options, but when I started refactoring to improve readability, it introduced some easy-to-catch bugs that I'm now working to resolve. I'll make sure to test the code thoroughly under a variety of command-line options.)

ghost commented 3 years ago

Thanks for the last update. I tried to run with these options and dataset:

--train --vis-images --vis-grid --transformer --g-img2ltnt --components-num=16 --vis-maps --kmeans

But I got this error:

File "/auto/k2/korkmaz/Downloads/gansformer_last/gansformer-main/training/training_loop.py", line 256, in training_loop
    set_optimizer_ops(cN, lazy_regularization, no_op)
File "/auto/k2/korkmaz/Downloads/gansformer_last/gansformer-main/training/training_loop.py", line 100, in set_optimizer_ops
    cN.opt.register_gradients(tf.reduce_mean(cN.loss), cN.trainables)
File "/auto/k2/korkmaz/Downloads/gansformer_last/gansformer-main/dnnlib/tflib/optimizer.py", line 150, in register_gradients
    grad_list = [(tf.where(is_bad(grad), tf.zeros_like(grad), grad), varname) for grad, varname in grad_list]
File "/auto/k2/korkmaz/Downloads/gansformer_last/gansformer-main/dnnlib/tflib/optimizer.py", line 150, in <listcomp>
    grad_list = [(tf.where(is_bad(grad), tf.zeros_like(grad), grad), varname) for grad, varname in grad_list]
File "/auto/k2/korkmaz/Downloads/gansformer_last/gansformer-main/dnnlib/tflib/optimizer.py", line 149, in is_bad
    def is_bad(grad): return tf.logical_or(tf.math.is_nan(grad), tf.math.is_inf(grad))
File "/auto/k2/korkmaz/anaconda3/envs/myenv/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 5298, in is_nan
    "IsNan", x=x, name=name)
File "/auto/k2/korkmaz/anaconda3/envs/myenv/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 546, in _apply_op_helper
    (input_name, err))
ValueError: Tried to convert 'x' to a tensor and failed. Error: None values not supported.
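The final ValueError says that `is_bad` received None instead of a tensor, which means at least one entry in `grad_list` has no gradient. In TensorFlow 1.x this typically happens when a trainable variable is not connected to the loss under the chosen options. The sketch below mirrors the NaN/Inf sanitizing step from the traceback using plain floats in place of tensors, and adds a guard that skips None gradients and reports which variables they belong to. `sanitize_gradients` and the variable names are hypothetical; this is an illustration of the failure mode, not the repository's fix, and skipping a gradient can hide a genuinely disconnected variable that should be investigated.

```python
import math


def sanitize_gradients(grad_list):
    """Zero out NaN/Inf gradients and skip entries whose gradient is None.

    grad_list holds (grad, varname) pairs; floats stand in for tensors.
    Returns (clean_pairs, skipped_names).
    """
    clean, skipped = [], []
    for grad, varname in grad_list:
        if grad is None:
            # No gradient exists: the loss does not depend on this variable.
            skipped.append(varname)
            continue
        is_bad = math.isnan(grad) or math.isinf(grad)
        clean.append((0.0 if is_bad else grad, varname))
    return clean, skipped


# Hypothetical gradient list: one healthy, one missing, one NaN.
pairs = [(0.5, "w_a"), (None, "w_unused"), (float("nan"), "w_b")]
clean, skipped = sanitize_gradients(pairs)
```

Printing the skipped names is usually the more useful part: it points at the variables that the selected command-line options leave unused.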

dorarad commented 3 years ago

Hi @yilmazkorkmz, thanks for letting me know! I'm still testing and finalizing the repository right now. Please wait for me to announce the instructions after I complete everything -- I'd like to assure you that I'll follow up on this thread and let you know as soon as things are ready!

dorarad commented 3 years ago

Hi @yilmazkorkmz and @dimitri-voytan, all the code should be working now! Please let me know if you face any issues; hopefully there won't be any more. I appreciate your patience and help in reporting past issues!

ghost commented 3 years ago

Hi, thank you for your efforts. The code seems to be working with my configuration. I will let you know if there is any problem.

dorarad commented 3 years ago

Wonderful!
