Open akanksh2kb opened 3 years ago
Training does not proceed after this output:
Trigger callback: Total counts of trainable weights: 9999294.
Total size of trainable weights: 0G 9M 548K 958B (Assuming32-bit data type.)
2021-01-19 14:29:35.994320: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
Data folder hierarchy:
data --> peak --> ['flist.sh', 'gen_flist.py', 'train_shuffled.flist', 'training_data', 'validation_shuffled.flist']
training_data --> ['training', 'validation']
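For readers asking how the .flist files relate to this folder layout, here is a minimal sketch of a generator script. It assumes an .flist file is plain text with one image path per line, and that only .jpg, .jpeg and .png files count as images; `write_flist` and `IMAGE_SUFFIXES` are hypothetical names for illustration, not the contents of the `gen_flist.py` mentioned above.

```python
import random
from pathlib import Path

# Assumption: these are the image types present in the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def write_flist(image_dir, flist_path, seed=None):
    """Write every image path under image_dir to flist_path, one per line,
    in shuffled order. Returns the number of paths written."""
    paths = sorted(
        str(p)
        for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Shuffle so the file order does not follow the directory order.
    random.Random(seed).shuffle(paths)
    Path(flist_path).write_text("".join(p + "\n" for p in paths))
    return len(paths)
```

With the hierarchy above, this would be called once as `write_flist("data/peak/training_data/training", "data/peak/train_shuffled.flist")` and once more for the `validation` folder and `validation_shuffled.flist`. A return value of 0 means no images were found, which is worth checking before starting a long training run.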
I am facing the same issue. Please help.
Same issue on tf-gpu1.6.0:
- weight name: discriminator/sn_patch_gan/conv6/kernel:0, shape: [5, 5, 256, 256], size: 1638400
- weight name: discriminator/sn_patch_gan/conv6/bias:0, shape: [256], size: 256
Trigger callback: Total counts of trainable weights: 9999294.
Total size of trainable weights: 0G 9M 548K 958B (Assuming32-bit data type.)
If you are stuck after [Trigger callback: Total counts of trainable weights: 9999294. Total size of trainable weights: 0G 9M 548K 958B (Assuming32-bit data type.)] and no further error messages appear, training is probably running normally and just has not printed anything yet.
Please check these parameters in the "inpaint.yml" file:
train_spe: 4000
val_psteps: 2000
train_spe controls how often the checkpoint is saved.
val_psteps controls how often TensorBoard records are written.
If you are training on only one GPU, setting train_spe to 4000 and val_psteps to 2000 means it takes a really long time before you see any output. In my case, it took 2 hours to reach 4000 steps (one train_spe) on my 1080Ti.
To see what happens sooner, try setting them as follows:
train_spe: 4
val_psteps: 10
It works for me! GOOD LUCK!
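To put rough numbers on this advice: the only measurement in this thread is about 2 hours for 4000 steps on a 1080Ti, which works out to roughly 1.8 seconds per step. The sketch below assumes that rate carries over to your hardware (it may well not) and simply multiplies it out; `seconds_per_step` and `wait_seconds` are made-up helper names for illustration.

```python
def seconds_per_step(elapsed_seconds, steps):
    """Average time per training step from one observed run."""
    return elapsed_seconds / steps


def wait_seconds(steps, sec_per_step):
    """Estimated wait before something that happens every `steps` steps."""
    return steps * sec_per_step


# Observation reported in this thread: 2 hours for 4000 steps on a 1080Ti.
rate = seconds_per_step(2 * 60 * 60, 4000)

# Compare the default settings with the small debugging values.
for name, steps in [("train_spe", 4000), ("val_psteps", 2000),
                    ("train_spe", 4), ("val_psteps", 10)]:
    minutes = wait_seconds(steps, rate) / 60
    print(f"{name}: {steps} -> about {minutes:.1f} minutes")
```

At that rate the defaults mean waiting about an hour for the first TensorBoard record and about two hours for the first checkpoint, while the small values should produce output within seconds. Very small values are meant for confirming that training runs; you will probably want larger ones for a real run.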
After I start training, a Traceback (most recent call last) error is reported long after CUDA has loaded successfully. What should I do?
I also encountered this: training stops producing output after the "Trigger callback" lines. Did you solve it?
I also encountered the same stall after the "Trigger callback" output. Did you solve it? Please help me.
It takes 2000 steps to save the summary, so please be patient. Alternatively, you can open the log to see the training progress.
Can someone help me with training? I need to know the folder hierarchy of the dataset. Should the masks be in one folder?
While training I am not getting any error, but training is not happening at all.
I tried giving input images as 256x256.
If I knew the steps to train, it would be really helpful, as I am stuck.
################################
I edited the inpaint.yml file for my data:
# =========================== Basic Settings ===========================
# machine info
num_gpus_per_job: 1  # number of gpus each job need
num_cpus_per_job: 4  # number of gpus each job need
num_hosts_per_job: 1
memory_per_job: 32  # number of gpus each job need
gpu_type: 'nvidia-tesla-p100'

# parameters
name: places2_gated_conv_v100  # any name
model_restore: ''  # logs/places2_gated_conv
dataset: 'peak'  # 'tmnist', 'dtd', 'places2', 'celeba', 'imagenet', 'cityscapes'
random_crop: False  # Set to false when dataset is 'celebahq', meaning only resize the images to img_shapes, instead of crop img_shapes from a larger raw image. This is useful when you train on images with different resolutions like places2. In these cases, please set random_crop to true.
val: False  # true if you want to view validation results in tensorboard
log_dir: logs/full_model_celeba_hq_256

gan: 'sngan'
gan_loss_alpha: 1
gan_with_mask: True
discounted_mask: True
random_seed: False
padding: 'SAME'

# training
train_spe: 4000
max_iters: 100000000
viz_max_out: 10
val_psteps: 2000

# data
data_flist:
  # https://github.com/jiahuiyu/progressive_growing_of_gans_tf
  celebahq: [
    'data/celeba_hq/train_shuffled.flist',
    'data/celeba_hq/validation_static_view.flist'
  ]
  # http://mmlab.ie.cuhk.edu.hk/projects/celeba.html, please to use random_crop: True
  celeba: [
    'data/celeba/train_shuffled.flist',
    'data/celeba/validation_static_view.flist'
  ]
  # http://places2.csail.mit.edu/, please download the high-resolution dataset and use random_crop: True
  places2: [
    'data/places2/train_shuffled.flist',
    'data/places2/validation_static_view.flist'
  ]
  # http://www.image-net.org/, please use random_crop: True
  imagenet: [
    'data/imagenet/train_shuffled.flist',
    'data/imagenet/validation_static_view.flist',
  ]
  peak: [
    'data/peak/train_shuffled.flist',
    'data/peak/validation_shuffled.flist',
  ]

static_view_size: 30
img_shapes: [256, 256, 3]
height: 128
width: 128
max_delta_height: 32
max_delta_width: 32
batch_size: 16
vertical_margin: 0
horizontal_margin: 0

# loss
ae_loss: True
l1_loss: True
l1_loss_alpha: 1.

# to tune
guided: False
edge_threshold: 0.6
#################################
Thanks, akanksh
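One thing that may be worth ruling out with a custom dataset entry like `peak` is whether the .flist files named in the config exist and point at real image files. The sketch below assumes an .flist is plain text with one image path per line; `check_flist` is a hypothetical helper for illustration, not part of the training code.

```python
from pathlib import Path


def check_flist(flist_path):
    """Return (total, missing): the number of entries in the flist and
    the list of entries that do not point at an existing file."""
    flist = Path(flist_path)
    if not flist.is_file():
        raise FileNotFoundError(f"flist not found: {flist}")
    # Ignore blank lines and surrounding whitespace.
    entries = [line.strip() for line in flist.read_text().splitlines()
               if line.strip()]
    missing = [entry for entry in entries if not Path(entry).is_file()]
    return len(entries), missing
```

Running `check_flist("data/peak/train_shuffled.flist")` from the directory where training is launched reports how many entries the file has and which ones cannot be found. An empty file or a long `missing` list would point at the data setup rather than at the training settings. Relative paths in the flist are resolved against the current working directory here, which is itself an assumption about how the paths were written.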