vijayg78 opened 7 years ago
from scratch
Do you mean with randomly initialised model?
I used deeplab_resnet_init.ckpt and tried to run train.py. The loss oscillated and did not come down at all. I also tried deeplab_resnet.ckpt; same behaviour.
I used the JPEGImages from VOCdevkit, and the GTs pointed to the augmented images I downloaded from this GitHub repo. That's correct, right?
Same problem for a model which doesn't use the deeplab_resnet.ckpt file for initialisation.
What do the images in your TensorBoard look like after a few iterations?
I have the same problem. I use my own dataset (3 classes) to train. The loss oscillates around 1.2–1.3 and does not come down at all.
@DrSleep there are no images being produced in TensorBoard.
To all: the hyperparameters (learning rate, batch size, momentum, etc.) were chosen for Pascal VOC (for the procedure behind these choices, please refer to the original paper). The same hyperparameters will not necessarily suit other datasets, so it is your task to find an appropriate set of hyperparameters for your dataset.
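For context, the DeepLab papers anneal the learning rate with a "poly" schedule, so the base rate interacts with the total step count. A minimal sketch of that schedule (the base rate, power, and step count here are illustrative defaults, not necessarily this repo's exact values):

```python
def poly_lr(step, base_lr=2.5e-4, power=0.9, max_steps=20000):
    """'Poly' decay: the rate shrinks from base_lr to 0 over max_steps."""
    return base_lr * (1.0 - step / max_steps) ** power

# The schedule starts at the base rate and decays smoothly to zero,
# so changing max_steps for a new dataset also changes the effective rate
# at every intermediate step.
lr_start = poly_lr(0)        # equals base_lr
lr_mid = poly_lr(10000)      # base_lr * 0.5 ** 0.9
lr_end = poly_lr(20000)      # 0.0 at the final step
```

This is one reason a configuration tuned for Pascal VOC can stall on another dataset: with a different dataset size or step budget, the same `base_lr` may be far too low or too high for most of training.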
This repository is a replication of an academic paper. Anything beyond that (such as the ability to train on your own dataset) is a bonus.
Okay, but the model also does not work on the VOC dataset when not using the pretrained .ckpt file.
I also have this problem. I use VOC2012 and the pretrained model.
same here
In my case I used my own dataset for training. At first I used train.py, and the loss went down very, very slowly (from 10 to 8 over 60,000 steps). Then I switched to train_msc.py, and the loss began to go down quickly. I found that the second script trained better than the first, since the final loss was much smaller (about 3 instead of 8 in my case).
May I know the final loss after running train.py for 20K iterations with deeplab_resnet_init.ckpt as a start? I used the PASCAL dataset and my final loss was about 1.3. It would be even better if you could provide your training curve.
Same here. With the default configuration and Pascal VOC, the loss oscillates between 1.2 and 1.3. Could someone post the training curve, or share the loss values after 20K iterations, for example? Thanks!
Has anyone posted the loss after 20K? It is about 1.18 on my machine. Does anyone know the reason?
My loss stays at about 1.3, and the predicted images are completely black, with no result at all. I use the default hyperparameters and the VOC2012 dataset with deeplab_resnet.ckpt as a start. Why doesn't it work?
Hi, were you able to solve the issue?
Hi, my loss does not change; it has become stagnant. I have tried everything related to DeepLabv3+ mentioned on every blog I could find. I am training to detect roads. My images are 2000x2000, and my training set has 45k images. I have created my dataset in the PASCAL VOC format. I have three kinds of pixels: background = [0,0,0], void class = [255,255,255], road = [1,1,1], so the number of classes is 3. I am using PASCAL VOC pretrained weights.
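With a label layout like the one above, a common pitfall is feeding 3-channel colour masks where the pipeline expects single-channel class indices. A minimal sketch of the conversion, assuming the colour values listed above (the function name is my own, not part of the DeepLab code):

```python
import numpy as np

def rgb_mask_to_class_ids(mask):
    """Map an (H, W, 3) colour mask to (H, W) class indices.

    background [0,0,0] -> 0, road [1,1,1] -> 1, void [255,255,255] -> 255.
    """
    # All three channels carry the same value here, so one channel suffices.
    return mask[..., 0].astype(np.uint8)

# Usage: a tiny 2x2 mask with background, road, and void pixels.
mask = np.array([[[0, 0, 0], [1, 1, 1]],
                 [[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)
ids = rgb_mask_to_class_ids(mask)
```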
My changes in train_util.py are:

```python
ignore_weight = 0
label0_weight = 10
label1_weight = 15
not_ignore_mask = tf.to_float(tf.equal(scaled_labels, 1)) * label0_weight
```
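A snippet like the one above only assigns a weight to pixels of one class and leaves everything else at zero, which can starve the loss. A numpy sketch of per-class weighting covering all three labels (the function name, weight values, and label ids here are illustrative, not taken from train_util.py):

```python
import numpy as np

def build_weight_mask(labels, bg_weight=1.0, road_weight=15.0, ignore_label=255):
    """Per-pixel loss weights: background and road weighted, void ignored."""
    weights = np.zeros_like(labels, dtype=np.float32)
    weights[labels == 0] = bg_weight       # background pixels
    weights[labels == 1] = road_weight     # road (rare class, weighted up)
    weights[labels == ignore_label] = 0.0  # void pixels contribute no loss
    return weights

# Usage: one pixel of each kind, plus a second background pixel.
labels = np.array([0, 1, 255, 0], dtype=np.int32)
w = build_weight_mask(labels)
```

The key point is that every non-void class should end up with a nonzero weight; otherwise those pixels are silently dropped from the loss.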
```python
exclude_list = ['global_step', 'logits']
if not initialize_last_layer:
    exclude_list.extend(last_layers)
```
My train.py command:
```shell
nohup python deeplab/train.py \
  --logtostderr \
  --training_number_of_steps=65000 \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_batch_size=2 \
  --initialize_last_layer=False \
  --last_layers_contain_logits_only=True \
  --dataset="pascal_voc_seg" \
  --tf_initial_checkpoint="/data/old_model/models/research/deeplabv3_pascal_trainval/model.ckpt" \
  --train_logdir="/data/old_model/models/research/deeplab/mycheckpoints" \
  --dataset_dir="/data/models/research/deeplab/datasets/tfrecord" > my_output.log &
```
Please help 👍

```
INFO:tensorflow:global step 700: loss = 0.1759 (0.449 sec/step)
INFO:tensorflow:global step 710: loss = 0.1695 (0.655 sec/step)
INFO:tensorflow:global step 720: loss = 0.1742 (0.689 sec/step)
INFO:tensorflow:global step 730: loss = 0.1710 (0.505 sec/step)
INFO:tensorflow:global step 740: loss = 0.1708 (0.868 sec/step)
INFO:tensorflow:global step 750: loss = 0.1683 (0.632 sec/step)
INFO:tensorflow:global step 760: loss = 0.1692 (0.442 sec/step)
INFO:tensorflow:global step 770: loss = 0.1693 (0.597 sec/step)
INFO:tensorflow:global step 780: loss = 0.1665 (0.441 sec/step)
INFO:tensorflow:global step 790: loss = 0.1680 (0.548 sec/step)
INFO:tensorflow:global step 800: loss = 0.1708 (0.372 sec/step)
INFO:tensorflow:global step 810: loss = 0.1674 (0.327 sec/step)
INFO:tensorflow:global step 820: loss = 0.1666 (0.951 sec/step)
INFO:tensorflow:global step 830: loss = 0.1651 (0.557 sec/step)
INFO:tensorflow:global step 840: loss = 0.1663 (0.506 sec/step)
INFO:tensorflow:global step 850: loss = 0.1646 (0.446 sec/step)
INFO:tensorflow:global step 860: loss = 0.1666 (0.424 sec/step)
INFO:tensorflow:global step 870: loss = 0.1654 (0.520 sec/step)
INFO:tensorflow:global step 880: loss = 0.1662 (0.675 sec/step)
INFO:tensorflow:global step 890: loss = 0.1673 (0.325 sec/step)
INFO:tensorflow:global step 900: loss = 0.1633 (0.548 sec/step)
INFO:tensorflow:global step 910: loss = 0.1659 (0.374 sec/step)
INFO:tensorflow:global step 920: loss = 0.1639 (0.663 sec/step)
INFO:tensorflow:global step 930: loss = 0.1658 (0.442 sec/step)
INFO:tensorflow:global step 940: loss = 0.1654 (0.568 sec/step)
```
@PallawiSinghal Did you find a solution to your problem?
Hi, I started training from scratch with train.py on the VOC2012 dataset. I downloaded the augmented GTs and plugged them into the dataset, so the GTs are now the augmented ones and the images are the original JPG files from the dataset. The loss is not going down; it is oscillating. Any clue on how to get it working? Regards, Vijay