DrSleep / tensorflow-deeplab-resnet

DeepLab-ResNet rebuilt in TensorFlow
MIT License

Loss not going down #96

Open vijayg78 opened 7 years ago

vijayg78 commented 7 years ago

Hi, I started training from scratch with train.py on the VOC2012 dataset. I downloaded the augmented ground truths (GTs) and plugged them into the dataset, so the GTs are now the augmented ones and the images are the original JPG files from the dataset. The loss is not going down; it is oscillating. Any clue on how to get it working? Regards, Vijay

DrSleep commented 7 years ago

from scratch

Do you mean with a randomly initialised model?

vijayg78 commented 7 years ago

I used deeplab_resnet_init.ckpt and ran train.py. The loss was oscillating and not coming down at all. I also tried deeplab_resnet.ckpt, with the same behaviour.

vijayg78 commented 7 years ago

I used the JPEGImages from VOCdevkit, and the GTs point to the augmented images I downloaded from this GitHub repository. That's correct, right?

akshittyagi commented 7 years ago

Same problem for a model which doesn't use the deeplab_resnet.ckpt file to init

DrSleep commented 7 years ago

What do the images in your TensorBoard look like after a few iterations?

Hjy20255 commented 7 years ago

I have the same problem. I use my own dataset (3 classes) for training. The loss value oscillates between 1.2 and 1.3 and does not come down at all.
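For context on the 1.2 to 1.3 figure: a softmax classifier that assigns equal probability to every class has a cross-entropy of ln(K) for K classes. The sketch below computes that baseline. It assumes the reported loss is a plain mean cross-entropy; if the training script adds a regularisation term to the printed value (not verified here), the comparison is only rough. The function name is illustrative and not part of the repository.

```python
import math


def chance_level_cross_entropy(num_classes: int) -> float:
    """Cross-entropy (in nats) of a uniform prediction over num_classes classes."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    return math.log(num_classes)


# 3 classes (the custom dataset above) and 21 classes (Pascal VOC with background).
for k in (3, 21):
    print(k, round(chance_level_cross_entropy(k), 3))
# prints:
# 3 1.099
# 21 3.045
```

A loss that stays close to ln(K) would be consistent with a model that has not yet learned to separate the classes, although other causes are possible.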

akshittyagi commented 7 years ago

@DrSleep there are no images being produced in TensorBoard.

DrSleep commented 7 years ago

To all: the hyperparameters (learning rate, batch size, momentum, etc.) were chosen on Pascal VOC (for the procedure behind these choices, please refer to the original paper). The same hyperparameters will not necessarily be suitable for other datasets, so it is your task to find an appropriate set of hyperparameters for your dataset.
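One simple way to organise such a search is a small grid over the hyperparameters named above. The following is a minimal sketch only: the candidate values are hypothetical placeholders, not the repository defaults or recommendations, and `hyperparameter_grid` is an illustrative helper that does not exist in train.py.

```python
from itertools import product

# Hypothetical candidate values; replace them with ranges that suit your dataset.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [4, 10],
    "momentum": [0.9, 0.99],
}


def hyperparameter_grid(space):
    """Return every combination of the candidate values as a list of dicts."""
    names = list(space)
    combos = product(*(space[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


grid = hyperparameter_grid(SEARCH_SPACE)
print(len(grid))  # 3 * 2 * 2 = 12 configurations
print(grid[0])
```

Each configuration would then be trained for a short, fixed number of steps and compared on a held-out validation split before committing to a full run.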

This repository is a replication of an academic paper. Anything else besides that is a bonus (like an ability to train on your own datasets).

akshittyagi commented 7 years ago

Okay. But the model is also not working on the VOC dataset when not using the pretrained .ckpt file.

DrSleep commented 7 years ago

It works (proof, proof) on VOC with either pre-trained or not pre-trained files. Make sure that your setup is correct.

wangruixing commented 7 years ago

I also have this problem. I use VOC2012 and the pretrained model.

dongzhuoyao commented 7 years ago

same here

chenyuZha commented 7 years ago

In my case I used my own dataset for training. At first I used train.py, and the loss went down very slowly (from 10 to 8 over 60000 steps). Then I switched to another script, train_msc.py, and the loss began to go down very quickly. The second script trained better than the first, since the loss was much smaller (about 3 instead of 8 in my case).

zhengyang-wang commented 7 years ago

May I know the final loss after running train.py for 20K iterations, starting from deeplab_resnet_init.ckpt? I used the PASCAL dataset and the final loss was about 1.3. It would be helpful if you could provide a graph of your training curve.

ChuanWang90 commented 6 years ago

Same here. With the default configuration and Pascal VOC, the loss oscillates between 1.2 and 1.3. Could someone plot the training curve or say what the loss values are after 20K iterations, for example? Thanks!

FeiWard commented 6 years ago

Can someone show the loss after 20K iterations? It is about 1.18 on my PC. Or does anyone know the reason?

EternityZY commented 6 years ago

My loss is always about 1.3, and the predicted images are completely black, with no result. I use the default hyperparameters and the VOC2012 dataset, starting from deeplab_resnet.ckpt. Why doesn't it work?

PallawiSinghal commented 4 years ago

@EternityZY wrote: "My loss is always about 1.3, and the predicted images are completely black, with no result. I use the default hyperparameters and the VOC2012 dataset, starting from deeplab_resnet.ckpt. Why doesn't it work?"

Hi, were you able to solve the issue?

PallawiSinghal commented 4 years ago

Hi, my loss does not change; it has become stagnant. I have tried everything related to DeepLabv3+ that I could find on blogs. I am training to detect roads. My images are 2000x2000, and my training data has 45k images. I have prepared my data in the PASCAL VOC format. I have three kinds of pixels:

background = [0,0,0]
void class = [255,255,255]
road = [1,1,1]

so the number of classes = 3. I am using PASCAL VOC pre-trained weights.
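A quick sanity check for a setup like this is to confirm that the label masks contain only the expected values before training. The sketch below assumes each mask has already been reduced to a single-channel grid of integer class ids (for example 0, 1 and 255 rather than RGB triplets) and that 255 is meant as an ignored "void" value. Both assumptions come from reading the post, not from the training code, and the helper names are hypothetical.

```python
from collections import Counter


def label_histogram(mask):
    """Count how often each label value occurs in a 2-D mask (rows of ints)."""
    counts = Counter()
    for row in mask:
        counts.update(row)
    return dict(counts)


def unexpected_labels(mask, valid_labels, ignore_label=255):
    """Return label values that are neither valid class ids nor the ignore value."""
    allowed = set(valid_labels) | {ignore_label}
    return sorted(set(label_histogram(mask)) - allowed)


# Toy 2x3 mask: background (0), road (1), void (255), and a stray value (2).
toy_mask = [[0, 0, 1],
            [1, 255, 2]]
print(label_histogram(toy_mask))
print(unexpected_labels(toy_mask, valid_labels=(0, 1)))  # [2]
```

If void pixels are excluded from the loss, it is also worth checking whether the configured number of classes should count them at all.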

Changes in train_util.py are:

1. ignore_weight = 0
label0_weight = 10
label1_weight = 15
not_ignore_mask = tf.to_float(tf.equal(scaled_labels, 1)) * label0_weight
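The idea behind per-class loss weights can be illustrated without TensorFlow. The following pure-Python sketch maps every label to a weight, with ignored pixels weighted zero. It is a stand-in for illustration only, not the poster's train_util.py code and not the library's implementation; `pixel_weights` is a hypothetical helper, the weights 10 and 15 are taken from the post, and treating 255 as the ignore value is an assumption.

```python
def pixel_weights(labels, class_weights, ignore_label=255, ignore_weight=0.0):
    """Build a per-pixel weight grid matching the shape of `labels`.

    labels: 2-D grid (rows of integer class ids).
    class_weights: dict mapping class id -> loss weight.
    Pixels equal to ignore_label receive ignore_weight.
    """
    weights = []
    for row in labels:
        weights.append([
            ignore_weight if value == ignore_label else class_weights[value]
            for value in row
        ])
    return weights


# Weights from the post: 10 for label 0, 15 for label 1.
weights = pixel_weights([[0, 1, 255]], class_weights={0: 10.0, 1: 15.0})
print(weights)  # [[10.0, 15.0, 0.0]]
```

Writing the mapping out this way makes it easy to confirm that each class id is paired with the weight intended for it.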

My train.py command:

nohup python deeplab/train.py \
  --logtostderr \
  --training_number_of_steps=65000 \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_batch_size=2 \
  --initialize_last_layer=False \
  --last_layers_contain_logits_only=True \
  --dataset="pascal_voc_seg" \
  --tf_initial_checkpoint="/data/old_model/models/research/deeplabv3_pascal_trainval/model.ckpt" \
  --train_logdir="/data/old_model/models/research/deeplab/mycheckpoints" \
  --dataset_dir="/data/models/research/deeplab/datasets/tfrecord" > my_output.log &

Please help 👍

INFO:tensorflow:global step 700: loss = 0.1759 (0.449 sec/step)
INFO:tensorflow:global step 710: loss = 0.1695 (0.655 sec/step)
INFO:tensorflow:global step 720: loss = 0.1742 (0.689 sec/step)
INFO:tensorflow:global step 730: loss = 0.1710 (0.505 sec/step)
INFO:tensorflow:global step 740: loss = 0.1708 (0.868 sec/step)
INFO:tensorflow:global step 750: loss = 0.1683 (0.632 sec/step)
INFO:tensorflow:global step 760: loss = 0.1692 (0.442 sec/step)
INFO:tensorflow:global step 770: loss = 0.1693 (0.597 sec/step)
INFO:tensorflow:global step 780: loss = 0.1665 (0.441 sec/step)
INFO:tensorflow:global step 790: loss = 0.1680 (0.548 sec/step)
INFO:tensorflow:global step 800: loss = 0.1708 (0.372 sec/step)
INFO:tensorflow:global step 810: loss = 0.1674 (0.327 sec/step)
INFO:tensorflow:global step 820: loss = 0.1666 (0.951 sec/step)
INFO:tensorflow:global step 830: loss = 0.1651 (0.557 sec/step)
INFO:tensorflow:global step 840: loss = 0.1663 (0.506 sec/step)
INFO:tensorflow:global step 850: loss = 0.1646 (0.446 sec/step)
INFO:tensorflow:global step 860: loss = 0.1666 (0.424 sec/step)
INFO:tensorflow:global step 870: loss = 0.1654 (0.520 sec/step)
INFO:tensorflow:global step 880: loss = 0.1662 (0.675 sec/step)
INFO:tensorflow:global step 890: loss = 0.1673 (0.325 sec/step)
INFO:tensorflow:global step 900: loss = 0.1633 (0.548 sec/step)
INFO:tensorflow:global step 910: loss = 0.1659 (0.374 sec/step)
INFO:tensorflow:global step 920: loss = 0.1639 (0.663 sec/step)
INFO:tensorflow:global step 930: loss = 0.1658 (0.442 sec/step)
INFO:tensorflow:global step 940: loss = 0.1654 (0.568 sec/step)
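With a batch size of 2, per-step losses are noisy, so a smoothed curve can show whether there is a slow downward trend behind the oscillation. The sketch below parses log lines in the format shown above and applies a simple moving average. The regular expression is written against these sample lines only, and the helper names are illustrative rather than part of any library.

```python
import re

# Matches lines such as "INFO:tensorflow:global step 700: loss = 0.1759 (0.449 sec/step)".
LOG_PATTERN = re.compile(r"global step (\d+): loss = ([0-9.]+)")


def parse_losses(log_text):
    """Extract (step, loss) pairs from training log text."""
    return [(int(step), float(loss)) for step, loss in LOG_PATTERN.findall(log_text)]


def moving_average(values, window):
    """Trailing moving average; early entries average over the values seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


sample = """INFO:tensorflow:global step 700: loss = 0.1759 (0.449 sec/step)
INFO:tensorflow:global step 710: loss = 0.1695 (0.655 sec/step)"""
pairs = parse_losses(sample)
print(pairs)  # [(700, 0.1759), (710, 0.1695)]
print(moving_average([loss for _, loss in pairs], window=2))
```

Running this over the whole of my_output.log and comparing the smoothed value early and late in training gives a clearer picture than reading individual steps.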

subbulakshmisubha commented 4 years ago

@PallawiSinghal Did you find a solution to your problem?