Closed YogeshShitole closed 5 years ago
I am also interested in this issue, so I hope the authors reply and share a guideline for the training.
@YogeshShitole did you solve the problem? I think the problem is related either to passing pixel values higher than 1 or to showing them without rescaling. That is, the pixel values must be in the range 0 to 1; otherwise the display routine amplifies them when showing the image. If you see only white images, the values probably need to be divided by 255.
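As a quick sanity check, the rescale-before-display step could look like the sketch below (using a synthetic array as a stand-in for the actual network output):

```python
import numpy as np

# synthetic stand-in for a dehazed output whose values are in 0-255
out = np.linspace(0, 255, 480 * 640 * 3, dtype=np.float32).reshape(480, 640, 3)

# divide by 255 so values land in [0, 1] before imshow-style display;
# clip guards against slight overshoot from the network
display = np.clip(out / 255.0, 0.0, 1.0)
print(display.min(), display.max())  # 0.0 1.0
```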
Sorry for the late reply. Thanks for your interest in AOD-Net. @asfix Yes, you are right.
Hi @Boyiliee, could you please provide me with your email address? Thank you.
You can find more details about AOD-Net on my website. Thanks.
Hi @Boyiliee, while training AOD-Net with the NYU2 database, the loss is not converging. I prepared a train script, and I am training the network on the NYU2 database with 27,256 images, as mentioned in your paper. I used your test_template.prototxt as training.prototxt and trained for 150,000 iterations ≈ (27,567 / batch size 8) × 40 epochs. A Euclidean loss layer is used in the prototxt, but the loss does not converge over the run; it stays around 15,000 to 16,000. After training completed, I even checked inference with the trained model_iter_150000.caffemodel instead of your AOD_Net.caffemodel, but it produces a completely white image as output instead of a dehazed image.
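The iteration budget quoted above works out as follows (a small arithmetic check using the 27,567-image count and batch size 8 from the post):

```python
# iteration-count sanity check for the numbers quoted above
num_images = 27567   # image count used in the post's epoch calculation
batch_size = 8
epochs = 40

iters_per_epoch = num_images // batch_size  # 3445 with integer division
total_iters = iters_per_epoch * epochs      # 137,800, rounded up to 150,000
print(total_iters)
```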
Below is my solver.prototxt content:
```
net: "Training.prototxt"
base_lr: 0.001
lr_policy: "fixed"
display: 20
max_iter: 150000
momentum: 0.9
weight_decay: 0.0001
snapshot: 15000
snapshot_prefix: "models/model"
solver_mode: GPU
type: "SGD"
```
Training.prototxt is the same as test_template.prototxt with the following modifications: `input_dim: 1` → `input_dim: {batchSize}` for the data/label layer, and for each conv layer I included a weight_filler in convolution_param: `weight_filler { type: "gaussian" }`.
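One thing worth double-checking here: Caffe's gaussian filler defaults to `std: 1` when no `std` is given, which is very large for a regression network trained with a Euclidean loss and can keep the loss from decreasing. A conv layer with an explicit small std might look like the sketch below (the `num_output`, `kernel_size`, and `std: 0.01` values are illustrative choices, not taken from the paper):

```protobuf
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 3
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.01   # gaussian filler defaults to std: 1 if omitted
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
```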
My train script:
```python
import os
import numpy as np
from pylab import *
import re
import random
import cv2
print cv2.__version__
import ntpath

ntpath.basename('a/b/c')

def path_leaf(path):
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)

Train_DIR = '../data/AODtrain/training/'
Label_DIR = '../data/AODtrain/original/'

# network training parameters for input image data
height = 480
width = 640
batch = 8  # batch size

import sys
sys.path.append("/home/ubuntu/Tools/caffe/python/")
import caffe

def EditFcnProto(templateFile, height, width, batch_size):
    with open(templateFile, 'r') as ft:
        template = ft.read()
    print templateFile
    # ... (remainder of function omitted in the issue)

def createBatch(img_dir, label_dir, batch_size):
    batchdata = []
    labelbatchdata = []
    for i in range(batch_size):
        fname = random.choice(os.listdir(img_dir))
        imagepath = Train_DIR + fname
        print fname
        # ... (remainder of function omitted in the issue)

def train():
    caffe.set_mode_gpu()
    caffe.set_device(0)
    # ... (remainder of function omitted in the issue)

if __name__ == '__main__':
    train()
```
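Since `createBatch` is truncated above, here is a minimal sketch (my own, not from the original script) of how a batch could be packed into the NCHW float layout Caffe expects, with pixels scaled to [0, 1] — the scaling step is exactly what the earlier comments suggest checking. Random arrays stand in for images read with `cv2.imread`:

```python
import numpy as np

height, width, batch = 480, 640, 8

# stand-ins for images read with cv2.imread (HWC layout, uint8, 0-255)
images = [np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
          for _ in range(batch)]

# scale to [0, 1], then transpose NHWC -> NCHW for Caffe
blob = np.stack([img.astype(np.float32) / 255.0 for img in images])
blob = blob.transpose(0, 3, 1, 2)
print(blob.shape)  # (8, 3, 480, 640)
```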
Snapshot of the output:
```
I0530 15:41:02.473088  2152 sgd_solver.cpp:112] Iteration 51940, lr = 0.001
I0530 15:41:07.138154  2152 solver.cpp:239] Iteration 51960 (4.28732 iter/s, 4.66492s/20 iters), loss = 157563
I0530 15:41:07.138191  2152 solver.cpp:258]     Train net output #0: loss = 157563 (* 1 = 157563 loss)
I0530 15:41:07.138197  2152 sgd_solver.cpp:112] Iteration 51960, lr = 0.001
I0530 15:41:11.762470  2152 solver.cpp:239] Iteration 51980 (4.32514 iter/s, 4.62412s/20 iters), loss = 156092
```
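For what it's worth, the magnitude of that loss can be converted into a per-pixel error. Caffe's EuclideanLoss layer computes sum‖a − b‖² / (2N), with N the batch size, so assuming 480×640×3 images as in the script above:

```python
import math

loss = 157563.0  # value from the log above
height, width, channels = 480, 640, 3

# EuclideanLoss = sum(diff^2) / (2 * batch), so the batch size
# cancels when converting to a per-element MSE:
mse = 2.0 * loss / (height * width * channels)
rmse = math.sqrt(mse)
print(rmse)  # ~0.58
```

An RMSE of roughly 0.58 on a [0, 1] scale means the prediction is off by more than half the dynamic range, which fits the all-white outputs; on a 0–255 scale the same number would be negligible. Either way, it points at an input/label scaling mismatch rather than at the solver settings.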
I am confused about what is going wrong here; the loss is not converging. Please tell me what I am doing wrong and suggest how to proceed. Basically, I am trying to reproduce your paper for learning.
Thanks