alexgkendall / caffe-posenet

Implementation of PoseNet

How to train PoseNet with my own dataset? #27

Open slamjie opened 5 years ago

slamjie commented 5 years ago

Sorry to bother you. I want to use PoseNet on my robot indoors. I have collected 1000 images, but I don't know how to set the parameters in solver_posenet.prototxt.

My prototxt looks like this:

net: "./posenet/models/train.prototxt"
test_initialization: false
test_iter: 250    
test_interval: 20
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 250
display: 20
max_iter: 30000
solver_type: ADAGRAD
weight_decay: 0.005
snapshot: 2000
snapshot_prefix: "./posenet/models/training"
solver_mode: GPU

But when I begin training, the log shows:

I1212 12:48:09.793629 8576 solver.cpp:237] Iteration 0, loss = 584.759

The loss starts at 584.759 and then stays at around 125. Does this mean PoseNet has trained successfully? Could you share your solver_posenet.prototxt with me? I would appreciate any advice about mistakes I might be making.

DRAhmadFaraz commented 5 years ago

@slamjie Were you able to train PoseNet on your own custom images?

slamjie commented 5 years ago

@DRAhmadFaraz Yes, I have trained PoseNet on my own images successfully. It gives good results for me.

DRAhmadFaraz commented 5 years ago

@slamjie Thanks a lot for replying. I also need to train PoseNet on my own collection of RGB images, so could you please guide me on how to train this code on custom RGB images?

I would be thankful.

slamjie commented 5 years ago

@DRAhmadFaraz Sorry for the late reply; I have been busy with graduation matters recently. I reproduced PoseNet according to the author's instructions. The only difference is that the training parameters need to be re-adjusted according to the number of images. Here is my solver_posenet.prototxt:

net: "./posenet/models/train.prototxt"
test_initialization: false
test_iter: 250    
test_interval: 200
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 2500
display: 20
max_iter: 30000
solver_type: ADAGRAD
weight_decay: 0.005
snapshot: 2000
snapshot_prefix: "./posenet/models/training"
solver_mode: GPU

I got training_iter_30000.caffemodel, then used test_posenet.py to test PoseNet. The result looks like this: image

Its accuracy is not as good as reported in the paper, but it meets my needs. I hope this is useful to you.
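
For context on why the two solver files in this thread may behave differently: under Caffe's "step" policy, the scheduled learning rate at a given iteration is base_lr * gamma ^ floor(iter / stepsize). The sketch below compares stepsize: 250 from the first post with stepsize: 2500 from this one. The helper name step_lr is hypothetical, and ADAGRAD additionally rescales each parameter's update, so this shows only the scheduled base rate.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Scheduled rate under the "step" policy (hypothetical helper)."""
    return base_lr * gamma ** (iteration // stepsize)


# Compare the two stepsize values posted in this thread.
for it in (0, 1000, 5000, 30000):
    lr_first = step_lr(0.001, 0.1, 250, it)    # first solver file
    lr_second = step_lr(0.001, 0.1, 2500, it)  # second solver file
    print(it, lr_first, lr_second)
```

With stepsize: 250 the scheduled rate has already dropped to about 1e-7 by iteration 1000, which may be why the loss in the first post stopped improving. That is a guess and is not confirmed anywhere in the thread.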

DRAhmadFaraz commented 5 years ago

@slamjie Thanks a lot for helping me; all of these steps are helpful. I have one last question: where should I put the directory path of the input image dataset, and in which file?

Also, I didn't see the author's instructions for custom images in this repository. If you have such instructions, please share the link with me.

Thank you. Regards

slamjie commented 5 years ago

@DRAhmadFaraz To train PoseNet, I use creat_test_lmdb.py and create_train_lmdb_.py to generate the LMDB files.

import sys
sys.path.append('/home/jaco/caffe-posenet-master/python')

import numpy as np
import lmdb
import caffe
import random
import cv2
import argparse

directory = '/home/jaco/realsense/train/'
dataset = 'train.txt'

poses = []
images = []

with open(directory+dataset) as f:
    for line in f:
        fname, p0,p1,p2,p3,p4,p5,p6 = line.split()
        p0 = float(p0)
        p1 = float(p1)
        p2 = float(p2)
        p3 = float(p3)
        p4 = float(p4)
        p5 = float(p5)
        p6 = float(p6)
        poses.append((p0,p1,p2,p3,p4,p5,p6))
        images.append(directory+fname)

r = list(range(len(images)))
random.shuffle(r)

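print('Creating PoseNet Dataset.')
env = lmdb.open('Train', map_size=int(1e12))

count = 0

for i in r:
    if (count+1) % 100 == 0:
        print('Saving image: {}'.format(count+1))
    X = cv2.imread(images[i])    # BGR image, shape (H, W, 3); None if the file cannot be read
    if X is None:
        raise IOError('Could not read image: ' + images[i])
    X = cv2.resize(X, (455,256))    # to reproduce PoseNet results, please resize the images so that the shortest side is 256 pixels
    X = np.transpose(X,(2,0,1))    # HWC -> CHW, the layout Caffe expects
    im_dat = caffe.io.array_to_datum(np.array(X).astype(np.uint8))
    im_dat.float_data.extend(poses[i])    # store the 7-value pose alongside the image
    str_id = '{:0>10d}'.format(count)
    with env.begin(write=True) as txn:
        # LMDB keys must be bytes under Python 3; encode() is harmless under Python 2
        txn.put(str_id.encode('ascii'), im_dat.SerializeToString())
    count = count+1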

env.close()
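
The comment in the script asks for the shortest side to be 256 pixels, while the call hard-codes (455, 256). That size matches a 16:9 source such as 1920x1080, which is an assumption about the camera used here. For other resolutions, the target size could be computed as in this minimal sketch (the helper name shortest_side_size is hypothetical); the result is ordered (width, height), as cv2.resize expects.

```python
def shortest_side_size(width, height, target=256):
    """Return (new_width, new_height) with the shorter side scaled to target."""
    scale = float(target) / min(width, height)
    return int(round(width * scale)), int(round(height * scale))


print(shortest_side_size(1920, 1080))  # (455, 256)
print(shortest_side_size(640, 480))    # (341, 256)
```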

Then I use create_mean.sh to compute the mean files for the train and test datasets.

#!/usr/bin/env sh
set -e

cd /home/jaco/caffe-posenet-master/build/tools
DBTYPE=lmdb
echo "Computing image mean..."
# $1: input LMDB directory, $2: output mean file (.binaryproto)
./compute_image_mean -backend=$DBTYPE "$1" "$2"
echo "Done."

I changed train_posenet.prototxt like this:

name: "GoogLeNet"
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "posenet/scripts/Train/"
    batch_size: 64
    backend: LMDB
  }
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_file: "posenet/scripts/train.binaryproto"
  }
}
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "posenet/scripts/Test/"
    batch_size: 1
    backend: LMDB
  }
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_file: "posenet/scripts/test.binaryproto"
  }
}

The source and mean_file entries are the paths to the input LMDB datasets and their mean files. They just tell the model where the LMDB files and mean files are.

Finally, I use train.sh to train PoseNet and get the caffemodel files.

#!/usr/bin/env sh

TOOLS=./build/tools

$TOOLS/caffe train \
  --solver=posenet/models/solver_posenet.prototxt

I used 1000 images for training and 250 images for testing.

I hope it's useful to you. :)
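
As a rough check of how the solver values relate to these dataset sizes, the arithmetic can be sketched as follows. The numbers are taken from this thread (1000 training images, 250 test images, batch_size 64 and 1, max_iter 30000, test_iter 250); the variable names are only illustrative.

```python
import math

train_images, test_images = 1000, 250   # dataset sizes reported in the thread
train_batch, test_batch = 64, 1         # batch_size in the TRAIN and TEST data layers
max_iter, test_iter = 30000, 250        # values from solver_posenet.prototxt

# Iterations needed to see every training image once.
iters_per_epoch = math.ceil(train_images / train_batch)
# Approximate number of passes over the training set by the end of training.
epochs = max_iter * train_batch / train_images
# Images evaluated in each test run.
images_tested = test_iter * test_batch

print(iters_per_epoch, epochs, images_tested)
```

Under these numbers, max_iter: 30000 corresponds to roughly 1920 passes over the training images, and test_iter: 250 with a test batch_size of 1 evaluates each of the 250 test images once per test run.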

DRAhmadFaraz commented 5 years ago

@slamjie Thanks a lot, it's all clear to me now. You are one of the most cooperative people I have ever met. I just have one basic point of confusion, and I am sorry for such a basic question, but I am new to Caffe.

For my own custom images, all I have is a set of RGB images in one directory.

In creat_test_lmdb.py there is the line dataset = 'train.txt'. How can I get the train.txt file from my input image directory?

I hope that after that I will be able to train on my own images. Thanks a lot; I look forward to your reply.

slamjie commented 5 years ago

@DRAhmadFaraz Each line in train.txt holds the six-degree-of-freedom pose corresponding to one of your own images.
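
Judging from the LMDB script earlier in the thread, each line of train.txt is expected to hold an image file name followed by seven numbers: three for position and four for the orientation quaternion. A minimal sketch for checking such a file before building the LMDB is shown below. The helper names and the sample line are made up for illustration, and the sketch assumes the file has no header lines, as in that script.

```python
import math


def parse_pose_line(line):
    """Parse 'fname x y z w p q r' into (fname, xyz, wpqr)."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError('expected 8 fields, got %d' % len(fields))
    values = [float(v) for v in fields[1:]]
    return fields[0], tuple(values[:3]), tuple(values[3:])


def quaternion_norm(wpqr):
    """Length of the quaternion; close to 1.0 for a valid rotation."""
    return math.sqrt(sum(c * c for c in wpqr))


sample = 'img_0001.png 0.5 -1.2 0.3 1.0 0.0 0.0 0.0'  # made-up example line
fname, xyz, wpqr = parse_pose_line(sample)
print(fname, xyz, wpqr, quaternion_norm(wpqr))
```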

DRAhmadFaraz commented 5 years ago

Can you guide me on how to extract a six-degree-of-freedom pose from an RGB image?

slamjie commented 5 years ago

@DRAhmadFaraz The author mentions in the paper that he uses structure from motion (SfM) to obtain the images and their corresponding poses. In my case, I use a visual positioning device in my school lab, so when I capture images from the camera I also get the camera pose in real time.

DRAhmadFaraz commented 5 years ago

@slamjie Thanks a lot. I did all these steps, but after the ./train.sh command my run gets stuck here. Is it a hardware issue or something else? I even reduced the step size to 1.

Screenshot from 2019-04-27 20-44-59

When I changed solver_mode: GPU to solver_mode: CPU in solver_posenet.prototxt, training started, as shown.

Screenshot from 2019-04-28 00-12-01

The testing results show:

Screenshot from 2019-04-28 00-21-08

But why can't I run this on my GPU? My GPU is an Nvidia GT 940, 4GB.

slamjie commented 5 years ago

@DRAhmadFaraz This may be caused by insufficient GPU memory. My GPU is an Nvidia 1080. You may need to adjust the parameters in train_posenet.prototxt for your GPU.
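
One parameter commonly lowered to reduce GPU memory use is batch_size in the TRAIN data layer. Mainline Caffe's solver also has an iter_size field that accumulates gradients over several forward/backward passes; whether this fork supports it is an assumption to verify. The sketch below simply lists (batch_size, iter_size) pairs that keep the effective batch at 64; the helper name batch_options is hypothetical.

```python
def batch_options(effective_batch=64):
    """List (batch_size, iter_size) pairs whose product is effective_batch."""
    return [(b, effective_batch // b)
            for b in range(effective_batch, 0, -1)
            if effective_batch % b == 0]


for batch_size, iter_size in batch_options(64):
    print('batch_size: %d  iter_size: %d' % (batch_size, iter_size))
```

If iter_size is not available, lowering batch_size alone still reduces memory, at the cost of noisier gradient estimates.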

DRAhmadFaraz commented 5 years ago

@slamjie Thanks a lot, you have helped me a lot. I have trained successfully on the Cambridge Landmarks dataset. Now all I have to do is produce the train.txt with the six-degree-of-freedom poses corresponding to my own images.

Do you know of any tool that can run structure from motion (SfM) and extract 6-DOF poses from a collection of RGB images, in a format like the one below?

Visual Landmark Dataset V1
ImageFile, Camera Position [X Y Z W P Q R]

/seq-01/frame-000000.color.png -0.123234 -1.120697 -0.988706 0.995174 -0.096421 0.016435 0.006233
/seq-01/frame-000001.color.png -0.136318 -1.122137 -0.988546 0.994928 -0.098053 0.020571 0.007535
/seq-01/frame-000002.color.png -0.136108 -1.122399 -0.989466 0.994811 -0.099181 0.020964 0.007190
/seq-01/frame-000003.color.png -0.136539 -1.121119 -0.989834 0.995035 -0.096976 0.020163 0.008368
..............................................

slamjie commented 5 years ago

@DRAhmadFaraz Sorry, I have no experience with SfM. Maybe you can try a visual SLAM algorithm.
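
One possible route, assuming a visual SLAM system that can export its trajectory in the TUM RGB-D format (timestamp tx ty tz qx qy qz qw, which ORB-SLAM2, for example, can write), is to reorder each trajectory line into the label layout used in this thread, which puts the scalar quaternion component first. The sketch below is only an illustration: the function name is hypothetical, matching timestamps to image file names is left to the caller, and whether the exported pose uses the same frame convention as the PoseNet labels needs to be checked for the tool in use.

```python
def tum_to_posenet(tum_line, image_name):
    """Convert 'timestamp tx ty tz qx qy qz qw' to 'fname x y z w p q r'."""
    fields = tum_line.split()
    if len(fields) != 8:
        raise ValueError('expected 8 fields, got %d' % len(fields))
    tx, ty, tz, qx, qy, qz, qw = fields[1:]
    # The label layout is [X Y Z W P Q R], so the scalar part qw moves to the front.
    return ' '.join([image_name, tx, ty, tz, qw, qx, qy, qz])


line = '1305031102.175304 1.0 2.0 3.0 0.0 0.0 0.0 1.0'  # made-up trajectory line
print(tum_to_posenet(line, 'frame-000000.png'))
# frame-000000.png 1.0 2.0 3.0 1.0 0.0 0.0 0.0
```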

DRAhmadFaraz commented 5 years ago

Dear @slamjie

I hope you are well. I am still working on the PoseNet approach and would like to ask you about one important point; I hope you can help me.

I have successfully trained PoseNet on my own dataset. For further calculations, I now want to extract the predicted pose for each corresponding image.

Each predicted pose is a [7 x 1] vector with 3 translation values and 4 rotation quaternion values.

I have looked at posenet/scripts/test_posenet.py and extracted the predicted poses, but they are computed for only one iteration, whereas I need the predicted pose for every image.

Could you please help me sort this out? I would be thankful.

slamjie commented 5 years ago

@DRAhmadFaraz You should change the inputs of train_posenet.prototxt like this:

input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 224
  dim: 224
}

Then write a test script that feeds each image to the net. Caffe includes some demos, so it is easy to find code showing how to send images to a net.

The predicted poses are saved like this:

predicted_q = net.blobs['cls3_fc_wpqr'].data 
predicted_x = net.blobs['cls3_fc_xyz'].data

Hope this helps. :)
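
The per-image collection step described above could be organized roughly as in the sketch below. It deliberately leaves Caffe out: predict is a hypothetical callable that, in real use, would load one image, run net.forward(), and return the flattened contents of the cls3_fc_xyz and cls3_fc_wpqr blobs. Normalizing the quaternion is an added assumption here, made so that each stored pose is a unit quaternion.

```python
import math


def normalize(q):
    """Scale a quaternion to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return [c / n for c in q]


def pose_vector(pred_xyz, pred_wpqr):
    """Stack translation and unit quaternion into one 7-element list."""
    return list(pred_xyz) + normalize(pred_wpqr)


def collect_poses(image_names, predict):
    """Run predict on every image and keep one 7-value pose per image."""
    results = []
    for name in image_names:
        xyz, wpqr = predict(name)  # hypothetical wrapper around the net
        results.append((name, pose_vector(xyz, wpqr)))
    return results


def fake_predict(name):
    """Stand-in for the real network, used only to exercise the loop."""
    return [1.0, 2.0, 3.0], [2.0, 0.0, 0.0, 0.0]


for name, pose in collect_poses(['a.png', 'b.png'], fake_predict):
    print(name, pose)
```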

abdulwaheedsoudagar commented 2 years ago

Hello @DRAhmadFaraz, could you please tell me how you generated the train.txt file? Thanks.