cvlab-epfl / LIFT

Code release for the ECCV 2016 paper

How to generate the data for training descriptor network #9

Closed 13331151 closed 7 years ago

13331151 commented 7 years ago

Hi, I'm Jack. I recently trained a descriptor network, but it didn't work well. Could you tell me how your dataset for training the descriptor network is generated? And could you tell me what validation error you get when training the descriptor network (mine is about 2.1)? My process:

  1. Pick a 3D structure point, then re-project it into two corresponding images, giving one feature per image;
  2. Pick another structure point very close to the first one, and re-project it as well;
  3. In each image I now have two projected feature points, from which I can derive a direction and scale;
  4. Crop a patch around each feature point according to its direction and scale.
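The four steps above can be sketched in NumPy. This is only a minimal illustration under my own assumptions (the `crop_patch` name, the `support` factor, and nearest-neighbour sampling are hypothetical choices, not code from the repository):

```python
import numpy as np

def crop_patch(img, x, y, scale, angle_deg, out_size=64, support=6.0):
    """Crop a rotation- and scale-normalized patch around a keypoint.

    The patch covers `support * scale` pixels of the source image and is
    resampled to `out_size` x `out_size` with nearest-neighbour lookup.
    """
    half = support * scale / 2.0
    c, s = np.cos(np.deg2rad(angle_deg)), np.sin(np.deg2rad(angle_deg))
    t = np.linspace(-1.0, 1.0, out_size)  # normalized patch grid
    gx, gy = np.meshgrid(t, t)
    # Rotate and scale the grid, then translate it to the keypoint.
    sx = x + half * (c * gx - s * gy)
    sy = y + half * (s * gx + c * gy)
    xi = np.clip(np.round(sx).astype(int), 0, img.shape[1] - 1)
    yi = np.clip(np.round(sy).astype(int), 0, img.shape[0] - 1)
    return img[yi, xi]
```

A real pipeline would use bilinear interpolation (e.g. `cv2.warpAffine`), but the geometry is the same.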

That's what I get for example:
[attached image: github_iss1]

Thanks!!! :)

13331151 commented 7 years ago

@kmyid

kmyi commented 7 years ago

Hi Jack,

Could you tell me how your dataset for training the descriptor network is generated?

We follow a similar process. However, in our case we use the actual raw SIFT keypoints detected in each image, not the reprojected points. We crop a region 6 times the scale at the feature point location, which is the same support region the SIFT descriptor looks at.
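For illustration, matching each reprojected 3D point to the nearest raw detected SIFT keypoint could look like this (the function name and the 2-pixel threshold are my own hypothetical choices):

```python
import numpy as np

def nearest_keypoints(reprojected, detected, max_dist=2.0):
    """For each reprojected 2-D point (N x 2), return the index of the
    closest detected SIFT keypoint (M x 2), or -1 when none lies
    within max_dist pixels."""
    d = np.linalg.norm(reprojected[:, None, :] - detected[None, :, :], axis=2)
    idx = d.argmin(axis=1)
    idx[d[np.arange(len(idx)), idx] > max_dist] = -1
    return idx
```

The scale and orientation used for cropping then come from the matched SIFT keypoint itself, not from the reprojection.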

And could you tell me what validation error you get when training the descriptor network (mine is about 2.1)?

I am not really sure I can give you a value that can be compared, as we have multiple constants multiplied in to balance the positives and negatives. One very important thing is to apply hard mining, depending on the data, and this mining should become progressively more aggressive as learning proceeds. Have a look at Eduard's descriptor paper, as it is specifically about this learning strategy.
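Eduard's descriptor paper (Simo-Serra et al., "Discriminative Learning of Deep Convolutional Feature Point Descriptors") mines by keeping only the hardest fraction of each batch, with that fraction shrinking over training. A toy sketch of the selection step (names and schedule are hypothetical, not the released code):

```python
import numpy as np

def hard_mine(losses, mining_ratio):
    """Average only the hardest len(losses)/mining_ratio per-sample
    losses; mining_ratio is increased as training proceeds."""
    k = max(1, int(np.ceil(len(losses) / mining_ratio)))
    return np.sort(losses)[::-1][:k].mean()
```

With `mining_ratio=1` this is an ordinary mean; raising it over epochs makes the gradient focus on ever harder examples.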

Hope my answer helps! Kwang

13331151 commented 7 years ago

I really appreciate your help, Kwang. Your answer does help a lot! One more question: did you generate the data from VisualSFM's output? I'm new to VisualSFM, and I can't find the file that stores information about the structure points (including their scale, position, and orientation in the corresponding image pairs).

What I can retrieve now:

  * from *.sift: [x, y, color, scale, orientation]
  * from *.nvm: information about each structure point (without scale and orientation), as well as its corresponding image IDs.
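If it helps, each 3D point line of an NVM_V3 file has the layout `<XYZ> <RGB> <num_measurements>` followed by `<image_idx> <feature_idx> <x> <y>` per measurement; the `feature_idx` indexes into that image's *.sift file, which is where the scale and orientation live (the nvm itself stores neither). A minimal parser sketch, based on my reading of the format (hypothetical helper, not from the release):

```python
def parse_nvm_point(line):
    """Parse one 3-D point line of an NVM_V3 file into
    (xyz, rgb, measurements), where each measurement is
    (image_idx, feature_idx, x, y)."""
    v = line.split()
    xyz = tuple(map(float, v[0:3]))
    rgb = tuple(map(int, v[3:6]))
    n = int(v[6])
    meas = [(int(v[7 + 4 * i]), int(v[8 + 4 * i]),
             float(v[9 + 4 * i]), float(v[10 + 4 * i]))
            for i in range(n)]
    return xyz, rgb, meas
```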

And below is the loss function for the keypoint training. Is it the same as what you describe in the paper?

```python
import numpy as np
import theano
import theano.tensor as T
import lasagne

# Classification loss on the keypoint score maps: branches 1-3 see
# positive patches (penalized when the score falls below 1), branch 4
# sees non-keypoint patches (penalized when the score exceeds 1).
prediction1 = lasagne.nonlinearities.softmax(
    lasagne.layers.get_output(layers[0]["kp-scoremap"], deterministic=False))
prediction1_class = np.cast[floatX](1. / 6) * \
    T.nnet.relu(np.cast[floatX](1.) - prediction1) ** 2

prediction2 = lasagne.nonlinearities.softmax(
    lasagne.layers.get_output(layers[1]["kp-scoremap"], deterministic=False))
prediction2_class = np.cast[floatX](1. / 6) * \
    T.nnet.relu(np.cast[floatX](1.) - prediction2) ** 2

prediction3 = lasagne.nonlinearities.softmax(
    lasagne.layers.get_output(layers[2]["kp-scoremap"], deterministic=False))
prediction3_class = np.cast[floatX](1. / 6) * \
    T.nnet.relu(np.cast[floatX](1.) - prediction3) ** 2

prediction4 = lasagne.nonlinearities.softmax(
    lasagne.layers.get_output(layers[3]["kp-scoremap"], deterministic=False))
prediction4_class = np.cast[floatX](3. / 6) * \
    T.nnet.relu(prediction4 - np.cast[floatX](1.)) ** 2

loss_class = prediction1_class + prediction2_class \
    + prediction3_class + prediction4_class
loss_class = lasagne.objectives.aggregate(loss_class, mode='mean')

# Descriptor pair loss between the two matching branches.
desc1 = lasagne.layers.get_output(layers[0]["desc-output"], deterministic=False)
desc2 = lasagne.layers.get_output(layers[1]["desc-output"], deterministic=False)

loss_pair = T.sum((desc1 - desc2) ** 2 + 1e-7, axis=1)
loss_pair = lasagne.objectives.aggregate(loss_pair, mode='mean')

loss = loss_class + loss_pair

params = lasagne.layers.get_all_params(layers[0]["kp-scoremap"], trainable=True)
print("Kp-output params: ", params)

updates = lasagne.updates.sgd(loss, params, np.cast[floatX](config.learning_rate))

myNet.train_ori_stochastic = theano.function(
    inputs=[], outputs=loss, givens=givens_train, updates=updates)
```

Thanks again! @kmyid

kmyi commented 7 years ago

fail to find the file which stores information about structure points(including their scale, position and orientation in corresponding image pairs).

I think it's better if @etrulls answers this :-)

kmyi commented 7 years ago

And below is the loss function for the keypoint training. Is it the same as what you describe in the paper?

You also need to include the overlap loss, at least in the pre-training phase. As for the class loss, I think it's similar to what we did. You also need a hyperparameter to balance loss_class and loss_pair, and this parameter is data-dependent.
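Balancing the terms can be as simple as `loss = loss_class + lambda_pair * loss_pair` with a tuned `lambda_pair`. The same idea of separate balancing constants applies inside the pair loss itself, between matching and non-matching pairs; here is a sketch of a hinged pair loss, where the margin and weights are hypothetical placeholders that would need tuning per dataset:

```python
import numpy as np

def pair_loss(d1, d2, is_match, margin=4.0, w_pos=1.0, w_neg=1.0):
    """Squared distance for matching descriptor pairs, hinged margin
    loss for non-matching ones, with separate balancing weights."""
    dist = np.linalg.norm(d1 - d2, axis=1)
    pos = w_pos * dist ** 2
    neg = w_neg * np.maximum(0.0, margin - dist) ** 2
    return np.where(is_match, pos, neg).mean()
```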

etrulls commented 7 years ago

Sorry about the delay, I wasn't receiving issue notifications. Extracting patches from the NVM and SIFT files is quite easy, this does most of the work: https://github.com/jheinly/visual_sfm_support (it's mostly self-explanatory)

You should be able to retrieve the SIFT keypoints used by the reconstruction, and from there you can extract the patches from the original images.

13331151 commented 7 years ago

Thanks for your reply, I will check it out immediately :)

13331151 commented 7 years ago

Hi, sorry to bother you guys, but I really wonder how to extract the data to train the model described in LIFT. In the paper you say Roman Forum has 1.6k images and 51k unique points, but the dataset I downloaded has 7k images. Even after VisualSFM's 3D reconstruction, 1.8k images remain, with 400k unique 3D points across all the nvm files. Given the bad performance of my trained model, I wonder whether I did something wrong or different from you.

I generated the nvm files in this manner: start VisualSFM -> Open multiple images -> choose all images in the Roman dataset (7k in total) -> Compute missing matches -> Compute 3D reconstruction -> Save NView matches. This gave me 22 nvm files for 22 different scenes. I parsed each nvm file, and below is my parsing log; you can see that the number of points is very large... Could you tell me where I went wrong, please? Thank you so much. @kmyid @etrulls

All files were loaded from ../data/TrainingData/Roman_Forum/:

| File | Images | Points |
| --- | ---: | ---: |
| roman1.nvm | 1669 | 377543 |
| roman2.nvm | 38 | 10450 |
| roman3.nvm | 32 | 3917 |
| roman4.nvm | 19 | 4145 |
| roman5.nvm | 17 | 3085 |
| roman6.nvm | 13 | 6765 |
| roman7.nvm | 12 | 4045 |
| roman8.nvm | 11 | 597 |
| roman9.nvm | 8 | 2841 |
| roman10.nvm | 8 | 1665 |
| roman11.nvm | 7 | 2446 |
| roman12.nvm | 5 | 1080 |
| roman13.nvm | 5 | 486 |
| roman14.nvm | 4 | 1518 |
| roman15.nvm | 4 | 1467 |
| roman16.nvm | 4 | 1282 |
| roman17.nvm | 4 | 122 |
| roman18.nvm | 3 | 1485 |
| roman19.nvm | 3 | 654 |
| roman20.nvm | 3 | 324 |
| roman21.nvm | 3 | 207 |
| roman22.nvm | 3 | 187 |

13331151 commented 7 years ago

http://www.cs.cornell.edu/projects/1dsfm/

This is the link where I downloaded the Roman dataset.

kmyi commented 7 years ago

Hi Jack,

As ICCV is approaching, I think I won't have much time to answer you. I'll try to get back to you as soon as I can!

Cheers, Kwang

13331151 commented 7 years ago

Wow, I'm very much looking forward to your new work, and I wish you great success at ICCV~ :)

kmyi commented 7 years ago

Hi Jack,

I believe I am a bit late now. We are working on releasing the training part as well, hopefully soon. This time it will be in TensorFlow.