Closed: NinaJina closed this issue 5 years ago
Hi @NinaJina, that sounds correct. You are right that the corr_weights should not be the reason for such low performance.
Some issues I can think of:

- `sift.pre_alpha.0.50.desc_covariance` also contains the descriptor mean (you can see that the descriptor mean is read here using this file).
- Which `asym_scoring_mode` (link) are you using? The default (QAGS) should be reasonable, but experimenting with SGS could also be tried.

These are some ideas, let me know how it goes :)

Hi @andrefaraujo, thanks for your reply. The local descriptor I'm using is exactly the SIFT descriptor extracted by this repository. I also tested those parameters on a small part of the data, which gives 0.23 mAP and 0.26 P@1 (the 201302xx part of stanford600k, with the 201302 dir as queries; I selected the relevant ground truth from all the GTs). Moreover, if I use the 201302xx data to train the parameters (6000 sampled frames, so finally 150*6000 local descriptors, 40 GMM iterations), I get 0.25 mAP and 0.26 P@1. [I don't know whether these two pieces of information help, but this result is higher than random, which suggests the GMM parameters work to some extent.]

BTW, I want to make sure of several things:

1) Do `sift.pre_alpha.0.50.desc_eigenvectors` and `sift.pre_alpha.0.50.desc_covariance` correspond to `pca->eigvec` and `pca->cov`?

2) In `float* pDesc = new float[nNumFeatures*nDescLength]; gmm_t* g = gmm_learn(d, n, k, niter, pDesc, 1, 1, redo, 0);`, does `*(pDesc+i*d+j)` mean the j-th element of the i-th SIFT descriptor? [The description of the matrix layout in the yael lib seems a little confusing to me.]

3) Did I omit any step which may cause the low mAP?

Thanks!
Hi @andrefaraujo, sorry for the late reply, I have been very busy these days, so I had little time to continue the experiment.
It has been a long time since my last reply, so let me first briefly restate the issue: I am trying to generate the parameters in trained_parameters, but I get 0.1 mAP on the stanford600k dataset.
Regarding the several points you suggested last time:
I am using the default setting. Specifically, the DoG detector, frame-based retrieval, and the default QAGS.
Previously I only saved `pca->cov` in `sift.pre_alpha.0.50.desc_covariance`; now I first save `pca->mean` in `sift.pre_alpha.0.50.desc_covariance` and then save `pca->cov` in that file, as you suggested. But this only brings a very slight improvement. Since I had omitted `pca->mean` from the file, I think the code previously (and wrongly) used the first row of `pca->cov` as `pca->mean`. As far as I can tell, the code here just subtracts a mean value from each feature vector, so even though I was using the wrong value previously, this should not greatly hurt mAP, because it is only a constant offset. [I'm not sure whether this reasoning is correct.]
Since the code here only loads `pca->mean`, and nowhere else in the code uses the file `sift.pre_alpha.0.50.desc_covariance` again, I think `pca->cov` is not used by the code. Is that correct?
I tried to increase the training data, and that seems to help a lot. Previously I sampled 150*20000 SIFT descriptors from stanford600k; now I sample 150*80000, and that improves the mAP from 0.1 to 0.2. Is this increase in mAP reasonable for 4x more training data? And how much data do you think is reasonable for training the GMM for stanford600k?
I am now sampling SIFT descriptors from stanford600k to train the GMM, but your paper mentions that you sampled SIFT from a Flickr dataset. Do you think Flickr is better?
Thanks again for your reply! I will continue the experiment, and if there is any progress, I'll report back here!
Closing due to lack of activity, please feel free to re-open if necessary.
Hi @andrefaraujo, I am trying to train trained_parameters, following #10, but I get a very low mAP and P@1 on the stanford600k dataset (about 0.1 mAP and 0.26 P@1). I want to describe some details, and I hope you can give me some advice about how to improve the mAP.

1) I first sampled about 20000 frames from the whole dataset, gathered all the local descriptors of those 20000 frames, and shuffled them. After that I sampled about 150*20000 descriptors from all these local descriptors, and used the PCA functions in the yael lib to reduce the 128-dim descriptors to 32 dims.

2) Next, I used the GMM functions in yael to train the GMM. I set niter to 60 and the number of centroids to 512. It takes about 10 hours to train the GMM.

3) For the corr_weights, I still use the old file (`sift.pre_alpha.0.50.pca.32.gmm.512.pre_alpha.0.50.corr_weights`). Actually, I am not quite sure whether this must also be regenerated, but since I am using the same SIFT descriptors, I don't think it is the reason the mAP is so low.

Is there anything wrong in the whole process? Can you give me some advice? Thanks!