LPXTT / GradNet-Tensorflow

The code of GradNet, based on TensorFlow.

Similarity score and pretrained model #7

Open SamihaSara opened 3 years ago

SamihaSara commented 3 years ago

Hello,

Thanks for releasing the codes of your project, I think it would be helpful for my research. Can you please tell, how I can get the similarity score between a search and target region? Can you release weights of the pre-trained model, so that I can run some demos with my dataset?

LPXTT commented 3 years ago

Hi,

Thank you for your interest in our work. You can find the pretrained model on GitHub (in the 'ckpt' directory). The responseMap at Line 291 in 'track.py' is the similarity score. You can run 'track.py' to get the similarity score, where 'zCrop' is the target region and 'xCrop' is the search region.

Best, Peixia School of Electrical and Information Engineering Faculty of Engineering and Information Technology University of Sydney


SamihaSara commented 3 years ago

Hi,

Thanks for replying. I ran the demo and got the following results.

0 -- Walking2 -- Precision: 100.00 -- Precisions AUC: 41.92 -- IOU: 77.41 -- Speed: 60.77
-- Overall stats (averaged per frame) on 1 videos (500.0 frames) --
-- Precision (20 px): 100.00 -- Precisions AUC: 41.92 -- IOU: 77.41 -- Speed: 60.77

When I print responseMap at line 291, it gives a huge array of numbers. Isn't the similarity score of a siamese network supposed to be a single number, like a detection confidence? Also, is the template given only in the first frame, and is it then tracked throughout the demo?

Best, Samiha.

LPXTT commented 3 years ago

Hi,

The template frame is given only in the first frame. The search region (which is larger than the template region) consists of many candidates, so the corresponding score is not one number. In object detection, many object boxes are sent into the final head to get the confidence score. The max value of the responseMap is similar to the max detection confidence.
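To make this concrete, here is a minimal numpy sketch of treating the response map's peak as a single detection-confidence-like number. The map below is random and purely illustrative; it is not the responseMap variable from track.py.

```python
import numpy as np

# Hypothetical response map: one similarity score per candidate position
# in the search region (the 17x17 size is only illustrative).
response_map = np.random.rand(17, 17)

# The single confidence-like number is the map's peak value.
confidence = response_map.max()

# Its location tells you which candidate matched the template best.
peak_row, peak_col = np.unravel_index(response_map.argmax(), response_map.shape)
print(confidence, (peak_row, peak_col))
```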

Best, Peixia School of Electrical and Information Engineering Faculty of Engineering and Information Technology University of Sydney


SamihaSara commented 3 years ago

Hi,

Is there any way to find the region in the search image that gives the maximum value in the response map (i.e., that matches the target most closely)?

Regards Samiha

LPXTT commented 3 years ago

Hi,

You can crop the most-matched region from the search image, centered at the position of the max value in the response map.
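A minimal numpy sketch of that cropping step, assuming the response map covers the search image uniformly so the peak position can be rescaled linearly to image pixels. The function name, sizes, and geometry here are illustrative assumptions; track.py's exact coordinate mapping (stride, upsampling) may differ.

```python
import numpy as np

def crop_best_match(search_img, response_map, crop_size):
    """Crop a patch from the search image centered at the response-map peak.

    Assumes a uniform linear mapping from response-map cells to image
    pixels; this is an illustrative sketch, not the tracker's exact code.
    """
    h, w = search_img.shape[:2]
    mh, mw = response_map.shape
    r, c = np.unravel_index(response_map.argmax(), response_map.shape)
    # Map the peak from response-map grid coordinates to image pixels.
    cy = int((r + 0.5) * h / mh)
    cx = int((c + 0.5) * w / mw)
    # Clamp the crop so it stays inside the image.
    half = crop_size // 2
    top = min(max(cy - half, 0), h - crop_size)
    left = min(max(cx - half, 0), w - crop_size)
    return search_img[top:top + crop_size, left:left + crop_size]

# Toy usage: a peak planted at map cell (8, 4) over a 255x255 search image.
img = np.zeros((255, 255, 3), dtype=np.uint8)
rmap = np.zeros((17, 17))
rmap[8, 4] = 1.0
patch = crop_best_match(img, rmap, 64)
print(patch.shape)  # (64, 64, 3)
```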

Best, Peixia


SamihaSara commented 3 years ago

Hi, I need a bit more info. Could you please tell me where in track.py the target patch is cropped from the image and where its features are extracted? Do the response map values represent similarity scores between each pair of search regions and template frames? When I display the tracking result (bounding box) in the current frame, there are red, yellow, and green bounding boxes. Please let me know what these different colors represent.

zCrop, _ = getSubWinTracking(im, targetPosition, (opts['exemplarSize'], opts['exemplarSize']), (np.around(sz), np.around(sz)), avgChans)
xCrops = makeScalePyramid(im, targetPosition, sx*scales, opts['instanceSize'], avgChans, None, opts)

------> What do zCrop and xCrops contain (similarity scores or bounding boxes)?

zFeat5_gra_init, zFeat2_gra_init, zFeat5_sia_init = sess.run([zFeat5Op_gra, zFeat2Op_gra, zFeat5Op_sia], feed_dict={exemplarOp_init: zCrop0, instanceOp_init: xCrops0, instanceOp: xCrops})
score_gra, score_sia = sess.run([scoreOp_gra, scoreOp_sia], feed_dict={zFeat5Op_gra: template_gra, zFeat5Op_sia: template_sia, instanceOp: xCrops})
template_gra, zFeat2_gra = sess.run([zFeat5Op_gra, zFeat2Op_gra], feed_dict={zFeat2Op_init: hid_gra, instanceOp_init: np.expand_dims(xCrops[1],0)})

----> What is the role of each sess.run() in the above 3 lines, i.e., which task is each of these values obtained for?

Sincere Thanks, Samiha

LPXTT commented 3 years ago

Hi,

I think you can run the code step by step to see what each line means; printing the shapes of some variables at the same time would also help. Here are some answers to your questions:

1) In track.py, the target patch is cropped at Line 431 and its feature (zFeat5_sia_init) is extracted at Line 459.
2) The response map values represent similarity scores between each pair of search regions and template frames.
3) I don't remember the meaning of the different colors.
4) zCrop is the target patch cropped from the image. xCrops are the search patches at different scale ratios.
5) 'zFeat5_gra_init, zFeat2_gra_init, zFeat5_sia_init = sess.run()' sends the target and search patches to the network to get the initial features, including the initial target features from the last convolution layer (zFeat5_gra_init) and the second convolution layer (zFeat2_gra_init) of the gradient branch, and the initial target feature (zFeat5_sia_init) from the last convolution layer of the siamese network.
6) 'score_gra, score_sia = sess.run()' is used to get the response maps from the gradient branch and the siamese network.
7) 'template_gra, zFeat2_gra = sess.run([zFeat5Op_gra, zFeat2Op_gra], feed_dict={zFeat2Op_init: hid_gra, instanceOp_init: np.expand_dims(xCrops[1],0)})' is used to update the template feature in the gradient branch.
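Since xCrops is a scale pyramid, the per-scale response maps (such as score_gra and score_sia) must be reduced to one scale and one position. A minimal numpy sketch of that reduction, under the assumption of a plain argmax (the actual tracker's scale penalty and the weighting between the gradient and siamese branches are omitted; all names and sizes are illustrative):

```python
import numpy as np

# Hypothetical response maps: one per search-patch scale, as produced for
# a scale pyramid (3 scales, 17x17 map each; sizes are illustrative).
scores = np.random.rand(3, 17, 17)

# Pick the scale whose best candidate responds most strongly ...
best_scale = int(scores.reshape(3, -1).max(axis=1).argmax())

# ... then the best position within that scale's map gives the new
# target location estimate.
best_pos = np.unravel_index(scores[best_scale].argmax(),
                            scores[best_scale].shape)
print(best_scale, best_pos)
```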

Best, Peixia
