JonghwanMun / TextguidedATT

The implementation of Text-guided Attention Model for Image Captioning

VGG-FCN t7 file #6

Open YuanEZhou opened 6 years ago

YuanEZhou commented 6 years ago

You released the Res-101.t7 file for extracting image features. Could you also release the VGG-FCN t7 file? Thanks.

JonghwanMun commented 6 years ago

When I implemented the FCN-based text-guided attention model, I used the pre-trained Caffe model from "From Captions to Visual Concepts and Back". However, the FCN model and the corresponding feature extraction code are currently missing. Sorry about that, but I think it is simple to load a pre-trained Caffe model in Torch with the loadcaffe package, so you can easily write the code yourself.

YuanEZhou commented 6 years ago

Thanks!

JonghwanMun commented 6 years ago

The "deploy.prototxt" file is in the "visual-concepts/output/vgg" folder: https://github.com/s-gupta/visual-concepts/tree/master/output/vgg . I extracted features at the layer named "fc-conv7" (or "relu7"), and the size of the feature maps was 10x10.
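As a rough sanity check on the 10x10 size for 512x512 inputs, here is a minimal sketch of the spatial-size arithmetic. It assumes a standard VGG-16 layout (3x3 convolutions with padding 1, five 2x2 stride-2 poolings with Caffe-style ceil rounding) and that fc6 is converted to a 7x7 convolution without padding, followed by 1x1 convolutions. These settings are assumptions, not values read from the deploy.prototxt, and the function names are hypothetical.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output size of a convolution (floor rounding, as in Caffe)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2, pad=0):
    """Output size of a pooling layer (ceil rounding, as in Caffe)."""
    # -(-a // b) is integer ceil division.
    return -(-(size + 2 * pad - kernel) // stride) + 1


def fcn_vgg16_spatial_size(input_size):
    """Spatial size after fc7 in an assumed fully convolutional VGG-16."""
    size = input_size
    for _ in range(5):
        # Assumed: 3x3 convs with pad 1 keep the size; each block ends in a pool.
        size = conv_out(size, kernel=3, pad=1)
        size = pool_out(size)
    size = conv_out(size, kernel=7)  # assumed: fc6 as a 7x7 conv, no padding
    size = conv_out(size, kernel=1)  # assumed: fc7 as a 1x1 conv
    return size


print(fcn_vgg16_spatial_size(512))  # prints 10
```

Under these assumptions a 512x512 image is reduced to 16x16 by the five poolings, and the 7x7 convolution then gives 16 - 7 + 1 = 10.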

If you have other questions, feel free to ask me :)

2018-01-03 16:53 GMT+09:00 YE Zhou notifications@github.com:

I followed your advice and downloaded snapshot_iter240000.caffemodel from the corresponding address, but there is no 'deploy.prototxt' file. Did you modify the original VGG-16 prototxt file by replacing (fc6, fc7, fc8) with fully convolutional layers? What are the dimensions of the extracted feature maps given 512x512 images?

YuanEZhou commented 6 years ago

Thank you! I found the deploy.prototxt file, but I am still confused about one thing. If you extract features at the layer named "fc-conv7" (or "relu7"), the feature maps are 10x10x4096. That dimension is so big! Why not extract features at the layer named "fc8_coco"?

JonghwanMun commented 6 years ago

The output of the "fc8_coco" layer is a probability for each attribute, so I use the visual features from "fc-conv7", just as we usually take features from the fc7 layer rather than the fc8 layer in VGG-net. Also, note that the ResNet output features are 14x14x2048 given 448x448 images.
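To put the two feature sizes side by side, here is a small sketch that counts the values per image for each extractor. The shapes come from this thread; the 4 bytes per value (float32 storage) is an assumption, and the helper name is hypothetical.

```python
def num_values(height, width, channels):
    """Number of scalar values in one height x width x channels feature map."""
    return height * width * channels


BYTES_PER_VALUE = 4  # assumption: features stored as float32

vgg_fcn = num_values(10, 10, 4096)  # "fc-conv7" features for a 512x512 image
resnet = num_values(14, 14, 2048)   # ResNet features for a 448x448 image

print(vgg_fcn, resnet)  # prints 409600 401408
print(vgg_fcn * BYTES_PER_VALUE, resnet * BYTES_PER_VALUE)
```

So the VGG-FCN features are only about 2% larger per image than the ResNet ones.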

YuanEZhou commented 6 years ago

Thank you very much!