Jianlong-Fu / Recurrent-Attention-CNN


where is the code? #3

Closed q5390498 closed 6 years ago

q5390498 commented 6 years ago

I am sorry, but this link "https://1drv.ms/u/s!Ak3_TuLyhThpkxQE4tw96xNUiBbn" only lets us download the model and deploy.prototxt. The net has a new layer named "AttentionCrop", so without the code we cannot use it. It would be best if the author released the source code. Thank you.

Geeshang commented 6 years ago

The new download link includes the "AttentionCrop" layer. https://1drv.ms/u/s!Ak3_TuLyhThpkxifVPt-w8e-axc5

q5390498 commented 6 years ago

thank you very very much!

Actasidiot commented 6 years ago

I'm not familiar with Caffe. It seems that the authors haven't released the training code. Has anyone implemented the alternating training process described in the paper?

q5390498 commented 6 years ago

@KawhiWang I am sorry, I cannot open this link "https://1drv.ms/u/s!Ak3_TuLyhThpkxifVPt-w8e-axc5". Have you downloaded these files? Can you share them on Baidu Yun?

Actasidiot commented 6 years ago

Of course. You can download it in http://pan.baidu.com/s/1pL6vS63. If you are also going to implement the training process, I hope to get your help.

clover978 commented 6 years ago

@KawhiWang @q5390498 I am also working on this problem. Did you make any progress on how to train RA_CNN?

lizh0019 commented 6 years ago

Has anyone tested the performance on the 200-class bird dataset? I created the standard lmdb (short side 448) and tested the test images (cropped to 448*448) using the given caffemodel and deploy.prototxt, but the accuracy is about 1/200, i.e. random guess. I have checked the whole process as carefully as I can but haven't found any bug. Can anyone help?

clover978 commented 6 years ago

@lizh0019 How do you define the labels in your dataset? The labels in CUB are [1, 2, ..., 200]; you need to map them into the range [0, 199]. I tested 500 samples on CUB-200-2011; I don't remember the exact result, but I am sure it exceeds 80%. Besides, I directly resized the images to a fixed size (448*448) rather than cropping them, though I don't think that could make such a big difference.
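A minimal sketch of the label fix discussed here (the variable names are illustrative, not from anyone's actual conversion script):

```python
# CUB-200-2011 class ids are 1-based, while Caffe's SoftmaxWithLoss
# expects 0-based labels, so shift them down by one before writing
# them into the lmdb. `raw_labels` is a made-up example list.
raw_labels = [1, 2, 199, 200]
fixed_labels = [l - 1 for l in raw_labels]
print(fixed_labels)  # [0, 1, 198, 199]
```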

Actasidiot commented 6 years ago

@clover978
Do you directly resize the images in CUB_200_2011/images to 448*448?
Do we need to crop the birds from the background according to bounding_boxes.txt?

lizh0019 commented 6 years ago

@clover978 thanks! After applying an offset of 1 and modulo 200 to the labels, I get accuracy around 85%. But it seems we need significant modifications to deploy.prototxt for training? Have you ever tried retraining the caffemodel?

clover978 commented 6 years ago

@KawhiWang Yes, I just resized the original images to 448*448; the performance is 80%~85%. I think it would be better to keep the aspect ratio of the images, as @lizh0019 does. As for the bounding boxes, I don't think we should use that information: according to the paper, RA-CNN does not need bounding-box annotations for either training or testing.

clover978 commented 6 years ago

@lizh0019 Given the caffemodel, we can load and parse it, which means we can then regenerate the train_val.prototxt. Actually, I have done this, and the train_val.prototxt is not very different from deploy.prototxt. The problem is that I don't know how to initialize the network, so I haven't managed to train from scratch.

clover978 commented 6 years ago

@lizh0019 https://gist.github.com/clover978/1969d4648458cf876af92b3507856d70 I made it a gist: how to load and parse a caffemodel file. I think the result should be identical to the prototxt file used for training.

lizh0019 commented 6 years ago

@clover978 thanks a lot! I noticed that all the lr_mult and decay_mult values in the generated train_val.prototxt are 0.0. According to the paper's "Training strategy" section, I think the authors might have a script that generates different train_val.prototxt files with different lr_mult and decay_mult values: keep some at 0.0 and the rest at 1.0, then swap them in the next super-iteration (each super-iteration being a whole training run), and repeat many such super-iterations.

clover978 commented 6 years ago

@lizh0019 The lr_mult and decay_mult are set by experience; I referred to vgg_train_val.prototxt. As for the training process, we share the same idea: to be exact, freeze the parameters of the APN and of VGG alternately. The prototxt should not differ much from the generated one; we just need to modify lr_mult and decay_mult. I got stuck at the next step: the scale1 stem takes images of size 448*448, but VGG's input is 224*224, so I fine-tuned another VGG on the CUB dataset, but the accuracy is around 0.5, while according to the paper it should be 0.79. The input of the scale2 and scale3 stems is 224*224, so a fine-tuned VGG with 224*224 input is also needed, but I didn't try that. I am not sure whether I made any mistakes. If you spot any, or you get a different result when repeating the training process, please let me know.
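For reference, freezing one stream in Caffe is usually expressed through lr_mult/decay_mult in each layer's param blocks; a hedged prototxt fragment (the layer name is illustrative, not taken from the released model):

```
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  # lr_mult: 0 / decay_mult: 0 freezes this blob for the current
  # super-iteration; set them back to 1 when training this stream.
  param { lr_mult: 0 decay_mult: 0 }  # weights
  param { lr_mult: 0 decay_mult: 0 }  # bias
  convolution_param { num_output: 64 kernel_size: 3 pad: 1 }
}
```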

lizh0019 commented 6 years ago

@clover978 You mean you fine-tuned a VGG model with 448x448 input from the pretrained VGG-19 model? I tried this by just changing the train_val.prototxt input size from 224x224 to 448x448, and the accuracy is about 0.79; conv5_4's output becomes 512x14x14 (it was 512x7x7).

clover978 commented 6 years ago

@lizh0019 Great! It seems there were some bugs in my earlier work. BTW, what's the performance of the fine-tuned model with 224x224 input size? If it works as expected, I think it's possible to start training from those pre-trained models. Besides, can I talk to you via some IM software like QQ, to debug my fine-tuning process? Thanks.

lizh0019 commented 6 years ago

Please send an email to lizhen007_0@hotmail first.


lizh0019 commented 6 years ago

@clover978 I am still not clear on how to use the pre-trained VGG for scales 2 and 3. The layer names in scale 2 look like "conv2_2_A"; how can I borrow the weights from "conv2_2" of the pretrained model?

clover978 commented 6 years ago

@lizh0019 It should be similar to a siamese network.
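A toy sketch of that siamese-style weight copy (the dicts stand in for pycaffe's net.params; the "_A" suffix convention comes from the layer name mentioned above, and the weight values are dummies):

```python
# Copy pretrained VGG weights into the suffixed scale-2 layers by
# stripping the "_A" suffix from each layer name before the lookup.
pretrained = {'conv2_2': 'W22', 'conv3_1': 'W31'}   # name -> weight blob
ra_cnn = {'conv2_2_A': None, 'conv3_1_A': None}     # uninitialized layers

for name in ra_cnn:
    base = name[:-2] if name.endswith('_A') else name
    if base in pretrained:
        ra_cnn[name] = pretrained[base]

print(ra_cnn)  # every "_A" layer now holds its base layer's weights
```

With real pycaffe nets the same copy would presumably look like `ra_net.params['conv2_2_A'][0].data[...] = vgg_net.params['conv2_2'][0].data` (index 1 for the bias), assuming matching blob shapes.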

lizh0019 commented 6 years ago

@clover978 Hi Caffe expert, can you send an email to my hotmail address so we can exchange WeChat or QQ contacts?

clover978 commented 6 years ago

@lizh0019 That's weird. I have sent an email to lizhen007_0@hotmail.com. It seems you didn't receive it.

QQQYang commented 6 years ago

It seems that the generated train_val.prototxt does not include a loss layer. In the original paper, the overall loss consists of two parts. Do we need to create a new loss layer to compute the pairwise ranking loss?
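For what it's worth, a minimal sketch of the inter-scale ranking term, following the hinge form the paper describes (the margin value here is illustrative; p is the softmax probability of the ground-truth class at a given scale):

```python
# Pairwise ranking loss: penalize the finer scale whenever its
# probability on the true class fails to beat the coarser scale's
# probability by at least `margin`.
def rank_loss(pt_coarse, pt_fine, margin=0.05):
    return max(0.0, pt_coarse - pt_fine + margin)

print(rank_loss(0.60, 0.70))  # 0.0 -- finer scale already better
print(rank_loss(0.70, 0.60))  # positive -- constraint violated
```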

liangzimei commented 6 years ago

@clover978 hi, I used your script 'net.py' to parse RA_CNN.caffemodel. However, the output net structure doesn't contain any loss layer. Do you know why? Thanks.

clover978 commented 6 years ago

@liangzimei Sorry, I didn't get a loss layer either. I didn't remember that I had modified the generated prototxt until you reminded me; I'll delete my wrong comment. It seems that implementing the rank loss is still an open problem, and we need more information from the author.

simo23 commented 6 years ago

Hi everyone, I'm trying to replicate the 0.79 accuracy result. Thanks for the great details you are sharing. I have just some questions to ask if anyone can help:

@lizh0019 Can you share the details of the training process that achieves 0.79 accuracy? You said you modified the prototxt input from 224 to 448, but do you mean the VGG net trained by the VGG authors or the RA-CNN model?

The standard VGG has max-pool layers which pass a 7x7x512 tensor (from conv5) to the FC layers. So if I want to use the authors' weights on a 448x448 input, the way I think is correct, also from the VGG paper, is:

The VGG authors convert the FC layers operating on the 7x7x512 tensor into CONV layers, so this step comes for free, but then we need to implement the averaging of the scores, right? I read this in the VGG paper https://arxiv.org/abs/1409.1556v6, section 3.2 "Testing", where they say:

"The result is a class score map with the number of channels equal to the number of classes, and a variable spatial resolution, dependent on the input image size. Finally, to obtain a fixed-size vector of class scores for the image, the class score map is spatially averaged (sum-pooled)."
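A pure-Python toy of that spatial-averaging step, shrunk to 2 classes on a 2x2 score map (a real run would have 200 classes on whatever grid the input size produces; the numbers are made up):

```python
# Average each class's score map over its spatial positions to get one
# fixed-size score vector, as in the VGG paper's dense evaluation.
score_map = {
    'class_a': [[0.0, 0.5], [0.5, 1.0]],
    'class_b': [[0.25, 0.25], [0.25, 0.25]],
}
scores = {c: sum(sum(row) for row in m) / 4.0 for c, m in score_map.items()}
print(scores)  # {'class_a': 0.5, 'class_b': 0.25}
```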

Lastly, can you share the train_val.prototxt?

Thanks, Andrea

super-wcg commented 6 years ago

@clover978 Could you send your qq number to my e-mail 1136912015@qq.com? I want to get the train_prototxt, and I want to ask you some questions.

chenfeima commented 6 years ago

@clover978 I also need the train_prototxt, and want to know how you get the train_prototxt.

chenfeima commented 6 years ago

@lizh0019 How did you get 85% accuracy? I ran the test net and only got 83%.

chenfeima commented 6 years ago

@KawhiWang Hello, I retrained the model, but scale2's accuracy is lower than scale1's, whereas according to the paper scale2's accuracy should be higher. I want to know whether the "AttentionCrop" layer is provided by the author. If you have retrained the model, can you help me?

Michael-Jing commented 6 years ago

@chenfeima The AttentionCrop layer is provided by the author; it's in the layers folder. By the way, can you share your train_prototxt with me?

chenfeima commented 6 years ago

@Michael-Jing I really need your help. My QQ number is 1691767172.

Michael-Jing commented 6 years ago

@chenfeima Sorry, I don't normally use QQ. Also, Fu has published a paper called Multi-Attention CNN, and its code is released too; it doesn't have custom layers, so you could check that out.

chenfeima commented 6 years ago

@Michael-Jing I have known about the new paper for a long time. Maybe retraining one model teaches you more than checking ten other people's caffemodels. Ha-ha.

cs2103t commented 6 years ago

@clover978 @lizh0019
Hi, may I ask how you create the lmdb?

import cv2
import caffe

im = cv2.imread(single_file)            # BGR, HxWxC
im = cv2.resize(im, (448, 448))
im = im.transpose((2, 0, 1))            # HWC -> CHW for Caffe
im_dat = caffe.io.array_to_datum(im, label)
txn.put(str(counter), im_dat.SerializeToString())

This is how I read the images and store them to lmdb. However, I can't reproduce the accuracy. Can you help me with this? Thank you very much in advance.

chenfeima commented 6 years ago

@cs2103t Where do your labels come from? Judging only from this code, your labels would all be zero.

cs2103t commented 6 years ago

@chenfeima Thank you for the quick reply! I get the label from the folder name. As the folders are indexed from 1 to 200, I can just use that, right? I currently only get random-guess accuracy.

subfolder = subdir.split('/')[1]
label = int(subfolder.split('.')[0])

chenfeima commented 6 years ago

@cs2103t Maybe you need to shift the indices into [0, 199].

cs2103t commented 6 years ago

@chenfeima So I just subtract 1 from every index? I tried that too, but I still get random-guess accuracy.

cocowf commented 6 years ago

@Michael-Jing sorry, I can't find the AttentionCrop layer in the folder. Can you help me?

Michael-Jing commented 6 years ago

@cocowf it's located in caffe/src/caffe/layers

cocowf commented 6 years ago

Thank you.

By the way, should we compile the new AttentionCrop layer into our own Caffe using the attention_crop_layer.hpp, .cu and .cpp files? And how about the loss function? Where can I find it?


Michael-Jing commented 6 years ago

@cocowf Yes, you should compile Caffe. As for the loss function, I don't have knowledge to share on that; someone else here may be able to help.

ghost commented 6 years ago

Hi everyone, I'm trying to read the images from Caltech-UCSD Birds 200, but there isn't any image, just a single 659MB file?

ouceduxzk commented 6 years ago

For those who want to reproduce this work, let's collaborate. Here is my initial work: https://github.com/ouceduxzk/Fine_Grained_Classification/tree/master/RA-CNN

zhiAung commented 6 years ago

Does anyone have a PyTorch version?

22wei22 commented 6 years ago

Does anyone have a PyTorch version? Thanks.