
multilabel configuration #194

Open · sundevil0405 opened this issue 9 years ago

sundevil0405 commented 9 years ago

Hi,

We are trying to learn to use cxxnet for a multi-label problem.

We made the following settings:

label_width = 5
label_vec[0,5) = class
target = class
metric = error

but get the error:

Metric: unknown target = label  

Could anyone kindly explain this to us, or provide an example of a multi-label layer configuration?

Thanks a lot, YS

sxjzwq commented 9 years ago

Modify

label_vec[0,5) = class
target = class

to

label_vec[0,5) = label
target = label

see this ... https://github.com/dmlc/cxxnet/issues/139
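
For reference, a minimal multi-label configuration sketch combining the settings above (the values mirror this thread; the rest of the net config is unchanged, and the image list is assumed to carry five label columns per line):

    label_width = 5
    label_vec[0,5) = label
    target = label
    metric = error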

sundevil0405 commented 9 years ago

Thank you, sxjzwq!

I followed your comment and it worked. However, I ran into another error:

Segmentation fault (core dumped)

Is there any way to fix this?

Thanks a lot!

sxjzwq commented 9 years ago

I guess it is caused by the input data. What's the size of your input? For example, if it's 224x224x3 and some of the images in your data are smaller than 224, you will run into this problem.

You should resize your images when applying im2rec. Check the im2rec help and you will find the relevant parameters.
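
For example, regenerating the .rec file with resizing might look like this (a sketch; the parameter name follows the resize=512 mentioned below, but run im2rec without arguments to see its exact usage):

    ./bin/im2rec image.lst image_root_dir data.rec resize=224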

sundevil0405 commented 9 years ago

Hi sxjzwq, our input is 512x512x3, and we actually resized the images before running the code. Could you tell me how to check whether some image does not have the right shape? Or is there any other possible reason? Thank you!

sxjzwq commented 9 years ago

I am not sure. Maybe you should check the format of your image list and re-generate the .rec file using the parameter resize=512. I only met this error when I included a certain subset of my data; after checking that subset I found that some images were smaller than my network input shape, so I resized them and the error was gone. But there might be some other reason in your case. Please check the input carefully. Good luck!
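
If it helps, here is a minimal standalone sketch (not part of cxxnet) for scanning an image list for undersized or unreadable images. It assumes the im2rec-style list format in which the image path is the last tab-separated field on each line:

    # check_sizes.py -- flag images smaller than the network input (sketch)
    import sys
    from PIL import Image  # pip install Pillow

    MIN_SIZE = 512  # network input height/width assumed here

    def check_list(list_file, image_root=""):
        with open(list_file) as f:
            for line in f:
                path = line.rstrip("\n").split("\t")[-1]
                try:
                    with Image.open(image_root + path) as im:
                        w, h = im.size
                except OSError as e:
                    print("unreadable: %s (%s)" % (path, e))
                    continue
                if w < MIN_SIZE or h < MIN_SIZE:
                    print("too small (%dx%d): %s" % (w, h, path))

    if __name__ == "__main__":
        check_list(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "")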

sundevil0405 commented 9 years ago

We will carefully check the input. Thanks a million!

sxjzwq commented 9 years ago

You're welcome! Please let me know your multi-label classification performance if it works. I am also working on training a multi-label classification network, but it seems that my network parameters do not converge.

sundevil0405 commented 9 years ago

Sure! We are trying some simple settings to see what happens. We will let you know the performance if a setting works! Thank you!

sundevil0405 commented 9 years ago

Hi Qi, we tried multiple parameter settings. It seems the code does not work on our data either: the training error does not even change after multiple rounds. We basically observe things like

round 0:[ 1098] 1082 sec elapsed[1] train-error:0.305704
round 1:[ 1098] 2170 sec elapsed[2] train-error:0.305203
round 2:[ 1098] 3259 sec elapsed[3] train-error:0.305203
round 3:[ 1098] 4347 sec elapsed[4] train-error:0.305203
round 4:[ 1098] 5436 sec elapsed[5] train-error:0.305203
round 5:[ 1098] 6524 sec elapsed[6] train-error:0.305203
round 6:[ 1098] 7612 sec elapsed[7] train-error:0.305203
round 7:[ 1098] 8700 sec elapsed[8] train-error:0.305203
round 8:[ 1098] 9789 sec elapsed[9] train-error:0.305203
round 9:[ 1098] 10878 sec elapsed[10] train-error:0.305203
round 10:[ 1098] 11966 sec elapsed[11] train-error:0.305203
round 11:[ 1098] 13054 sec elapsed[12] train-error:0.305203
round 12:[ 1098] 14142 sec elapsed[13] train-error:0.305203
round 13:[ 1098] 15231 sec elapsed[14] train-error:0.305203
round 14:[ 1098] 16319 sec elapsed[15] train-error:0.305203
round 15:[ 1098] 17408 sec elapsed[16] train-error:0.305203
round 16:[ 1098] 18496 sec elapsed[17] train-error:0.305203

I think it would be good to have an example in cxxnet.

sxjzwq commented 9 years ago

Hi, maybe you can set metric = logloss and try again. And which loss function are you using? Try multi_logistic. I have gotten some positive results on an easy dataset now; my network is fine-tuned from VGGNet16. But I am still trying it on my real data.
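
In config terms, those two changes might look like this (a sketch; the layer index is a placeholder for wherever the loss layer sits in your network definition):

    # loss layer: multi_logistic instead of softmax (index is a placeholder)
    layer[31->32] = multi_logistic
    # evaluation metric reported during training
    metric = logloss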

sundevil0405 commented 9 years ago

Hi sxjzwq, thank you so much for your suggestion. We tried both l2 and softmax as the loss function. We will definitely try your suggestion and let you know if there is an improvement. Thanks again!

sxjzwq commented 9 years ago

start from vgg16.model
layer:fc7 wmat:eta = 0.0005 bias:eta = 0.0010
layer:fc8 wmat:eta = 0.0010 bias:eta = 0.0020

round 0:[ 2466] 11686 sec elapsed[1] train-logloss:0.092616 train-rmse:6.30993
round 1:[ 2466] 23366 sec elapsed[2] train-logloss:-nan train-rmse:5.79034
round 2:[ 2466] 35045 sec elapsed[3] train-logloss:-nan train-rmse:5.65325
round 3:[ 2466] 46721 sec elapsed[4] train-logloss:-nan train-rmse:5.56152
round 4:[ 2466] 58397 sec elapsed[5] train-logloss:-nan train-rmse:5.48876
round 5:[ 2466] 70074 sec elapsed[6] train-logloss:-nan train-rmse:5.42933

start from 0006.model
layer:fc7 wmat:eta = 0.0005 bias:eta = 0.0010
layer:fc8 wmat:eta = 0.0005 bias:eta = 0.0010

round 6:[ 2466] 11681 sec elapsed[7] train-logloss:-nan train-rmse:5.33734
round 7:[ 2466] 23361 sec elapsed[8] train-logloss:-nan train-rmse:5.27811
round 8:[ 2466] 35040 sec elapsed[9] train-logloss:-nan train-rmse:5.2354
round 9:[ 2466] 46719 sec elapsed[10] train-logloss:-nan train-rmse:5.19465
round 10:[ 2466] 58396 sec elapsed[11] train-logloss:-nan train-rmse:5.15824
round 11:[ 2466] 70071 sec elapsed[12] train-logloss:-nan train-rmse:5.12289

start from 0012.model
layer:fc7 wmat:eta = 0.00025 bias:eta = 0.00050
layer:fc8 wmat:eta = 0.00025 bias:eta = 0.00050

round 12:[ 2466] 11686 sec elapsed[13] train-logloss:-nan train-rmse:4.60376
round 13:[ 2466] 23383 sec elapsed[14] train-logloss:-nan train-rmse:4.48242
round 14:[ 2466] 35060 sec elapsed[15] train-logloss:-nan train-rmse:4.4032
round 15:[ 2466] 46732 sec elapsed[16] train-logloss:-nan train-rmse:4.33162
round 16:[ 2466] 58405 sec elapsed[17] train-logloss:-nan train-rmse:4.28349
round 17:[ 2466] 70076 sec elapsed[18] train-logloss:-nan train-rmse:4.2459

start from 0018.model
layer:fc7 wmat:eta = 0.00010 bias:eta = 0.00020
layer:fc8 wmat:eta = 0.00010 bias:eta = 0.00020

round 18:[ 2466] 11674 sec elapsed[19] train-logloss:-nan train-rmse:3.93583

Using the RMSE metric will be helpful.
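
In config terms that is just the line below (the metric name follows the train-rmse column in the log above; the log also reports two metrics at once, so it can apparently be listed alongside logloss):

    metric = rmse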

sundevil0405 commented 9 years ago

Hi Qi, thank you very much for your advice. We will try this. By the way, we tried your last suggestion, but we also met the NaN problem. Hopefully it will work this time. Thanks again!

sxjzwq commented 9 years ago

Hi Yashu

Yes, I don't know how to avoid the NaN problem when using the logloss evaluation metric, but the RMSE metric seems to work fine. I finally got train-rmse down to 1.32312 on my data, and my multi-label classification mAP is above 0.7, much better than using fc7 features + a multi-label SVM.
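
The NaN is what you would expect from logloss when predictions saturate: as soon as any predicted probability hits exactly 0 or 1, a log(0) term appears. A toy demonstration (plain Python/NumPy, not cxxnet code; clipping is the usual workaround):

    import numpy as np

    def logloss(y, p, eps=None):
        # optional clipping keeps log() finite
        if eps is not None:
            p = np.clip(p, eps, 1 - eps)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    y = np.array([1.0, 0.0, 1.0])
    p = np.array([1.0, 0.0, 0.9])    # two saturated predictions

    print(logloss(y, p))             # nan
    print(logloss(y, p, eps=1e-15))  # finite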

I hope this information is helpful.

Best

sundevil0405 commented 9 years ago

Hi Qi,

That's really good news! We actually followed your suggestion and changed to the RMSE metric. However, the speed seems extremely slow: we've been pre-training the network for ~3 days using two GTX Titan Black cards, and it has only finished ~300 rounds. How many rounds did your algorithm take? And is that pre-training or fine-tuning?

Thank you very much, Yashu


sxjzwq commented 9 years ago

Hi Yashu

I am using the pre-trained VGGNet16 (trained on ImageNet, of course) as the initial model, and then fine-tune the last FC layer (fc7) and the classification layer (changing 1000 outputs to 256, which is my label width). Also, I changed the loss layer from softmax to multi_logistic. For all the other layers, I keep the learning rate at 0, so their parameters stay fixed at the VGGNet values.
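
A rough sketch of that setup in config form. The per-layer eta keys are reconstructed from the log lines below; the layer indices, the model_in key, and freezing via a global eta of 0 are assumptions, not verified cxxnet syntax:

    model_in = vgg16.model
    eta = 0.0                  # assumed: freezes layers without their own eta
    layer[29->30] = fullc:fc7
      nhidden = 4096
      wmat:eta = 0.0005
      bias:eta = 0.0010
    layer[30->31] = fullc:fc8
      nhidden = 256            # label width instead of 1000 ImageNet classes
      wmat:eta = 0.0010
      bias:eta = 0.0020
    layer[31->32] = multi_logistic
    metric = rmse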

I start my training with learning rate = 0.001 and decrease it when the train-RMSE error stops decreasing. I trained only 36 rounds; since my learning rate had reached 0.000001 by then, I stopped the training. The following is my training log:

start from vgg16.model
layer:fc7 wmat:eta = 0.0005 bias:eta = 0.0010
layer:fc8 wmat:eta = 0.0010 bias:eta = 0.0020

round 0:[ 2466] 11686 sec elapsed[1] train-logloss:0.092616 train-rmse:6.30993
round 1:[ 2466] 23366 sec elapsed[2] train-logloss:-nan train-rmse:5.79034
round 2:[ 2466] 35045 sec elapsed[3] train-logloss:-nan train-rmse:5.65325
round 3:[ 2466] 46721 sec elapsed[4] train-logloss:-nan train-rmse:5.56152
round 4:[ 2466] 58397 sec elapsed[5] train-logloss:-nan train-rmse:5.48876
round 5:[ 2466] 70074 sec elapsed[6] train-logloss:-nan train-rmse:5.42933

start from 0006.model
layer:fc7 wmat:eta = 0.0005 bias:eta = 0.0010
layer:fc8 wmat:eta = 0.0005 bias:eta = 0.0010

round 6:[ 2466] 11681 sec elapsed[7] train-logloss:-nan train-rmse:5.33734
round 7:[ 2466] 23361 sec elapsed[8] train-logloss:-nan train-rmse:5.27811
round 8:[ 2466] 35040 sec elapsed[9] train-logloss:-nan train-rmse:5.2354
round 9:[ 2466] 46719 sec elapsed[10] train-logloss:-nan train-rmse:5.19465
round 10:[ 2466] 58396 sec elapsed[11] train-logloss:-nan train-rmse:5.15824
round 11:[ 2466] 70071 sec elapsed[12] train-logloss:-nan train-rmse:5.12289

start from 0012.model
layer:fc7 wmat:eta = 0.00025 bias:eta = 0.00050
layer:fc8 wmat:eta = 0.00025 bias:eta = 0.00050

round 12:[ 2466] 11686 sec elapsed[13] train-logloss:-nan train-rmse:4.60376
round 13:[ 2466] 23383 sec elapsed[14] train-logloss:-nan train-rmse:4.48242
round 14:[ 2466] 35060 sec elapsed[15] train-logloss:-nan train-rmse:4.4032
round 15:[ 2466] 46732 sec elapsed[16] train-logloss:-nan train-rmse:4.33162
round 16:[ 2466] 58405 sec elapsed[17] train-logloss:-nan train-rmse:4.28349
round 17:[ 2466] 70076 sec elapsed[18] train-logloss:-nan train-rmse:4.2459

start from 0018.model
layer:fc7 wmat:eta = 0.00010 bias:eta = 0.00020
layer:fc8 wmat:eta = 0.00010 bias:eta = 0.00020

round 18:[ 2466] 11674 sec elapsed[19] train-logloss:-nan train-rmse:3.93583
round 19:[ 2466] 23353 sec elapsed[20] train-logloss:-nan train-rmse:3.68861
round 20:[ 2466] 35027 sec elapsed[21] train-logloss:-nan train-rmse:3.48819
round 21:[ 2466] 46701 sec elapsed[22] train-logloss:-nan train-rmse:3.29444
round 22:[ 2466] 58375 sec elapsed[23] train-logloss:-nan train-rmse:3.13445
round 23:[ 2466] 70048 sec elapsed[24] train-logloss:-nan train-rmse:2.98958

start from 0024.model
layer:fc7 wmat:eta = 0.00001 bias:eta = 0.00002
layer:fc8 wmat:eta = 0.00001 bias:eta = 0.00002

round 24:[ 2466] 11671 sec elapsed[25] train-logloss:-nan train-rmse:3.27728
round 25:[ 2466] 23347 sec elapsed[26] train-logloss:-nan train-rmse:2.95055
round 26:[ 2466] 35017 sec elapsed[27] train-logloss:-nan train-rmse:2.65933
round 27:[ 2466] 46689 sec elapsed[28] train-logloss:-nan train-rmse:2.35525
round 28:[ 2466] 58361 sec elapsed[29] train-logloss:-nan train-rmse:2.04922
round 29:[ 2466] 70034 sec elapsed[30] train-logloss:-nan train-rmse:1.72671

start from 0030.model
layer:fc7 wmat:eta = 0.000001 bias:eta = 0.000002
layer:fc8 wmat:eta = 0.000001 bias:eta = 0.000002

round 30:[ 2466] 11675 sec elapsed[31] train-logloss:-nan train-rmse:2.81689
round 31:[ 2466] 23350 sec elapsed[32] train-logloss:-nan train-rmse:2.46264
round 32:[ 2466] 35021 sec elapsed[33] train-logloss:-nan train-rmse:2.16123
round 33:[ 2466] 46691 sec elapsed[34] train-logloss:-nan train-rmse:1.86558
round 34:[ 2466] 58362 sec elapsed[35] train-logloss:-nan train-rmse:1.58915
round 35:[ 2466] 70034 sec elapsed[36] train-logloss:-nan train-rmse:1.32312

sundevil0405 commented 9 years ago

Hi Qi,

Thank you so much for your advice. Our problem is not suitable for fine-tuning, so we decided to train the net directly. However, the toolbox does not work for us, and we have decided to give up on cxxnet and turn to Caffe. Thank you again for your help, and we hope we can discuss and collaborate someday :)

Best Regards, Yashu
