Closed Sucran closed 4 years ago
Yes, I also met the memory leak problem. In my case, my GPU has 11 GB of VRAM, and the model spends about 9 GB for batch size 1 with the 3*3 convolution version (the paper recommends batch size 10, but I could not train with that option).
Maybe the 1*1 convolution version (the adjusted branch) would use less memory.
I think the memory leak comes from local PiCANet's implementation: it turns an H x W x C tensor into H x W patches of size 14 x 14 x C. If you have a better idea for implementing local PiCANet, please comment here or make a pull request.
Since I am not the author of the paper, this code is not the best implementation. I'm sorry for that.
Yes, I noticed the batch size issue; it is weird and strange. I have no better idea so far, but I hope for further discussion. This week I will go through the author's caffe code and compare the PyTorch and caffe implementations, looking deeper into local PiCANet and global PiCANet.
Can you give me the link to the caffe implementation? I didn't know about that. Thanks.
@Ugness https://github.com/nian-liu/PiCANet, with deeplab caffe version.
Thanks a lot.
@Ugness I changed the PiCANet config from 'GGLLL' to 'GGGGG' and 'LLLLL'; both of them have the memory leak problem when running network.py. Have you seen this before? I also found an interesting part of the author's caffe code: they seem to have implemented an attpooling function in their own proto cpp, which supports their global or local attention like conv3d does. Can you give me a hint about how you think of the conv3d processing?
I think it would not work with 'GGGGG' or 'LLLLL'. I have only tested with 'GGLLL', and other options may cause tensor dimension errors.
And I will check the proto cpp ASAP. My conv3d processing is not easy to describe with text only. :( I will describe it with text first, but if you need more information to help your understanding, I'll make some images ASAP.
The patch tensor has shape (1, NxHxW, C, 13, 13); for F.conv3d, the dimensions mean (batch, channel, depth, H, W). On the kernel side, my idea is that each (1, 1, 7, 7) kernel is stretched to (1, 1, 13, 13) by using the F.conv3d dilation option. Then F.conv3d applies the NxHxW kernels to the NxHxW patches, which is possible with the groups option. F.conv3d also slides across the depth dimension (C, dim 2) with the same att_map. Finally, the output is a (1, NxHxW, C, 1, 1) attention-applied feature map, which I can reshape to (N, C, H, W). I used the same idea for local PiCANet.
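To make the tensor shapes concrete, here is a minimal sketch of that grouped F.conv3d trick; the toy sizes are made up, and random tensors stand in for the real patches and attention kernels:

```python
import torch
import torch.nn.functional as F

N, C, H, W = 2, 8, 4, 4          # toy sizes for illustration
# N*H*W patches of size 13x13, one per spatial location
patches = torch.randn(1, N * H * W, C, 13, 13)
# one 7x7 attention kernel per location; kernel depth is 1, so the
# same attention map is reused across all C channels
kernels = torch.randn(N * H * W, 1, 1, 7, 7)

# dilation=2 stretches each 7x7 kernel over a 13x13 field:
# effective size = 2 * (7 - 1) + 1 = 13, so the spatial output is 1x1.
# groups=N*H*W pairs the i-th kernel with the i-th patch only.
out = F.conv3d(patches, kernels, dilation=(1, 2, 2), groups=N * H * W)
print(out.shape)  # torch.Size([1, 32, 8, 1, 1]) = (1, N*H*W, C, 1, 1)
out = out.view(N, H, W, C).permute(0, 3, 1, 2)  # -> (N, C, H, W)
```

Note that materializing all N*H*W patches is exactly what makes this approach memory-hungry.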
X_X
I thought Caffe was similar to PyTorch, but it wasn't. I tried to read the code, but I couldn't. The only thing I can see is that they used a for loop. If they implemented PiCANet with a for loop, a Python for loop without CUDA logic would consume a lot of time, and I don't know how to write a CUDA for loop from Python. T.T
@Ugness I do not think they use a loop for implementing PiCANet. They use im2col and col2im, which are torch.nn.Unfold and torch.nn.Fold in PyTorch. I suppose Conv3d can be translated into a combination of im2col + matrix multiplication + col2im, but I am still confused about how to implement this; still working on it. The memory leak problem we suffered seems to be caused by F.conv3d; hoping the next version will fix it.
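As a sketch of that Unfold-based idea (not code from the repo; the helper name `local_attend` and the toy sizes are made up): local attention becomes im2col followed by a per-location weighted sum, with no Conv3d involved.

```python
import torch
import torch.nn.functional as F

N, C, H, W, k = 2, 8, 6, 6, 7    # toy sizes; k is the local window
feat = torch.randn(N, C, H, W)
# per-location attention over a k*k neighborhood, shape (N, k*k, H, W);
# random logits here just to illustrate the shapes
att = torch.softmax(torch.randn(N, k * k, H, W), dim=1)

def local_attend(feat, att, k):
    N, C, H, W = feat.shape
    # im2col: extract the k*k neighborhood around every position
    cols = F.unfold(feat, kernel_size=k, padding=k // 2)   # (N, C*k*k, H*W)
    cols = cols.view(N, C, k * k, H * W)
    w = att.view(N, 1, k * k, H * W)
    # weighted sum of each neighborhood with its own attention map
    return (cols * w).sum(dim=2).view(N, C, H, W)

out = local_attend(feat, att, k)
print(out.shape)  # torch.Size([2, 8, 6, 6])

# sanity check: a one-hot attention at the window center (index k*k//2)
# must return the input unchanged
center = torch.zeros(N, k * k, H, W)
center[:, k * k // 2] = 1.0
assert torch.allclose(local_attend(feat, center, k), feat, atol=1e-6)
```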
Thanks. I will also try to convert the conv3d operation into a combination of matrix multiplications.
@Sucran I think I can improve my model soon. There was no function like torch.nn.Fold in PyTorch 0.4.0 when I started this project; now I have found the function I need. Thanks.
Oh, really? Amazing! @Ugness You are such a genius. Looking forward to your new version. Thanks for your work, again.
Hi @Sucran, I made a new implementation! You can check it at https://github.com/Ugness/PiCANet-Implementation/tree/Fold_Unfold. Now you can train the PiCANet model with batch size 1 using 3.5 GB of VRAM. I just started my training run, so I'll report the training result next week!
Looks like it works!
@Ugness Soooooo happy that it works! I checked the Fold_Unfold branch, and the memory leak problem seems gone. The VRAM usage is also low enough to increase the batch size, but it still cannot reach 10. I will check the channel setting of each layer against the author's caffe version; maybe some misunderstanding still exists.
@Sucran Thanks a lot for your interest; it led to a lot of improvement. The training speed seems improved as well. About the version of the code: the Fold_Unfold branch is based on origin (3*3 conv), not the Adjusted (1*1 conv) one. I am training this code with 3*3 conv and batch_size 4. I am going to close this issue after reporting the result. If you find errors or need help, please open another issue. :)
@Ugness Ok. Thanks for your work again. It is my pleasure.
@Ugness Anything new?
One of my models got about 88 on the F-measure with 200 samples of DUTS-TE, where the model in the paper scored 87. So I am now measuring the score on all of DUTS-TE, over all checkpoints, which takes a while.
I am confident that the new model (with a bigger batch_size) performs much better. I think I can update the repo on Sunday or next Monday.
I updated and merged the branch.
@Ugness So the result is from the origin branch (3*3 conv), not the Adjusted (1*1 conv) one? It seems to beat the performance of the author's version? Does the curve you plotted correspond to training or validation?
No, it's the Adjusted one; I used 1*1 conv. Yes, it seems to give better performance. The curve is validation.
I think I need to check all of the code thoroughly. Maybe there is something wrong.
@Ugness Hi, I am trying to reproduce your result, but I am confused about how to compute the metrics you reported. I have a trained weight file, but which code file contains the test part?
You can check the measuring code in pytorch/measure_test.py. It reports the result to TensorBoard, and you can download a CSV from TensorBoard.
Hi @Ugness, did you check your test code for computing max F_b and MAE? I think there are problems here. 1) The way of computing MAE is different from MSE_Loss; it should be torch.abs(y_pred - y).mean(). 2) I am not familiar with scikit-learn; maybe its PR-curve computation is more efficient than a handcrafted one, but I got a different result. I referenced the code of AceCoooool/DSS-pytorch, and I think the problem may be here. Using the trained model 36epo_383000step.ckpt, I got a max F_b of 0.877 with your code, but 0.750 with AceCoooool's code.
For example, if threshold=0.7 and the predicted value is 0.8, I set 0.8 to 1, just as when making a PR curve, and I take the max F score over the whole threshold space. If I did not use thresholds, it would probably score 0.75, like you got. Thank you for your comment. I also think 0.877 is strange, because my attention map was different from the author's.
@Ugness I do not think the scikit-learn API provides the correct way to compute max F-beta; you can refer to the paper "Salient Object Detection: A Survey", Chapter 3.2. Usually, we have a fixed threshold that changes from 0 to 255 for binarizing the saliency map to compute precision and recall. F-beta is computed from the average precision and average recall over all images. Then we pick the maximum as max F-beta.
Thanks. I'll check the paper.
I found that my threshold scheme is not the same as in the Survey, Chapter 4.2. I am going to re-measure the F score as soon as possible. Thanks.
I found that sklearn uses more fine-grained threshold bins than 0 to 255: it uses every distinct pixel value as a threshold. So I used the evaluation code from the DSS repo and fixed it a little because it had a NaN problem.
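To see how the choice of thresholds enters the score, here is a toy example (values invented, not from any dataset) computing max F-beta with sklearn-style thresholds, i.e. every distinct predicted value:

```python
import torch

# toy prediction / ground truth for a 5-pixel map
pred = torch.tensor([0.1, 0.4, 0.4, 0.8, 0.9])
mask = torch.tensor([0., 0., 1., 1., 1.])
beta_square = 0.3  # beta^2 = 0.3, the value commonly used for saliency

f_scores = []
# sklearn-style: every distinct predicted value serves as a threshold,
# instead of 256 fixed bins in [0, 1]
for th in pred.unique():
    y = (pred >= th).float()
    tp = (y * mask).sum()
    prec = tp / (y.sum() + 1e-10)
    recall = tp / (mask.sum() + 1e-10)
    f_scores.append((1 + beta_square) * prec * recall
                    / (beta_square * prec + recall + 1e-10))
max_f = torch.stack(f_scores).max()
print(round(max_f.item(), 4))  # 0.8966, reached at threshold 0.8
```

With only a handful of distinct values the two schemes can disagree, but on real 8-bit saliency maps the 256 fixed bins cover essentially the same thresholds.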
# pred: predicted saliency map, mask: ground truth, both in [0, 1]
thlist = torch.linspace(0, 1 - 1e-10, 256)
prec, recall = torch.zeros(256), torch.zeros(256)
for i in range(256):
    y_temp = (pred >= thlist[i]).float()
    tp = (y_temp * mask).sum()
    # the 1e-10 terms avoid NaN when prec or recall becomes 0
    prec[i] = (tp + 1e-10) / (y_temp.sum() + 1e-10)
    recall[i] = (tp + 1e-10) / (mask.sum() + 1e-10)
beta_square = 0.3  # beta^2 = 0.3
f_score = (1 + beta_square) * prec * recall / (beta_square * prec + recall)
print(torch.max(f_score))
The upper one is sklearn's and the one below is the DSS repo's evaluation method. I need to wait a while for the full result, but it seems there is no big difference between sklearn's method and the DSS repo's.
@Ugness Sorry, I found my test code had some mistakes. I will report a new result for your model in the next few days.
It's okay. Thanks.
I fixed my test code, and I got these results for 36epo_383000step.ckpt on DUTS-TE: average MAE 0.0757, max F-measure 0.7803. If I use denseCRF, the result is average MAE 0.0639, max F-measure 0.7886 on DUTS-TE. All test images are resized to 224*224 because of the dimension limit of the attention module. Hope you can check the answer.
Can I get a snippet of your test code? And can I also get the threshold value at which you get the max F-measure score?
Hi, I uploaded result csv on https://drive.google.com/drive/u/0/folders/1A9qXGuvtqwSY0mEc5hbC-4b7ix8fLyfA
the test code:
def eval_mae(self, y_pred, y):
    return torch.abs(y_pred - y).mean()

def eval_pr(self, y_pred, y, num=100):
    prec, recall = torch.zeros(num), torch.zeros(num)
    thlist = torch.linspace(0, 1 - 1e-10, num)
    for i in range(num):
        y_temp = (y_pred >= thlist[i]).float()
        tp = (y_temp * y).sum()
        prec[i], recall[i] = tp / (y_temp.sum() + 1e-20), tp / y.sum()
    return prec, recall

def test(self, use_crf=False):
    if use_crf:
        from libs.dense_crf import crf
    avg_mae, img_num = 0.0, len(self.test_loader.dataset)
    avg_prec, avg_recall = torch.zeros(100), torch.zeros(100)
    self.net.eval()
    with torch.no_grad():
        for i, data_batch in enumerate(self.test_loader):
            images, labels = data_batch['image'], data_batch['label']
            images, labels = images.to('cuda'), labels.to('cuda')
            shape = labels.size()[2:]
            new_shape = (shape[0] // 32) * 32, (shape[1] // 32) * 32
            inputs = F.interpolate(images, size=new_shape, mode='bilinear', align_corners=True)
            prob_pred = self.net(inputs)
            prob_pred = torch.mean(torch.cat([prob_pred[i] for i in self.net.select], dim=1), dim=1, keepdim=True)
            prob_pred = F.interpolate(prob_pred, size=shape, mode='bilinear', align_corners=True).to('cpu')
            if use_crf:
                prob_pred = crf(images, prob_pred.numpy(), to_tensor=True)
            labels, prob_pred = labels.to('cpu'), prob_pred.to('cpu')
            mae = self.eval_mae(prob_pred, labels)
            prec, recall = self.eval_pr(prob_pred, labels)
            print("[%d] mae: %.4f" % (i, mae))
            avg_mae += mae
            avg_prec, avg_recall = avg_prec + prec, avg_recall + recall
    avg_mae, avg_prec, avg_recall = avg_mae / img_num, avg_prec / img_num, avg_recall / img_num
    score = (1 + (0.3) ** 2) * avg_prec * avg_recall / ((0.3) ** 2 * avg_prec + avg_recall)
    score[score != score] = 0  # delete the nan
    print('average mae: %.4f, max fmeasure: %.4f' % (avg_mae, score.max()))
The crf code references AceCoooool/DSS-pytorch.
I think F.interpolate() made the difference. I simply resize all images to 224*224 when loading data from the dataset, without maintaining their aspect ratio. If that resizing method is wrong, I think I need to train the model again with your resizing method. Please give me your comment. Thanks.
Yes, for your network I set the resize transformation to 224*224 in the dataset. That code is for my own network, which accepts any input size, so I think the resizing does not take effect when I test your model. The BIG difference is the way of computing precision and recall; you can check it. The MAE is also strange. Did you change your MAE code? Last time you said you made a mistake there.
Yes, I corrected the MAE. Oh, I see the difference in precision and recall now. Sorry, I'll test it again.
@Ugness Sorry, I was busy with my own things these days. Have you tested it again and determined which answer is right? Actually, I still cannot run the code for a complete training phase since the RAM gets eaten up on my machine. The model I tested was downloaded from your link; I'm afraid it is not the newest. Can you upload a new model for testing?
Sorry, I have now started testing my code. I'll report the result ASAP.
It scored 0.8546, with an MAE of 0.05321.
@Ugness It may be my mistake. Could you tell me the model download link and the corresponding model definition code?
https://drive.google.com/drive/folders/1A9qXGuvtqwSY0mEc5hbC-4b7ix8fLyfA
I think you tested with the latest model, and there has been no update to the model definition since Oct. 21. I'm still not sure how to calculate the F-score correctly. I'll lay out my procedure; please check it.
For each threshold in linspace(0, 1, 256)
The corresponding code for measuring the F-score is here.
# mae, preds, masks, precs, recalls, beta_square, writer, and model_name
# are initialized per model in the surrounding script
for model in models:
    for i, batch in enumerate(dataloader):
        img = batch['image'].to(device)
        mask = batch['mask'].to(device)
        with torch.no_grad():
            pred, loss = model(img, mask)
            pred = pred[5].data
        mae += torch.mean(torch.abs(pred - mask))
        pred = pred.requires_grad_(False)
        preds.append(pred)
        masks.append(mask)
        prec, recall = torch.zeros(mask.shape[0], 256), torch.zeros(mask.shape[0], 256)
        pred = pred.squeeze(dim=1).cpu()
        mask = mask.squeeze(dim=1).cpu()
        thlist = torch.linspace(0, 1 - 1e-10, 256)
        for j in range(256):
            y_temp = (pred >= thlist[j]).float()
            tp = (y_temp * mask).sum(dim=-1).sum(dim=-1)
            # avoid prec becoming 0
            prec[:, j] = (tp + 1e-10) / (y_temp.sum(dim=-1).sum(dim=-1) + 1e-10)
            recall[:, j] = (tp + 1e-10) / (mask.sum(dim=-1).sum(dim=-1) + 1e-10)
        # (batch, threshold)
        precs.append(prec)
        recalls.append(recall)
    prec = torch.cat(precs, dim=0).mean(dim=0)
    recall = torch.cat(recalls, dim=0).mean(dim=0)
    f_score = (1 + beta_square) * prec * recall / (beta_square * prec + recall)
    thlist = torch.linspace(0, 1 - 1e-10, 256)
    writer.add_scalar("Max F_score", torch.max(f_score),
                      global_step=int(model_name.split('epo_')[1].split('step')[0]))
    writer.add_scalar("Max_F_threshold", thlist[torch.argmax(f_score)],
                      global_step=int(model_name.split('epo_')[1].split('step')[0]))
And about your memory problem: how much VRAM and RAM do you have, and where does the problem occur, in RAM or VRAM?
@Ugness I think the F-score procedure code you showed is correct. It is almost the same as what I reported 7 days ago, right? I set the number of thresholds to 100 and you set it to 256, which should not cause much difference, but the result was still 0.854 when you tested?
The strangest thing on my mind is the difference in the MAE results. I always get a value of 0.65, but you got 0.54, and we tested with the same code. Oh man, it is weird!
I also think it is strange, and I have a few questions so we can compare our results.
Thank you for answering. For question 3, if you want, could you test with my threshold option? I already have it; it is 0.6627.
Hi @Ugness, I met a RAM memory leak problem when running network.py and train.py; this issue has confused me for a few days. I have run other PyTorch repos, which were OK. I run the code on Ubuntu 14.04, PyTorch 0.4.1, CUDA 8.0, cuDNN 6.0.