Jiangfeng-Xiong / satellite_seg

CCF BCDI2017: AI classification and recognition of satellite imagery

Some questions about changing batch_size #2

Closed: fangxu622 closed this issue 5 years ago

fangxu622 commented 6 years ago

I'm using Python 3 on Windows with PyTorch 0.3.1. The GPU is a 1080 Ti with 11 GB of memory.

python train.py --arch pspnet-densenet-s1s2-crf2 --img_rows 256 --img_cols 256 --n_epoch 50 --l_rate 1e-3 --batch_size 32 --gpu 0 --step 50 --traindir "dataset/stage1&stage2-train-crf2"

With batch_size = 32, it reports that it is out of memory. [image]

When I change batch_size to 4 or 8, I get the following error:

D:\Anaconda3\envs\fangxu\lib\site-packages\matplotlib\colors.py:680: MatplotlibDeprecationWarning: The is_string_like function was deprecated in version 2.1.
  not cbook.is_string_like(colors[0]):
epoch 0 with learning rate: 0.001000
E:\fxworkspace\satellite_seg\utils\loss.py:16: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  log_p = F.log_softmax(input)
D:\Anaconda3\envs\fangxu\lib\site-packages\torch\autograd\_functions\tensor.py:465: UserWarning: self and mask not broadcastable, but have the same number of elements.  Falling back to deprecated pointwise behavior.
  return tensor.masked_select(mask)
Traceback (most recent call last):
  File "train.py", line 193, in <module>
    train(args)
  File "train.py", line 152, in train
    loss.backward()
  File "D:\Anaconda3\envs\fangxu\lib\site-packages\torch\autograd\variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "D:\Anaconda3\envs\fangxu\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
  File "D:\Anaconda3\envs\fangxu\lib\site-packages\torch\autograd\function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "D:\Anaconda3\envs\fangxu\lib\site-packages\torch\autograd\_functions\tensor.py", line 481, in backward
    grad_tensor = grad_tensor.masked_scatter(mask, grad_output)
  File "D:\Anaconda3\envs\fangxu\lib\site-packages\torch\autograd\variable.py", line 427, in masked_scatter
    return self.clone().masked_scatter_(mask, variable)
RuntimeError: invalid argument 1: the number of sizes provided must be greater or equal to the number of dimensions in the tensor at c:\anaconda2\conda-bld\pytorch_1519501749874\work\torch\lib\thc\generic/THCTensor.c:326
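The masked_select warning in the log above says that the tensor and the mask hold the same number of elements but are not broadcastable. A minimal sketch of how that situation can arise is shown here. It assumes (this is not confirmed by the log) that log_p has been flattened to shape (n*h*w, c) while the mask is still built with shape (n, h, w, c); the sizes and variable names are illustrative, and the API calls assume a recent PyTorch release.

```python
import torch

# Illustrative sizes only; the real values depend on batch_size and the crop size.
n, c, h, w = 2, 3, 4, 4

# Assumption: class scores are moved to channel-last and flattened to (n*h*w, c).
scores = torch.randn(n, c, h, w)
log_p = scores.permute(0, 2, 3, 1).contiguous().view(-1, c)

# Hypothetical label map with one class index per pixel.
target = torch.randint(0, c, (n, h, w))

# A 4-D mask: same number of elements as log_p, but a different shape.
mask_4d = target.view(n, h, w, 1).repeat(1, 1, 1, c) >= 0

# A 2-D mask whose shape matches log_p exactly.
mask_2d = target.view(n * h * w, 1).repeat(1, c) >= 0

print(log_p.shape, mask_4d.shape, mask_2d.shape)

# Indexing with the shape-matched mask is well defined on any version.
selected = log_p[mask_2d].view(-1, c)
```

If the shapes in loss.py really differ in this way, building the mask with the same shape as the tensor being indexed may avoid relying on the deprecated pointwise fallback.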


fangxu622 commented 6 years ago

I'm not sure whether this is a problem with the code or a PyTorch bug. @Jiangfeng-Xiong

Jiangfeng-Xiong commented 6 years ago

I've changed the batch size too, but I never ran into this problem. Most likely it comes from a version difference. My environment is Python 2.7, Ubuntu, PyTorch 0.3.0.
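When two setups behave differently, it can help to print the exact versions on each machine and compare them side by side. The following is a minimal sketch; describe_environment is a hypothetical helper name, not part of this repository.

```python
import platform


def describe_environment():
    """Collect the version details that matter when comparing two setups."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report only the interpreter and OS.
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```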

fangxu622 commented 6 years ago

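@Jiangfeng-Xiong Today I switched to CentOS 7 with Python 2.7 and PyTorch 0.3.1, and I still get the same problem. Because I'm on CUDA 9, I can't switch to 0.3.0. Could it be a difference between 0.3.0 and 0.3.1?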

(python27) bash-4.2$ ./run_train.sh 
epoch 0 with learning rate: 0.001000
/home/sensetime/test/satellite_seg/utils/loss.py:16: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  log_p = F.log_softmax(input)
/home/sensetime/test/anaconda2/envs/python27/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py:465: UserWarning: self and mask not broadcastable, but have the same number of elements.  Falling back to deprecated pointwise behavior.
  return tensor.masked_select(mask)
Traceback (most recent call last):
  File "train.py", line 193, in <module>
    train(args)
  File "train.py", line 152, in train
    loss.backward()
  File "/home/sensetime/test/anaconda2/envs/python27/lib/python2.7/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/sensetime/test/anaconda2/envs/python27/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
  File "/home/sensetime/test/anaconda2/envs/python27/lib/python2.7/site-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/home/sensetime/test/anaconda2/envs/python27/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 481, in backward
    grad_tensor = grad_tensor.masked_scatter(mask, grad_output)
  File "/home/sensetime/test/anaconda2/envs/python27/lib/python2.7/site-packages/torch/autograd/variable.py", line 427, in masked_scatter
    return self.clone().masked_scatter_(mask, variable)
RuntimeError: invalid argument 1: the number of sizes provided must be greater or equal to the number of dimensions in the tensor at /opt/conda/conda-bld/pytorch_1518238441757/work/torch/lib/THC/generic/THCTensor.c:326

Jiangfeng-Xiong commented 6 years ago

Possibly. PyTorch has changed quite a lot, and that may be related to this problem. Try building and installing PyTorch from the latest source on GitHub. @fangxu622

fangxu622 commented 6 years ago

@Jiangfeng-Xiong I built PyTorch from the latest source, but now the cross_entropy2d function raises the error below. For now I just want to get your code running and then use it with my own data.

  File "/home/sensetime/test/satellite_seg3/utils/loss.py", line 20, in cross_entropy2d
    log_p = log_p[target.view(n, h, w, 1).repeat(1, 1, 1, c) >= 0]
IndexError: too many indices for tensor of dimension 2

epoch 0 with learning rate: 0.001000
/home/sensetime/test/anaconda2/envs/py3/lib/python3.6/site-packages/torch/nn/functional.py:1762: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/sensetime/test/satellite_seg3/utils/loss.py:18: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  log_p = F.log_softmax(input)
Traceback (most recent call last):
  File "train.py", line 193, in <module>
    train(args)
  File "train.py", line 149, in train
    loss = cross_entropy2d(outputs, labels,weights_per_class)
  File "/home/sensetime/test/satellite_seg3/utils/loss.py", line 20, in cross_entropy2d
    log_p = log_p[target.view(n, h, w, 1).repeat(1, 1, 1, c) >= 0]
IndexError: too many indices for tensor of dimension 2

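[image] I keep getting the feeling that the dimensions in this part of the code are wrong.

For now I've modified the cross_entropy2d function a little, but I'm not sure whether I've changed its original meaning. [image]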
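The modified function itself is only available as a screenshot, so it cannot be reproduced here. As a point of comparison, the following is a minimal sketch of one way to write a 2-D cross-entropy so that the mask always matches the tensor it indexes. It is not the repository's implementation: the name cross_entropy2d_sketch is hypothetical, it assumes that labels below 0 mark pixels to ignore (as the `>= 0` test in the traceback suggests), and it uses the API of a recent PyTorch release.

```python
import torch
import torch.nn.functional as F


def cross_entropy2d_sketch(scores, target, weight=None):
    """Per-pixel cross-entropy over (n, c, h, w) scores and (n, h, w) labels.

    Pixels whose label is negative are skipped (an assumption based on the
    `>= 0` mask seen in the traceback).
    """
    n, c, h, w = scores.size()

    # Normalise over the class dimension explicitly (dim=1 for NCHW input).
    log_p = F.log_softmax(scores, dim=1)

    # Move classes last and flatten: one row of c log-probabilities per pixel.
    log_p = log_p.permute(0, 2, 3, 1).contiguous().view(-1, c)
    flat_target = target.contiguous().view(-1)

    # A 1-D mask selects whole rows, so no shape mismatch is possible.
    valid = flat_target >= 0
    log_p = log_p[valid]
    flat_target = flat_target[valid]

    # Sum the per-pixel losses, then average over the valid pixels.
    loss_sum = F.nll_loss(log_p, flat_target, weight=weight, reduction="sum")
    return loss_sum / valid.sum().clamp(min=1).to(loss_sum.dtype)
```

Without class weights, this should agree with F.cross_entropy called with an ignore_index for the negative label, which gives a quick way to check whether a rewritten loss still means the same thing.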

fangxu622 commented 6 years ago

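I'm now fairly sure the cross_entropy2d function has a problem: the dimensions are wrong. Also, could the X and Y dimensions in the visdom line-plotting code have been set up incorrectly?

In train.py:

I changed the vis.line call as follows; please check whether it is correct. Without this change, visdom complains that Y must be one-dimensional and that X and Y must have the same dimensions.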

            # X and Y are both 1-D tensors of length 1, so their shapes match.
            vis.line(
                X=torch.ones((1)).cpu() * iter,
                Y=torch.Tensor([loss.data[0]]).cpu(),
                win=loss_window,
                update='append')
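The shape requirement described above (Y one-dimensional, X and Y the same shape) can also be met by building both points as length-1 NumPy arrays. This is a minimal sketch under that assumption; loss_point and append_loss are hypothetical helper names, and vis is expected to be a visdom.Visdom instance created elsewhere.

```python
import numpy as np


def loss_point(iteration, loss_value):
    """Return X and Y as 1-D arrays of length 1 with identical shapes."""
    x = np.array([iteration], dtype=np.float64)
    y = np.array([loss_value], dtype=np.float64)
    return x, y


def append_loss(vis, win, iteration, loss_value):
    """Append one (iteration, loss) point to an existing visdom line plot."""
    x, y = loss_point(iteration, loss_value)
    vis.line(X=x, Y=y, win=win, update="append")
```

Here loss_value is a plain Python float. On PyTorch 0.3 that would be loss.data[0], while from PyTorch 0.4 onward loss.item() serves the same purpose.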

zhanhuanli commented 6 years ago

Hi, may I ask whether you got the code working so that it trains normally? I've recently run into a similar problem.

wmf1991yeah commented 4 years ago

Hello, I've run into a similar problem too. How did you solve it?