training with my custom dataset

GoEung commented 3 years ago

Hi. Thank you for sharing your code.

I'm trying to train the model on my custom dataset. It has 3 classes, so I changed the following line in resnet38_SEAM.py (4 output channels: 3 classes plus background):

line 16: self.fc8 = nn.Conv2d(4096, 4, 1, bias=False)

I only changed that dimension and ran the code, but an error occurred. I thought it was a CUDA issue, so I reduced the batch size to 2, but the result was the same.

I found that the error occurs when the loss becomes NaN: after some iterations, loss_cls1 and loss_cls2 turn into NaN.

THCudaCheck FAIL file=C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src\THC/generic/THCTensorMathPointwise.cu line=253 error=59 : device-side assert triggered
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [0,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [1,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [2,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [3,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [4,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THC/THCTensorScatterGather.cu:130: block: [0,0,0], thread: [5,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
Traceback (most recent call last):
  File "C:/Users/Goeun/PycharmProjects/SEAM2/train_SEAM.py", line 144, in <module>
    loss.backward()
  File "C:\Users\Goeun\miniconda3\envs\seam\lib\site-packages\torch\tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\Goeun\miniconda3\envs\seam\lib\site-packages\torch\autograd\__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (59) : device-side assert triggered at C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src\THC/generic/THCTensorMathPointwise.cu:253

Process finished with exit code 1

WeiChihChern commented 3 years ago

For others' reference.

I might have found the reason: the default learning rate may be too large for your custom dataset. Each update overshoots, and the loss explodes.

Lower the learning rate and give it another try.
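To see why a learning rate that is too large drives the loss to infinity and then NaN, here is a toy sketch (not SEAM code): plain gradient descent on f(x) = x², where each update x ← x − lr·2x multiplies x by (1 − 2·lr). When lr > 1 that factor has magnitude above 1, so the loss grows every step until it overflows. The loop also includes a finite-loss check of the kind one could place before loss.backward(). The function name and lr values are hypothetical choices for this toy problem, not recommended settings for SEAM.

```python
import math


def gradient_descent(lr, steps=50, x0=1.0):
    """Minimise the toy loss f(x) = x**2 (gradient 2x) and return the loss history.

    Each update x <- x - lr * 2x multiplies x by (1 - 2*lr), so for lr > 1 the
    loss grows every step. The loop stops as soon as the loss is no longer
    finite (inf or NaN) instead of continuing with a broken value.
    """
    x = x0
    history = []
    for _ in range(steps):
        loss = x * x
        if not math.isfinite(loss):
            break  # guard: stop before "backpropagating" a non-finite loss
        history.append(loss)
        x -= lr * 2 * x
    return history


if __name__ == "__main__":
    small = gradient_descent(lr=0.1)
    large = gradient_descent(lr=1.5, steps=2000)
    print(f"lr=0.1: {len(small)} steps, final loss {small[-1]:.3g}")
    print(f"lr=1.5: {len(large)} finite steps before overflow, last loss {large[-1]:.3g}")
```

With the small rate the loss shrinks steadily; with the large one it grows geometrically until it stops being finite. In a PyTorch loop the equivalent guard would test torch.isfinite(loss) (or math.isfinite(loss.item())) before calling loss.backward().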

zzc-ai commented 2 years ago

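Hello, I also want to train on my own data, but one thing confuses me and I'd appreciate your advice. In the dataset loading stage, the labels are in VOC's XML format. Is the label fed to the SEAM network the image-level class of each image, or the ground-truth bounding boxes?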

WeiChihChern commented 2 years ago

SEAM only uses classification labels for training. No bounding box labels involved.

zzc-ai commented 2 years ago

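First of all, thank you very much for the quick answer. I'd like to ask one more question: how did you annotate the labels for your data? Did you use labelme to generate XML files?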

WeiChihChern commented 2 years ago

If you look at the file SEAM/voc12/cls_labels.npy, it contains a dictionary with entries like example.jpg: [0, 1, 0, 1, 0, 0], i.e. each image mapped to a multi-hot vector marking the classes present in it. This is the ground truth SEAM needs.
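A minimal sketch of producing a file of the same shape for a custom dataset, assuming one 0/1 entry per class as in the example above. The class names, image keys, and helper functions are hypothetical. The sketch also assumes no background entry and float32 values; check the vector length, dtype, and key format (e.g. with or without the .jpg extension) against the VOC file and your image list before relying on it. np.save pickles the dict into a 0-d object array, which is why loading needs allow_pickle=True and .item().

```python
import os
import tempfile

import numpy as np

# Hypothetical class list for a 3-class custom dataset (foreground classes only;
# the absence of a background entry is an assumption to verify against the VOC file).
CLASSES = ["cat", "dog", "bird"]


def make_label_vector(present, classes=CLASSES):
    """Return a multi-hot float32 vector with 1.0 for every class present in the image."""
    vec = np.zeros(len(classes), dtype=np.float32)
    for name in present:
        vec[classes.index(name)] = 1.0
    return vec


def save_cls_labels(annotations, out_path):
    """annotations: {image_key: [class names]} -> save {image_key: vector} as .npy."""
    labels = {key: make_label_vector(names) for key, names in annotations.items()}
    np.save(out_path, labels)  # the dict is pickled into a 0-d object array
    return labels


def load_cls_labels(path):
    """Load the dict back; allow_pickle is required for object arrays."""
    return np.load(path, allow_pickle=True).item()


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "custom_cls_labels.npy")
    save_cls_labels({"img_0001": ["cat", "bird"], "img_0002": ["dog"]}, path)
    print(load_cls_labels(path))
```

The image keys only need to match whatever names your training list and data loader look up.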

So you don't need Labelme for classification labels. However, if you want mIoU scores on your custom dataset, you will need pixel-level annotations (e.g. polygons) for it. Any annotation software can produce those.
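If your annotation tool exports polygons rather than masks, you have to rasterise them into a per-pixel class map before computing mIoU. Here is a minimal NumPy sketch using the even-odd (ray-casting) rule on pixel centres. The `(class_id, [(x, y), ...])` input format and the function name are hypothetical, not any particular tool's export format. The sketch assumes 0 means background.

```python
import numpy as np


def polygons_to_mask(polygons, height, width):
    """Rasterise polygons into a class-index mask.

    polygons: list of (class_id, [(x, y), ...]) in pixel coordinates (hypothetical format).
    A pixel gets class_id if its centre lies inside the polygon (even-odd rule);
    later polygons overwrite earlier ones where they overlap. 0 means background.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # pixel centres
    for class_id, pts in polygons:
        inside = np.zeros((height, width), dtype=bool)
        n = len(pts)
        for i in range(n):
            x1, y1 = pts[i]
            x2, y2 = pts[(i + 1) % n]
            # Does this edge span the pixel centre's row?
            spans = (y1 > py) != (y2 > py)
            # x where the edge meets that row; horizontal edges give inf/NaN but never span.
            with np.errstate(divide="ignore", invalid="ignore"):
                x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            # A ray cast to the right from the centre crosses the edge: toggle inside/outside.
            inside ^= spans & (px < x_cross)
        mask[inside] = class_id
    return mask
```

The resulting array can be saved as a single-channel PNG with any image library and used as the ground-truth mask.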

If I remember correctly, SEAM uses ground-truth masks for evaluation, so no XML is needed.
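For reference, mIoU against ground-truth masks is commonly computed from a confusion matrix. Per class, IoU = TP / (TP + FP + FN), and the result is averaged over classes. This is a generic NumPy sketch, not SEAM's own evaluation code. It assumes masks are integer arrays of class indices with 0 as background, so num_classes would be 4 for 3 classes plus background. It also assumes 255 marks pixels to ignore, as in VOC.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes.
    Pixels labelled ignore_index in gt are skipped (VOC uses 255 for void/boundaries)."""
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    valid = gt != ignore_index
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN). Classes absent from both gt and pred
    get NaN and are left out of the mean."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / union
    return iou, float(np.nanmean(iou))
```

Over a whole dataset, you would sum the per-image confusion matrices first and call mean_iou once on the total, rather than averaging per-image scores.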

duke023456 commented 2 years ago

Hello, how did you train on your own dataset? Does the training set follow the same format as VOC2012?

YudeWang / SEAM

training with my custom dataset #33