Closed bao18 closed 5 years ago
Your label image should be a single channel image, instead of 3-channel.
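A minimal sketch of collapsing a 3-channel mask to a single channel, assuming the class index was simply copied into every channel (the helper name `to_single_channel` is hypothetical and not part of the repository). Loading and saving the PNG is left out; only the array handling is shown.

```python
import numpy as np

def to_single_channel(mask: np.ndarray) -> np.ndarray:
    """Collapse an H x W x 3 label mask to H x W (hypothetical helper)."""
    if mask.ndim == 2:
        return mask  # already single channel
    if mask.ndim != 3:
        raise ValueError(f"unexpected mask shape {mask.shape}")
    # Assumption: every channel holds the same class index.
    if not (mask == mask[..., :1]).all():
        raise ValueError("channels differ; mask looks color-coded, not index-coded")
    return mask[..., 0]

# Tiny 2 x 2 example with the label repeated across three channels.
labels = np.array([[1, 2], [2, 1]], dtype=np.uint8)
rgb_mask = np.stack([labels] * 3, axis=-1)
single = to_single_channel(rgb_mask)
print(single.shape)  # (2, 2)
```

If the channels differ, the mask is probably color-coded and needs an explicit color-to-index mapping instead, which this sketch does not attempt.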
@hangzhaomit, thanks for your quick reply. Yes, I eventually figured that out as well. Now I get the following error, which this time looks like a GPU problem:
Traceback (most recent call last):
  File "train.py", line 273, in <module>
    main(cfg, gpus)
  File "train.py", line 200, in main
    train(segmentation_module, iterator_train, optimizers, history, epoch+1, cfg)
  File "train.py", line 42, in train
    loss = loss.mean()
RuntimeError: CUDA error: device-side assert triggered
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [388,0,0] Assertion `t >= 0 && t < n_classes` failed.
This last part repeats many times:
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [388,0,0] Assertion `t >= 0 && t < n_classes` failed.
In the default setup, label=0 is ignored. So if you have two classes, please set them as 1 and 2.
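The assertion `t >= 0 && t < n_classes` fires when a target value falls outside the valid class range, so it can help to check and shift the labels before training. The following is a rough sketch, assuming the masks currently use 0-based class ids (for example 0 and 1 for two classes); `shift_labels` is a hypothetical helper, not a function from the repository.

```python
import numpy as np

def shift_labels(mask, n_classes):
    """Map class ids 0..n_classes-1 to 1..n_classes (hypothetical helper).

    After the shift, 0 is free to act as the ignored label.
    """
    mask = np.asarray(mask).astype(np.int64)
    if mask.min() < 0 or mask.max() >= n_classes:
        raise ValueError(
            f"labels span [{mask.min()}, {mask.max()}], "
            f"expected values in [0, {n_classes - 1}]"
        )
    return (mask + 1).astype(np.uint8)

binary_mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)
shifted = shift_labels(binary_mask, n_classes=2)
print(np.unique(shifted))  # [1 2]
```

Running `np.unique` over every annotation file is a cheap way to confirm that no unexpected value (such as 255) remains before starting a long training run.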
Thanks a lot! The code is running now.
@bao18 This is jeewa. Were you able to train and validate the model on your own dataset successfully? I encountered a dimension mismatch error during the validation phase. I checked the dimensions of my ims_shape, seg_color and pred_color and found them to be as follows: imgshape: (512, 512, 3), seg_color shape: (1, 512, 3), pred_color shape: (1, 512, 3)
I have only amended the odgt files and config files according to my own dataset, along with the GPU configuration. Could you please comment on where I made the mistake? I would highly appreciate it if you could let me know of any other amendments I should make for a custom dataset.
@jeewa985 Yes, I was able to train and validate on my own dataset. Basically, what I was doing wrong was: 1) saving the mask images (data/../annotations/training/***.png) as 3-channel images. These files should be single-channel images. 2) For the same images, I was using 0 and 1 as labels. Since 0 is not recognized, the labels should be 1 and 2 for a two-class problem. I hope these tips help you run the model. Best.
@bao18 thank you very much for your quick response. In fact, I did run the model on my custom dataset, but I encountered a problem during the evaluation phase, as follows:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
I checked the shapes of img, seg_color and pred_color and found that they do not match each other.
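As an illustration of why NumPy raises this error, here is a small sketch that reports which arrays disagree with the first one on any non-concatenation axis. The shapes are the ones reported earlier in this thread; the helper `check_concat_shapes` and the choice of `axis=1` (side-by-side concatenation) are assumptions, not taken from the repository code.

```python
import numpy as np

def check_concat_shapes(arrays, axis=1):
    """Return (index, shape) for arrays that cannot be concatenated
    with arrays[0] along `axis` (hypothetical helper)."""
    ref = arrays[0].shape
    problems = []
    for i, a in enumerate(arrays[1:], start=1):
        same_rank = a.ndim == len(ref)
        # Every dimension except the concatenation axis must match.
        if not same_rank or any(
            a.shape[d] != ref[d] for d in range(len(ref)) if d != axis
        ):
            problems.append((i, a.shape))
    return problems

img = np.zeros((512, 512, 3), dtype=np.uint8)
seg_color = np.zeros((1, 512, 3), dtype=np.uint8)
pred_color = np.zeros((1, 512, 3), dtype=np.uint8)
print(check_concat_shapes([img, seg_color, pred_color]))
# [(1, (1, 512, 3)), (2, (1, 512, 3))]
```

Here both seg_color and pred_color have height 1 instead of 512, so the mismatch arises before the concatenation, somewhere in how the label and prediction arrays are produced.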
I would like to know whether you made any changes to the model when you used it for your own dataset.
My dataset has 3 classes (assigned indexes 1, 2, 3), with 100 images for training and 20 images for evaluation.
Looking forward to hearing from you. Best regards
@bao18 out of curiosity, was your 0 label background, or a class for a specific object?
@bao18 how did you change the config options to solve this problem?
When I try to train the model with the command below, a RuntimeError occurs; it seems to be a problem with the GPUs (four GPUs).
the command I run:
python train.py --gpus 0,1,2,3 --cfg $cfg
Error: