Closed: Muaz65 closed this issue 3 years ago.
Did you create your training images with createTrainIdLabelImgs.py? I got the same error; after I set `json2labelImg(f, dst, "trainIds")` there, the error was gone.
Hi @Muaz65,
Thank you for your interest in our work!
Previously when I faced this error, it was usually due to a mismatch between the model's output dimension for classes and the range of integers in the ground-truth files.
Therefore, in your case, you could check 1) whether you set your model's output dimension to six, and 2) whether your ground-truth files only contain integers from 0 to 5.
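A quick way to run check 2) is to scan each ground-truth map for label ids outside the expected range. A minimal sketch with NumPy (the function name, the dummy array, and the 255 ignore label are my assumptions for illustration, not part of FasterSeg):

```python
import numpy as np

def check_label_range(label_array, num_classes, ignore_label=255):
    """Return the label ids that fall outside [0, num_classes)."""
    ids = np.unique(label_array)
    return [int(i) for i in ids
            if i != ignore_label and not (0 <= i < num_classes)]

# Simulated 6-class ground-truth map with one stray id (7).
gt = np.array([[0, 1, 2],
               [5, 255, 7]], dtype=np.uint8)
print(check_label_range(gt, num_classes=6))  # → [7]
```

Running this over every label image (e.g. loaded with `np.array(Image.open(path))`) should report any stray ids; anything returned here would trigger exactly this kind of device-side assert in the loss.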
Hope that helps.
I think the range of integers in the ground-truth files is the issue here. I'll confirm it and let you know. Thank you!
I am trying to train FasterSeg on a custom dataset with six classes. I have formatted the annotations and written a dataset class just like cityscapes.py. I am having an issue while pretraining the supernet (Section 1.1 in README.md).
CUDA Version: 10.2
torchvision: 0.3.0
torch: 1.1.0
```
Traceback (most recent call last):
  File "train_search.py", line 304, in <module>
    main(pretrain=config.pretrain)
  File "train_search.py", line 133, in main
    train(pretrain, train_loader_model, train_loader_arch, model, architect, ohem_criterion, optimizer, lr_policy, logger, epoch, update_arch=update_arch)
  File "train_search.py", line 243, in train
    loss = model._loss(imgs, target, pretrain)
  File "/home/soccer/Desktop/Muaz/FasterSeg/search/model_search.py", line 491, in _loss
    loss = loss + sum(self._criterion(logit, target) for logit in logits)
  File "/home/soccer/Desktop/Muaz/FasterSeg/search/model_search.py", line 491, in <genexpr>
    loss = loss + sum(self._criterion(logit, target) for logit in logits)
  File "/home/soccer/anaconda3/envs/pipeline_cloned/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/soccer/Desktop/Muaz/FasterSeg/tools/seg_opr/loss_opr.py", line 81, in forward
    index = mask_prob.argsort()
RuntimeError: merge_sort: failed to synchronize: device-side assert triggered
```
Before this error I get a number of CUDA errors, but those don't crash the code:
```
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [2,0,0], thread: [127,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [2,0,0], thread: [61,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
```

NOTE: I recreated the experiment on the Cityscapes dataset and I am still encountering the same issue.
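Since these device-side asserts are reported asynchronously, the traceback often points at an unrelated op (here, `argsort`). A common PyTorch debugging trick (general, not specific to FasterSeg) is to force synchronous kernel launches so the error surfaces at the op that actually triggered it:

```shell
# Debugging only: makes CUDA kernel launches synchronous, so the
# device-side assert is raised at the exact failing operation.
# Much slower; drop the variable for normal training runs.
CUDA_LAUNCH_BLOCKING=1 python train_search.py
```

Alternatively, running one batch on CPU usually turns the same out-of-range target into a readable Python error naming the offending index.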