Closed umarjibrilmohd closed 7 months ago
I also got this installation conflict while installing torch.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.8.1 requires torch==1.7.1, but you have torch 1.8.0 which is incompatible.
Successfully installed torch-1.8.0 torchaudio-0.8.0 torchvision-0.9.0
I'm using Python 3.8 with the following details:
"/home/mohammed/model/new ss/venv/bin/python" /home/mohammed/sssegmentation/ssseg/set.py
CUDA Available: True
CUDA Version: 10.2
PyTorch Version: 1.8.0
Process finished with exit code 0
Please refer to the official documentation to install PyTorch.
mohammed@c24032:~/sssegmentation$ bash scripts/dist_train.sh 4 ssseg/configs/annnet/annnet_resnet50os16_ade20k.py
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
2024-01-11 10:43:57 WARNING ngpus_per_node is not equal to nproc_per_node, force ngpus_per_node = nproc_per_node by default
Traceback (most recent call last):
File "ssseg/train.py", line 253, in
The error still occurs during training.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:05:00.0 Off |                  N/A |
|  0%   46C    P8    21W / 250W |      3MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:06:00.0 Off |                  N/A |
|  0%   48C    P8    20W / 250W |      3MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
I have 2 GPUs on my remote server.
Please set num_gpus to 2, for example: bash scripts/dist_train.sh 2 xxx.config
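The launch command in this thread asks for 4 processes, while nvidia-smi reports 2 GPUs. Each process calls torch.cuda.set_device with its local rank, so a rank that is not smaller than the GPU count has no device to bind to. A minimal sketch of that check follows; check_launch_size is a hypothetical helper, not part of sssegmentation, and in practice the GPU count would come from torch.cuda.device_count().

```python
def check_launch_size(requested_procs: int, visible_gpus: int) -> int:
    """Return a process count for which every local rank maps to a real GPU.

    Hypothetical helper for illustration only.
    """
    if visible_gpus < 1:
        raise RuntimeError("no CUDA device is visible")
    # Local ranks run from 0 to requested_procs - 1; any rank >= visible_gpus
    # would make torch.cuda.set_device(local_rank) fail.
    return min(requested_procs, visible_gpus)


# Values taken from this thread: 4 requested processes, 2 GPUs in nvidia-smi.
print(check_launch_size(requested_procs=4, visible_gpus=2))
```

With the values from this thread the helper returns 2, which matches the suggested first argument to scripts/dist_train.sh.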
This could be related to a memory issue. Where can I find the batch size so that I can reduce it?
After solving the memory problem, I would like to use my own single-class dataset, which contains only images and annotations. How should I handle the txt files (objectInfo150.txt and sceneCategories.txt)?
I have the two questions above.
Failures:
Please, what could be the cause of this error, and how can I address it?
I'm on a remote server with a large memory capacity.
Usually I resolve this kind of issue by reducing the batch size, but I could not find that setting here.
You can modify the batch size like this:
SEGMENTOR_CFG['dataloader']['expected_total_train_bs_for_assert'] = 2
Note that you should adjust the learning rate accordingly if you change the total batch size.
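One common heuristic for adjusting the learning rate is to scale it linearly with the total batch size. The sketch below applies that rule to a stand-in dictionary that reuses key names from the config logged in this thread. The starting batch size of 16 is an assumption made for illustration, and rescale_for_batch_size is a hypothetical helper rather than a function from sssegmentation.

```python
import copy

# Stand-in for the relevant part of SEGMENTOR_CFG. The key names come from the
# logged config; the starting total batch size of 16 is an assumed value.
SEGMENTOR_CFG = {
    'dataloader': {'expected_total_train_bs_for_assert': 16},
    'scheduler': {'optimizer': {'lr': 0.01}},
}


def rescale_for_batch_size(cfg, new_total_bs):
    """Return a copy of cfg with a new total batch size and a linearly scaled lr."""
    cfg = copy.deepcopy(cfg)
    old_total_bs = cfg['dataloader']['expected_total_train_bs_for_assert']
    cfg['dataloader']['expected_total_train_bs_for_assert'] = new_total_bs
    # Linear scaling heuristic: lr changes by the same factor as the batch size.
    cfg['scheduler']['optimizer']['lr'] *= new_total_bs / old_total_bs
    return cfg


new_cfg = rescale_for_batch_size(SEGMENTOR_CFG, new_total_bs=2)
print(new_cfg['dataloader']['expected_total_train_bs_for_assert'])
print(new_cfg['scheduler']['optimizer']['lr'])
```

Linear scaling is only a starting point; very small batches may still need a separately tuned learning rate.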
I changed the batch size in the default dataloader settings and the learning rate in basescheduler, but the error above did not change.
'RandomCrop': {'crop_size': (256, 256)} 'Padding': {'output_size': (256, 256)}
The error went away after adjusting to this. Thanks for your guidance. I will still get back to you.
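A rough way to see why a smaller crop helps: activation memory in a convolutional network grows approximately in proportion to the number of input pixels per image. The sketch below compares pixel counts. The previous crop size is not shown in this thread, so the 512 x 512 value is an assumption made for illustration, and pixel_ratio is a hypothetical helper.

```python
def pixel_ratio(old_crop, new_crop):
    """Return how many times more pixels old_crop has than new_crop."""
    old_h, old_w = old_crop
    new_h, new_w = new_crop
    return (old_h * old_w) / (new_h * new_w)


# (512, 512) is an assumed earlier setting; (256, 256) is the value used above.
print(pixel_ratio((512, 512), (256, 256)))  # 4.0
```

Under that assumption each training image carries about a quarter of the activations, which is comparable to cutting the per-GPU batch size by four.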
2024-01-21 13:29:20 INFO Config file path: /home/mohammed/sssegmentation/ssseg/configs/annnet/annnet_resnet50os16_ade20k.py
2024-01-21 13:29:20 INFO Config details:
{'type': 'ANNNet', 'num_classes': 1, 'benchmark': True, 'align_corners': False, 'backend': 'nccl', 'work_dir': 'annnet_resnet50os16_ade20k', 'logfilepath': 'annnet_resnet50os16_ade20k/umar_annnet_resnet50os16_ade20k.log', 'log_interval_iterations': 50, 'eval_interval_epochs': 10, 'save_interval_epochs': 1, 'resultsavepath': 'annnet_resnet50os16_ade20k/umar_annnet_resnet50os16_ade20k_results.pkl', 'norm_cfg': {'type': 'SyncBatchNorm'}, 'act_cfg': {'type': 'ReLU', 'inplace': True}, 'backbone': {'type': 'ResNet', 'depth': 50, 'structure_type': 'resnet50conv3x3stem', 'pretrained': False, 'outstride': 16, 'use_conv3x3_stem': True, 'selected_indices': (2, 3)}, 'head': {'in_channels_list': [1024, 2048], 'transform_channels': 256, 'query_scales': (1,), 'feats_channels': 512, 'key_pool_scales': (1, 3, 6, 8), 'dropout': 0.1}, 'auxiliary': {'in_channels': 1024, 'out_channels': 512, 'dropout': 0.1}, 'losses': {'loss_aux': {'type': 'CrossEntropyLoss', 'scale_factor': 0.4, 'ignore_index': 255, 'reduction': 'mean'}, 'loss_cls': {'type': 'CrossEntropyLoss', 'scale_factor': 1.0, 'ignore_index': 255, 'reduction': 'mean'}}, 'inference': {'mode': 'whole', 'opts': {}, 'tricks': {'multiscale': [1], 'flip': False, 'use_probs_before_resize': False}}, 'scheduler': {'type': 'PolyScheduler', 'max_epochs': 130, 'power': 0.9, 'optimizer': {'type': 'SGD', 'lr': 0.01, 'momentum': 0.9, 'weight_decay': 0.0005, 'params_rules': {}}}, 'dataset': {'type': 'ADE20kDataset', 'rootdir': '/home/mohammed/sssegmentation/ADE20k', 'train': {'set': 'train', 'data_pipelines': [('Resize', {'output_size': (2048, 512), 'keep_ratio': True, 'scale_range': (0.5, 2.0)}), ('RandomCrop', {'crop_size': (256, 256), 'one_category_max_ratio': 0.75}), ('RandomFlip', {'flip_prob': 0.5}), ('PhotoMetricDistortion', {}), ('Normalize', {'mean': [123.675, 116.28, 103.53], 'std': [58.395, 57.12, 57.375]}), ('ToTensor', {}), ('Padding', {'output_size': (256, 256), 'data_type': 'tensor'})]}, 'test': {'set': 'val', 'data_pipelines': 
[('Resize', {'output_size': (2048, 512), 'keep_ratio': True, 'scale_range': None}), ('Normalize', {'mean': [123.675, 116.28, 103.53], 'std': [58.395, 57.12, 57.375]}), ('ToTensor', {})]}}, 'dataloader': {'expected_total_train_bs_for_assert': 2, 'auto_adapt_to_expected_train_bs': True, 'train': {'batch_size_per_gpu': 2, 'num_workers_per_gpu': 2, 'shuffle': True, 'pin_memory': True, 'drop_last': True}, 'test': {'batch_size_per_gpu': 1, 'num_workers_per_gpu': 2, 'shuffle': False, 'pin_memory': True, 'drop_last': False}}}
2024-01-21 13:29:20 INFO Resume from:
/pytorch/aten/src/ATen/native/cuda/NLLLoss2d.cu:95: nll_loss2d_forward_kernel: block: [0,0,0], thread: [992,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/NLLLoss2d.cu:95: nll_loss2d_forward_kernel: block: [0,0,0], thread: [993,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/NLLLoss2d.cu:95: nll_loss2d_forward_kernel: block: [0,0,0], thread: [994,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/NLLLoss2d.cu:95: nll_loss2d_forward_kernel: block: [0,0,0], thread: [995,0,0] Assertion `t >= 0 && t < n_classes` failed.
[the same assertion is repeated for 93 more threads in block [0,0,0]]
Traceback (most recent call last):
File "ssseg/train.py", line 261, in
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1592929 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 1 (pid: 1592930) of binary: /home/mohammed/miniconda3/envs/myenv/bin/python
Traceback (most recent call last):
This is the error I'm getting when using my custom dataset with the ADE20k setup. Can you help me address it?
Please set num_classes to 2, since you are using cross-entropy loss.
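The kernel assertion `t >= 0 && t < n_classes` in the log means that some target label falls outside the range the loss expects. With num_classes set to 1, only label 0 is valid, so a mask that marks the object with 1 fails. The sketch below mirrors that range check in plain Python. out_of_range_labels is a hypothetical helper, the tiny mask is made-up illustration data, and the check assumes that labels equal to ignore_index (255 in the logged config) are skipped.

```python
def out_of_range_labels(mask, num_classes, ignore_index=255):
    """Return the sorted label values that would violate 0 <= t < num_classes."""
    bad = set()
    for row in mask:
        for t in row:
            # Labels equal to ignore_index are assumed to be skipped by the loss.
            if t != ignore_index and not (0 <= t < num_classes):
                bad.add(t)
    return sorted(bad)


# Illustrative mask: 0 = background, 1 = the single object class, 255 = ignored.
mask = [[0, 0, 1],
        [1, 255, 0]]

print(out_of_range_labels(mask, num_classes=1))  # [1]
print(out_of_range_labels(mask, num_classes=2))  # []
```

Running a check like this over the annotation files before training can show whether the masks contain any other unexpected values, such as 255 used for the object instead of 1.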
(venv) mohammed@c24032:~/sssegmentation$ bash scripts/dist_train.sh 4 ssseg/configs/annnet/annnet_resnet50os16_ade20k.py
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
2024-01-10 08:33:59 WARNING ngpus_per_node is not equal to nproc_per_node, force ngpus_per_node = nproc_per_node by default
Traceback (most recent call last):
  File "ssseg/train.py", line 252, in
    main()
  File "ssseg/train.py", line 247, in main
    client.start()
  File "ssseg/train.py", line 70, in start
    torch.cuda.set_device(cmd_args.local_rank)
  File "/home/mohammed/model/new ss/venv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 261, in set_device
    torch._C._cuda_setDevice(device)