RuntimeError: CUDA error: device-side assert triggered when testing trained custom data

Hello, I tried to test my self-trained weights using ViT-Base and got the following error:

Traceback (most recent call last):
  File "test.py", line 70, in <module>
    num_query)
  File "/home/pengyuzhou/workspace/TransReID/processor/processor.py", line 162, in do_inference
    feat = model(img, cam_label=camids, view_label=target_view)
  File "/home/pengyuzhou/miniconda3/envs/transreid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/pengyuzhou/workspace/TransReID/model/make_model.py", line 310, in forward
    features = self.base(x, cam_label=cam_label, view_label=view_label)
  File "/home/pengyuzhou/miniconda3/envs/transreid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/pengyuzhou/workspace/TransReID/model/backbones/vit_pytorch.py", line 414, in forward
    x = self.forward_features(x, cam_label, view_label)
  File "/home/pengyuzhou/workspace/TransReID/model/backbones/vit_pytorch.py", line 402, in forward_features
    x = blk(x)
  File "/home/pengyuzhou/miniconda3/envs/transreid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/pengyuzhou/workspace/TransReID/model/backbones/vit_pytorch.py", line 190, in forward
    x = x + self.drop_path(self.mlp(self.norm2(x)))
RuntimeError: CUDA error: device-side assert triggered
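Because CUDA kernels run asynchronously, the Python line reported in a device-side assert traceback is not necessarily the operation that actually failed. A minimal sketch, assuming the standard `CUDA_LAUNCH_BLOCKING` environment variable and a hypothetical helper name `enable_blocking_launches`, that forces synchronous kernel launches so a re-run points closer to the failing call:

```python
import os


def enable_blocking_launches():
    """Ask the CUDA runtime to launch kernels synchronously (debugging only).

    Hypothetical helper, not part of TransReID. It must run before the
    process initializes CUDA, so call it at the very top of the entry
    script, before any tensor or model is moved to the GPU.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


# Enable synchronous launches for this process.
enable_blocking_launches()
```

Setting the same variable in the shell before launching `test.py` has the same effect. Running the failing batch on the CPU, where that is practical, usually turns the assert into an ordinary Python exception, often an out-of-range index.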

Here is the training configuration file:

vit_base.yml

MODEL:
  PRETRAIN_CHOICE: 'imagenet'
  PRETRAIN_PATH: '/home/pengyuzhou/.cache/torch/jx_vit_base_p16_224-80ecf9dd.pth'
  METRIC_LOSS_TYPE: 'triplet'
  IF_LABELSMOOTH: 'off'
  IF_WITH_CENTER: 'no'
  NAME: 'transformer'
  NO_MARGIN: True
  DEVICE_ID: ('1')
  TRANSFORMER_TYPE: 'vit_base_patch16_224_TransReID'
  STRIDE_SIZE: [16, 16]

INPUT:
  SIZE_TRAIN: [256, 128]
  SIZE_TEST: [256, 128]
  PROB: 0.5 # random horizontal flip
  RE_PROB: 0.5 # random erasing
  PADDING: 10
  PIXEL_MEAN: [0.5, 0.5, 0.5]
  PIXEL_STD: [0.5, 0.5, 0.5]

DATASETS:
  NAMES: ('dukemtmc')
  ROOT_DIR: ('/home/pengyuzhou/workspace/TransReID/data')

DATALOADER:
  SAMPLER: 'softmax_triplet'
  NUM_INSTANCE: 4
  NUM_WORKERS: 8

SOLVER:
  OPTIMIZER_NAME: 'SGD'
  MAX_EPOCHS: 120
  BASE_LR: 0.008
  IMS_PER_BATCH: 256
  WARMUP_METHOD: 'linear'
  LARGE_FC_LR: False
  CHECKPOINT_PERIOD: 9
  LOG_PERIOD: 50
  EVAL_PERIOD: 120
  WEIGHT_DECAY: 1e-4
  WEIGHT_DECAY_BIAS: 1e-4
  BIAS_LR_FACTOR: 2

TEST:
  EVAL: True
  IMS_PER_BATCH: 128
  RE_RANKING: False
  WEIGHT: 'output.pt'
  NECK_FEAT: 'before'
  FEAT_NORM: 'yes'

OUTPUT_DIR: '/home/pengyuzhou/workspace/TransReID/logs'

Here is the test configuration file:

vit_transreid.yml

MODEL:
  PRETRAIN_CHOICE: 'imagenet'
  PRETRAIN_PATH: '/home/pengyuzhou/.cache/torch/jx_vit_base_p16_224-80ecf9dd.pth'
  METRIC_LOSS_TYPE: 'triplet'
  IF_LABELSMOOTH: 'off'
  IF_WITH_CENTER: 'no'
  NAME: 'transformer'
  NO_MARGIN: True
  DEVICE_ID: ('3')
  TRANSFORMER_TYPE: 'vit_base_patch16_224_TransReID'
  STRIDE_SIZE: [16, 16]
  SIE_CAMERA: True
  SIE_COE: 3.0
  JPM: True
  RE_ARRANGE: True

INPUT:
  SIZE_TRAIN: [256, 128]
  SIZE_TEST: [256, 128]
  PROB: 0.5 # random horizontal flip
  RE_PROB: 0.5 # random erasing
  PADDING: 10
  PIXEL_MEAN: [0.5, 0.5, 0.5]
  PIXEL_STD: [0.5, 0.5, 0.5]

DATASETS:
  NAMES: ('dukemtmc')
  ROOT_DIR: ('/home/pengyuzhou/workspace/TransReID/data')

DATALOADER:
  SAMPLER: 'softmax_triplet'
  NUM_INSTANCE: 4
  NUM_WORKERS: 8

SOLVER:
  OPTIMIZER_NAME: 'SGD'
  MAX_EPOCHS: 120
  BASE_LR: 0.008
  IMS_PER_BATCH: 256
  WARMUP_METHOD: 'linear'
  LARGE_FC_LR: False
  CHECKPOINT_PERIOD: 120
  LOG_PERIOD: 50
  EVAL_PERIOD: 120
  WEIGHT_DECAY: 1e-4
  WEIGHT_DECAY_BIAS: 1e-4
  BIAS_LR_FACTOR: 2

TEST:
  EVAL: True
  IMS_PER_BATCH: 1
  RE_RANKING: False
  WEIGHT: '/home/pengyuzhou/workspace/TransReID/logs/transformer_27.pth'
  NECK_FEAT: 'before'
  FEAT_NORM: 'yes'

OUTPUT_DIR: '/home/pengyuzhou/workspace/TransReID/logs/duke_vit_transreid'
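The two configuration files are long and differ in only a few keys, so it can help to list the differences explicitly. A minimal sketch, assuming the relevant sections have been transcribed by hand into plain dictionaries (the helper names `flatten` and `diff_configs` are hypothetical, and `DEVICE_ID` is written as a plain string here):

```python
UNSET = "<unset>"  # placeholder for a key present in only one config


def flatten(cfg, prefix=""):
    """Flatten a nested dict into {"SECTION.KEY": value} pairs."""
    flat = {}
    for key, value in cfg.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def diff_configs(train_cfg, test_cfg):
    """Return {key: (train_value, test_value)} for every key that differs."""
    a, b = flatten(train_cfg), flatten(test_cfg)
    diff = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key, UNSET), b.get(key, UNSET)
        if left != right:
            diff[key] = (left, right)
    return diff


# MODEL and TEST sections transcribed from the two files above (subset only).
TRAIN_CFG = {
    "MODEL": {"DEVICE_ID": "1", "STRIDE_SIZE": [16, 16]},
    "TEST": {"IMS_PER_BATCH": 128, "WEIGHT": "output.pt"},
}
TEST_CFG = {
    "MODEL": {
        "DEVICE_ID": "3",
        "STRIDE_SIZE": [16, 16],
        "SIE_CAMERA": True,
        "SIE_COE": 3.0,
        "JPM": True,
        "RE_ARRANGE": True,
    },
    "TEST": {
        "IMS_PER_BATCH": 1,
        "WEIGHT": "/home/pengyuzhou/workspace/TransReID/logs/transformer_27.pth",
    },
}

for key, (train_value, test_value) in diff_configs(TRAIN_CFG, TEST_CFG).items():
    print(f"{key}: train={train_value!r} test={test_value!r}")
```

The listing shows that `SIE_CAMERA`, `SIE_COE`, `JPM`, and `RE_ARRANGE` are set only in the test configuration. Whether that mismatch is what triggers the assert is not established here, but it is a difference worth ruling out.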

How can I solve this? Many thanks.
