Open samxu29 opened 3 months ago
Hello, I am running into some issues with pre-training the model. We are planning to use this model to train on some custom datasets, so I was tasked with trying it out. I have downloaded ADE20K and organized it in the following file structure:
├── train
│   ├── rgb
│   │   └── all
│   │       └── ADE_train_00000001.jpg
│   └── semseg
│       └── all
│           └── ADE_train_00000001.png
└── val
    ├── rgb
    │   └── all
    │       └── ADE_val_00000001,.jpg
    └── semseg
        └── all
            └── ADE_val_00000001.png
Then I tried to run:
python3 run_pretraining_multimae.py \
  --config cfgs/pretrain/multimae-b_98_rgb+-depth-semseg_400e.yaml \
  --data_path dataset/mutlimae/train
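Before launching, it may be worth confirming that the folder passed to `--data_path` has one semseg file for every rgb file. The sketch below assumes the layout shown above (`<split>/<modality>/<subfolder>/<file>`) and assumes that paired files share a stem, as in `ADE_train_00000001.jpg` / `ADE_train_00000001.png`. Whether the MultiMAE loader pairs files exactly this way is not confirmed in this thread, and `unpaired_stems` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def unpaired_stems(split_dir, modalities=("rgb", "semseg")):
    """Return, for each modality, the relative stems it is missing.

    Assumed layout: <split_dir>/<modality>/<subfolder>/<file>, where files
    that belong together share the same subfolder and stem.
    """
    split_dir = Path(split_dir)
    stems = {}
    for modality in modalities:
        base = split_dir / modality
        # Collect "subfolder/stem" for every file two levels below the modality.
        stems[modality] = {
            p.relative_to(base).with_suffix("").as_posix()
            for p in base.glob("*/*")
            if p.is_file()
        }
    all_stems = set().union(*stems.values())
    # A stem is "missing" from a modality if some other modality has it.
    return {m: sorted(all_stems - stems[m]) for m in modalities}
```

Calling `unpaired_stems("dataset/mutlimae/train")` returns a dict with an empty list per modality when everything is paired. A stray character in a file name, such as the comma in the val listing above, would show up as an unpaired stem.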
I keep getting the error below:
Not using distributed mode
Creating model: pretrain_multimae_base for inputs ['rgb', 'semseg'] and outputs ['rgb', 'semseg']
Sampler_train = <torch.utils.data.distributed.DistributedSampler object at 0x7e731ff62a90>
Namespace(all_domains=['rgb', 'semseg'], alphas=1.0, auto_resume=True, balancer_lr_scale=1.0, batch_size=8, blr=0.0001, clip_grad=None, data_path='/home/sxu7/dataset/mutlimae/train', decoder_decay=None, decoder_depth=2, decoder_dim=256, decoder_num_heads=8, decoder_use_task_queries=True, decoder_use_xattn=True, device='cuda', dist_on_itp=False, dist_url='env://', distributed=False, drop_path=0.0, epochs=400, extra_norm_pix_loss=True, find_unused_params=True, fp32_output_adapters='semseg', hflip=0.5, imagenet_default_mean_and_std=True, in_domains=['rgb', 'semseg'], input_size=224, local_rank=-1, log_wandb=False, loss_on_unmasked=False, min_lr=0.0, model='pretrain_multimae_base', momentum=0.9, num_encoded_tokens=98, num_global_tokens=1, num_workers=10, opt='adamw', opt_betas=[0.9, 0.95], opt_eps=1e-08, out_domains=['rgb', 'semseg'], output_dir='output/pretrain/multimae-b_98_rgb+-depth-semseg_400e_custom', patch_size=16, pin_mem=True, resume='', sample_tasks_uniformly=False, save_ckpt_freq=20, seed=0, show_user_warnings=False, skip_grad=None, standardize_depth=False, start_epoch=0, task_balancer='none', train_interpolation='bicubic', wandb_entity=None, wandb_project='multimae-pretrain_custom', wandb_run_name='multimae-b_98_rgb+-depth-semseg_400e_custom', warmup_epochs=40, warmup_lr=1e-06, warmup_steps=-1, weight_decay=0.05, weight_decay_end=None, world_size=1)
Model = MultiMAE( ... )
Number of params: 95.085456 M
LR = 0.00000313
Batch size = 8
Number of training steps = 2526
Number of training examples per epoch = 20208
optimizer settings: {'lr': 3.125e-06, 'weight_decay': 0.05, 'eps': 1e-08, 'betas': [0.9, 0.95]}
/home/sxu7/hpe/MultiMAE/utils/native_scaler.py:19: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  self._scaler = torch.cuda.amp.GradScaler(enabled=enabled)
Use step level LR & WD scheduler!
Set warmup steps = 101040
Set warmup steps = 0
Max WD = 0.0500000, Min WD = 0.0500000
Auto resume checkpoint:
Start training for 400 epochs
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [46,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [507,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed. ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [25,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
Traceback (most recent call last):
  File "run_pretraining_multimae.py", line 586, in <module>
    main(opts)
  File "run_pretraining_multimae.py", line 414, in main
    train_stats = train_one_epoch(
  File "run_pretraining_multimae.py", line 502, in train_one_epoch
    preds, masks = model(
  File "/home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sxu7/hpe/MultiMAE/multimae/multimae.py", line 312, in forward
    input_task_tokens = {
  File "/home/sxu7/hpe/MultiMAE/multimae/multimae.py", line 313, in <dictcomp>
    domain: self.input_adapters[domain](tensor)
  File "/home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sxu7/hpe/MultiMAE/multimae/input_adapters.py", line 229, in forward
    x = rearrange(self.class_emb(x), 'b nh nw c -> b c nh nw')
  File "/home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 164, in forward
    return F.embedding(
  File "/home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/nn/functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

terminate called after throwing an instance of 'c10::Error'
  what(): CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7e7410d77f86 in /home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7e7410d26d10 in /home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7e7411150ee8 in /home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0xffcc98 (0x7e7390dfcc98 in /home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x10055ad (0x7e7390e055ad in /home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0x5db8c0 (0x7e740f3db8c0 in /home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x6abdf (0x7e7410d5bbdf in /home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x21b (0x7e7410d54c3b in /home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #8: c10::TensorImpl::~TensorImpl() + 0x9 (0x7e7410d54de9 in /home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #9: <unknown function> + 0x892028 (0x7e740f692028 in /home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #10: THPVariable_subclass_dealloc(_object*) + 0x2f6 (0x7e740f6923a6 in /home/sxu7/hpe/MultiMAE/.venv/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #23: <unknown function> + 0x29d90 (0x7e7414a29d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #24: __libc_start_main + 0x80 (0x7e7414a29e40 in /lib/x86_64-linux-gnu/libc.so.6)

[1] 1891932 IOT instruction (core dumped) CUDA_LAUNCH_BLOCKING=1 python3 run_pretraining_multimae.py --config
Is the error related to CUDA itself, or does it have something to do with the code base?
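Not a definitive diagnosis, but the traceback ends in `F.embedding`, called from `self.class_emb(x)` in `input_adapters.py`, and `srcIndex < srcSelectDimSize` is the check that fails when a lookup index is at or beyond the size of the table being indexed. One thing worth ruling out is whether the semseg PNGs contain label values that the class embedding cannot index. Below is a minimal sketch, assuming a label map has already been decoded into rows of integers (loading the PNG is left out). The names `out_of_range_labels` and `toy_map` and the `num_classes` value are hypothetical placeholders, not part of MultiMAE.

```python
from collections import Counter
from typing import Iterable, Sequence


def out_of_range_labels(label_map: Iterable[Sequence[int]], num_classes: int) -> Counter:
    """Count label values that a table with `num_classes` rows cannot index.

    Valid indices are 0 .. num_classes - 1. Anything outside that range is
    the kind of value that trips an index assertion in an embedding lookup.
    """
    bad = Counter()
    for row in label_map:
        for value in row:
            if value < 0 or value >= num_classes:
                bad[value] += 1
    return bad


# Toy 2x3 label map standing in for one decoded semseg PNG (made-up values).
toy_map = [[0, 5, 149], [150, 255, 3]]

# Placeholder table size: substitute the number of classes that the semseg
# input adapter is actually constructed with in your setup.
print(out_of_range_labels(toy_map, num_classes=150))
```

If a check like this reports any values for the real label maps, the labels would need remapping (or the adapter's class count adjusting) before the lookup can succeed. If it reports nothing, the cause lies elsewhere.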
Maybe there is something wrong with the configuration in the YAML file; `in_domains` should be `rgb-semseg`.