facebookresearch / Mask2Former

Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
MIT License

error in ms_deformable_im2col_cuda: no kernel image is available for execution on the device #53

Open 9p15p opened 2 years ago

9p15p commented 2 years ago

using /data Preparation done. Between equal marks is user's output: /root/conda/bin/python running build running build_py running build_ext building 'MultiScaleDeformableAttention' extension Emitting ninja build file /workspace/mask2former/modeling/pixel_decoder/ops/build/temp.linux-x86_64-3.7/build.ninja... Compiling objects... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. g++ -pthread -shared -B /root/conda/compiler_compat -L/root/conda/lib -Wl,-rpath=/root/conda/lib -Wl,--no-as-needed -Wl,--sysroot=/ /workspace/mask2former/modeling/pixel_decoder/ops/build/temp.linux-x86_64-3.7/workspace/mask2former/modeling/pixel_decoder/ops/src/vision.o /workspace/mask2former/modeling/pixel_decoder/ops/build/temp.linux-x86_64-3.7/workspace/mask2former/modeling/pixel_decoder/ops/src/cpu/ms_deform_attn_cpu.o /workspace/mask2former/modeling/pixel_decoder/ops/build/temp.linux-x86_64-3.7/workspace/mask2former/modeling/pixel_decoder/ops/src/cuda/ms_deform_attn_cuda.o -L/root/conda/lib/python3.7/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-3.7/MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so running install running bdist_egg running egg_info writing MultiScaleDeformableAttention.egg-info/PKG-INFO writing dependency_links to MultiScaleDeformableAttention.egg-info/dependency_links.txt writing top-level names to MultiScaleDeformableAttention.egg-info/top_level.txt reading manifest file 'MultiScaleDeformableAttention.egg-info/SOURCES.txt' writing manifest file 'MultiScaleDeformableAttention.egg-info/SOURCES.txt' installing library code to build/bdist.linux-x86_64/egg running install_lib creating build/bdist.linux-x86_64/egg creating build/bdist.linux-x86_64/egg/functions copying build/lib.linux-x86_64-3.7/functions/init.py -> build/bdist.linux-x86_64/egg/functions copying build/lib.linux-x86_64-3.7/functions/ms_deform_attn_func.py -> build/bdist.linux-x86_64/egg/functions copying build/lib.linux-x86_64-3.7/MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg creating build/bdist.linux-x86_64/egg/modules copying build/lib.linux-x86_64-3.7/modules/ms_deform_attn.py -> build/bdist.linux-x86_64/egg/modules copying build/lib.linux-x86_64-3.7/modules/init.py -> build/bdist.linux-x86_64/egg/modules byte-compiling build/bdist.linux-x86_64/egg/functions/init.py to init.cpython-37.pyc byte-compiling build/bdist.linux-x86_64/egg/functions/ms_deform_attn_func.py to ms_deform_attn_func.cpython-37.pyc byte-compiling build/bdist.linux-x86_64/egg/modules/ms_deform_attn.py to ms_deform_attn.cpython-37.pyc byte-compiling build/bdist.linux-x86_64/egg/modules/init.py to init.cpython-37.pyc creating stub loader for MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so byte-compiling build/bdist.linux-x86_64/egg/MultiScaleDeformableAttention.py to MultiScaleDeformableAttention.cpython-37.pyc creating build/bdist.linux-x86_64/egg/EGG-INFO copying MultiScaleDeformableAttention.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO copying MultiScaleDeformableAttention.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO copying MultiScaleDeformableAttention.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO copying MultiScaleDeformableAttention.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO writing 
build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt zip_safe flag not set; analyzing archive contents... pycache.MultiScaleDeformableAttention.cpython-37: module references file creating 'dist/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it removing 'build/bdist.linux-x86_64/egg' (and everything under it) Processing MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg removing '/root/conda/lib/python3.7/site-packages/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg' (and everything under it) creating /root/conda/lib/python3.7/site-packages/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg Extracting MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg to /root/conda/lib/python3.7/site-packages MultiScaleDeformableAttention 1.0 is already the active version in easy-install.pth

Installed /root/conda/lib/python3.7/site-packages/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg Processing dependencies for MultiScaleDeformableAttention==1.0 Finished processing dependencies for MultiScaleDeformableAttention==1.0 run on: autodrive DETECTRON2_DATASETS: /data/bolu.ldz/DATASET Command Line Args: Namespace(config_file='configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=[], resume=False) run on: autodrive DETECTRON2_DATASETS: /data/bolu.ldz/DATASET [02/22 03:41:38 detectron2]: Rank of current process: 0. World size: 8 [02/22 03:41:40 detectron2]: Environment info:


sys.platform linux Python 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0] numpy 1.19.2 detectron2 0.6 @/root/conda/lib/python3.7/site-packages/detectron2 Compiler GCC 7.3 CUDA compiler CUDA 11.1 detectron2 arch flags 3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 DETECTRON2_ENV_MODULE PyTorch 1.9.0 @/root/conda/lib/python3.7/site-packages/torch PyTorch debug build False GPU available Yes GPU 0,1,2,3,4,5,6,7 GeForce RTX 3090 (arch=8.6) Driver version 460.73.01 CUDA_HOME /usr/local/cuda TORCH_CUDA_ARCH_LIST 6.0;6.1;6.2;7.0;7.5 Pillow 8.0.1 torchvision 0.10.0 @/root/conda/lib/python3.7/site-packages/torchvision torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6 fvcore 0.1.5.post20220212 iopath 0.1.9 cv2 4.1.2


PyTorch built with:

[02/22 03:41:40 detectron2]: Command line arguments: Namespace(config_file='configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=[], resume=False) [02/22 03:41:40 detectron2]: Contents of args.config_file=configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml: _BASE_: Base-YouTubeVIS-VideoInstanceSegmentation.yaml MODEL: WEIGHTS: "model_final_3c8ec9.pkl" META_ARCHITECTURE: "VideoMaskFormer" SEM_SEG_HEAD: NAME: "MaskFormerHead" IGNORE_VALUE: 255 NUM_CLASSES: 40 LOSS_WEIGHT: 1.0 CONVS_DIM: 256 MASK_DIM: 256 NORM: "GN" # pixel decoder PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" IN_FEATURES: ["res2", "res3", "res4", "res5"] DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] COMMON_STRIDE: 4 TRANSFORMER_ENC_LAYERS: 6 MASK_FORMER: TRANSFORMER_DECODER_NAME: "VideoMultiScaleMaskedTransformerDecoder" TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" DEEP_SUPERVISION: True NO_OBJECT_WEIGHT: 0.1 CLASS_WEIGHT: 2.0 MASK_WEIGHT: 5.0 DICE_WEIGHT: 5.0 HIDDEN_DIM: 256 NUM_OBJECT_QUERIES: 100 NHEADS: 8 DROPOUT: 0.0 DIM_FEEDFORWARD: 2048 ENC_LAYERS: 0 PRE_NORM: False ENFORCE_INPUT_PROJ: False SIZE_DIVISIBILITY: 32 DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query TRAIN_NUM_POINTS: 12544 OVERSAMPLE_RATIO: 3.0 IMPORTANCE_SAMPLE_RATIO: 0.75 TEST: SEMANTIC_ON: False INSTANCE_ON: True PANOPTIC_ON: False OVERLAP_THRESHOLD: 0.8 OBJECT_MASK_THRESHOLD: 0.8

[02/22 03:41:40 detectron2]: Running with full config: CUDNN_BENCHMARK: false DATALOADER: ASPECT_RATIO_GROUPING: true FILTER_EMPTY_ANNOTATIONS: false NUM_WORKERS: 4 REPEAT_THRESHOLD: 0.0 SAMPLER_TRAIN: TrainingSampler DATASETS: PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000 PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000 PROPOSAL_FILES_TEST: [] PROPOSAL_FILES_TRAIN: [] TEST:

[02/22 03:41:40 detectron2]: Full config saved to /summary/config.yaml [02/22 03:41:40 d2.utils.env]: Using a generated random seed 40230477

VideoMaskFormer( (backbone): ResNet( (stem): BasicStem( (conv1): Conv2d( 3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) ) (res2): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv1): Conv2d( 64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) ) (res3): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv1): Conv2d( 256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) ) (res4): Sequential( (0): BottleneckBlock( (shortcut): 
Conv2d( 512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) (conv1): Conv2d( 512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (4): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (5): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) ) (res5): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) (conv1): Conv2d( 1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): 
FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) ) ) (sem_seg_head): MaskFormerHead( (pixel_decoder): MSDeformAttnPixelDecoder( (input_proj): ModuleList( (0): Sequential( (0): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1)) (1): GroupNorm(32, 256, eps=1e-05, affine=True) ) (1): Sequential( (0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (1): GroupNorm(32, 256, eps=1e-05, affine=True) ) (2): Sequential( (0): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) (1): GroupNorm(32, 256, eps=1e-05, affine=True) ) ) (transformer): MSDeformAttnTransformerEncoderOnly( (encoder): MSDeformAttnTransformerEncoder( (layers): ModuleList( (0): MSDeformAttnTransformerEncoderLayer( (self_attn): MSDeformAttn( (sampling_offsets): Linear(in_features=256, out_features=192, bias=True) (attention_weights): Linear(in_features=256, out_features=96, bias=True) (value_proj): Linear(in_features=256, out_features=256, bias=True) (output_proj): Linear(in_features=256, out_features=256, bias=True) ) (dropout1): Dropout(p=0.0, inplace=False) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (linear1): Linear(in_features=256, out_features=1024, bias=True) (dropout2): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=1024, out_features=256, bias=True) (dropout3): Dropout(p=0.0, inplace=False) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (1): MSDeformAttnTransformerEncoderLayer( (self_attn): MSDeformAttn( (sampling_offsets): Linear(in_features=256, out_features=192, bias=True) (attention_weights): Linear(in_features=256, out_features=96, bias=True) (value_proj): Linear(in_features=256, out_features=256, bias=True) (output_proj): Linear(in_features=256, out_features=256, bias=True) ) (dropout1): Dropout(p=0.0, inplace=False) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (linear1): Linear(in_features=256, out_features=1024, bias=True) (dropout2): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=1024, out_features=256, bias=True) (dropout3): Dropout(p=0.0, inplace=False) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (2): MSDeformAttnTransformerEncoderLayer( (self_attn): MSDeformAttn( (sampling_offsets): Linear(in_features=256, out_features=192, bias=True) (attention_weights): Linear(in_features=256, out_features=96, bias=True) (value_proj): Linear(in_features=256, out_features=256, bias=True) (output_proj): Linear(in_features=256, out_features=256, bias=True) ) (dropout1): Dropout(p=0.0, inplace=False) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (linear1): Linear(in_features=256, out_features=1024, bias=True) (dropout2): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=1024, out_features=256, bias=True) (dropout3): Dropout(p=0.0, inplace=False) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (3): MSDeformAttnTransformerEncoderLayer( (self_attn): MSDeformAttn( (sampling_offsets): Linear(in_features=256, out_features=192, 
bias=True) (attention_weights): Linear(in_features=256, out_features=96, bias=True) (value_proj): Linear(in_features=256, out_features=256, bias=True) (output_proj): Linear(in_features=256, out_features=256, bias=True) ) (dropout1): Dropout(p=0.0, inplace=False) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (linear1): Linear(in_features=256, out_features=1024, bias=True) (dropout2): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=1024, out_features=256, bias=True) (dropout3): Dropout(p=0.0, inplace=False) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (4): MSDeformAttnTransformerEncoderLayer( (self_attn): MSDeformAttn( (sampling_offsets): Linear(in_features=256, out_features=192, bias=True) (attention_weights): Linear(in_features=256, out_features=96, bias=True) (value_proj): Linear(in_features=256, out_features=256, bias=True) (output_proj): Linear(in_features=256, out_features=256, bias=True) ) (dropout1): Dropout(p=0.0, inplace=False) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (linear1): Linear(in_features=256, out_features=1024, bias=True) (dropout2): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=1024, out_features=256, bias=True) (dropout3): Dropout(p=0.0, inplace=False) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (5): MSDeformAttnTransformerEncoderLayer( (self_attn): MSDeformAttn( (sampling_offsets): Linear(in_features=256, out_features=192, bias=True) (attention_weights): Linear(in_features=256, out_features=96, bias=True) (value_proj): Linear(in_features=256, out_features=256, bias=True) (output_proj): Linear(in_features=256, out_features=256, bias=True) ) (dropout1): Dropout(p=0.0, inplace=False) (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (linear1): Linear(in_features=256, out_features=1024, bias=True) (dropout2): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=1024, out_features=256, bias=True) (dropout3): Dropout(p=0.0, inplace=False) (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) ) ) ) (pe_layer): Positional encoding PositionEmbeddingSine num_pos_feats: 128 temperature: 10000 normalize: True scale: 6.283185307179586 (mask_features): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1)) (adapter_1): Conv2d( 256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): GroupNorm(32, 256, eps=1e-05, affine=True) ) (layer_1): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): GroupNorm(32, 256, eps=1e-05, affine=True) ) ) (predictor): VideoMultiScaleMaskedTransformerDecoder( (pe_layer): PositionEmbeddingSine3D() (transformer_self_attention_layers): ModuleList( (0): SelfAttentionLayer( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (1): SelfAttentionLayer( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (2): SelfAttentionLayer( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (3): SelfAttentionLayer( (self_attn): MultiheadAttention( (out_proj): 
NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (4): SelfAttentionLayer( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (5): SelfAttentionLayer( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (6): SelfAttentionLayer( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (7): SelfAttentionLayer( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (8): SelfAttentionLayer( (self_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) ) (transformer_cross_attention_layers): ModuleList( (0): CrossAttentionLayer( (multihead_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (1): CrossAttentionLayer( (multihead_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (2): CrossAttentionLayer( (multihead_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (3): CrossAttentionLayer( (multihead_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (4): CrossAttentionLayer( (multihead_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (5): CrossAttentionLayer( (multihead_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (6): CrossAttentionLayer( (multihead_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (7): CrossAttentionLayer( (multihead_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, 
elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) (8): CrossAttentionLayer( (multihead_attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (dropout): Dropout(p=0.0, inplace=False) ) ) (transformer_ffn_layers): ModuleList( (0): FFNLayer( (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (1): FFNLayer( (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (2): FFNLayer( (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (3): FFNLayer( (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (4): FFNLayer( (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (5): FFNLayer( (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (6): FFNLayer( (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (7): FFNLayer( (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (8): FFNLayer( (linear1): Linear(in_features=256, out_features=2048, bias=True) (dropout): Dropout(p=0.0, inplace=False) (linear2): Linear(in_features=2048, out_features=256, bias=True) (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) ) (decoder_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (query_feat): Embedding(100, 256) (query_embed): Embedding(100, 256) (level_embed): Embedding(3, 256) (input_proj): ModuleList( (0): Sequential() (1): Sequential() (2): Sequential() ) (class_embed): Linear(in_features=256, out_features=41, bias=True) (mask_embed): MLP( (layers): ModuleList( (0): Linear(in_features=256, out_features=256, bias=True) (1): Linear(in_features=256, out_features=256, bias=True) (2): Linear(in_features=256, out_features=256, bias=True) ) ) ) ) (criterion): Criterion VideoSetCriterion matcher: Matcher VideoHungarianMatcher cost_class: 2.0 cost_mask: 5.0 cost_dice: 5.0 losses: ['labels', 'masks'] weight_dict: {'loss_ce': 2.0, 'loss_mask': 5.0, 'loss_dice': 5.0, 'loss_ce_0': 2.0, 'loss_mask_0': 5.0, 'loss_dice_0': 5.0, 'loss_ce_1': 2.0, 'loss_mask_1': 5.0, 'loss_dice_1': 5.0, 'loss_ce_2': 2.0, 'loss_mask_2': 5.0, 'loss_dice_2': 
5.0, 'loss_ce_3': 2.0, 'loss_mask_3': 5.0, 'loss_dice_3': 5.0, 'loss_ce_4': 2.0, 'loss_mask_4': 5.0, 'loss_dice_4': 5.0, 'loss_ce_5': 2.0, 'loss_mask_5': 5.0, 'loss_dice_5': 5.0, 'loss_ce_6': 2.0, 'loss_mask_6': 5.0, 'loss_dice_6': 5.0, 'loss_ce_7': 2.0, 'loss_mask_7': 5.0, 'loss_dice_7': 5.0, 'loss_ce_8': 2.0, 'loss_mask_8': 5.0, 'loss_dice_8': 5.0} num_classes: 40 eos_coef: 0.1 num_points: 12544 oversample_ratio: 3.0 importance_sample_ratio: 0.75 ) [02/22 03:41:45 mask2former_video.data_video.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(360, 480), max_size=1333, sample_style='choice_by_clip', clip_frame_cnt=2), RandomFlip(clip_frame_cnt=2)] [02/22 03:41:57 mask2former_video.data_video.datasets.ytvis]: Loading /data/bolu.ldz/DATASET/YoutubeVOS2019/train.json takes 12.59 seconds. [02/22 03:41:57 mask2former_video.data_video.datasets.ytvis]: Loaded 2238 videos in YTVIS format from /data/bolu.ldz/DATASET/YoutubeVOS2019/train.json [02/22 03:42:05 mask2former_video.data_video.build]: Using training sampler TrainingSampler [02/22 03:42:19 d2.data.common]: Serializing 2238 elements to byte tensors and concatenating them all ... [02/22 03:42:19 d2.data.common]: Serialized dataset takes 151.32 MiB [02/22 03:42:20 fvcore.common.checkpoint]: [Checkpointer] Loading from /data/bolu.ldz/PRETRAINED_WEIGHTS/mask2former/model_final_3c8ec9.pkl ... [02/22 03:42:22 fvcore.common.checkpoint]: Reading a file from 'MaskFormer Model Zoo' WARNING [02/22 03:42:22 mask2former_video.modeling.transformer_decoder.video_mask2former_transformer_decoder]: Weight format of VideoMultiScaleMaskedTransformerDecoder have changed! Please upgrade your models. Applying automatic conversion now ... WARNING [02/22 03:42:22 fvcore.common.checkpoint]: Skip loading parameter 'sem_seg_head.predictor.class_embed.weight' to the model due to incompatible shapes: (81, 256) in the checkpoint but (41, 256) in the model! You might want to double check if this is expected. WARNING [02/22 03:42:22 fvcore.common.checkpoint]: Skip loading parameter 'sem_seg_head.predictor.class_embed.bias' to the model due to incompatible shapes: (81,) in the checkpoint but (41,) in the model! You might want to double check if this is expected. WARNING [02/22 03:42:22 fvcore.common.checkpoint]: Skip loading parameter 'criterion.empty_weight' to the model due to incompatible shapes: (81,) in the checkpoint but (41,) in the model! You might want to double check if this is expected. WARNING [02/22 03:42:22 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint: criterion.empty_weight sem_seg_head.predictor.class_embed.{bias, weight} [02/22 03:42:22 d2.engine.train_loop]: Starting training from iteration 0 run on: autodrive DETECTRON2_DATASETS: /data/bolu.ldz/DATASET run on: autodrive DETECTRON2_DATASETS: /data/bolu.ldz/DATASET run on: autodrive DETECTRON2_DATASETS: /data/bolu.ldz/DATASET run on: autodrive DETECTRON2_DATASETS: /data/bolu.ldz/DATASET run on: autodrive DETECTRON2_DATASETS: /data/bolu.ldz/DATASET run on: autodrive DETECTRON2_DATASETS: /data/bolu.ldz/DATASET run on: autodrive DETECTRON2_DATASETS: /data/bolu.ldz/DATASET error in ms_deformable_im2col_cuda: no kernel image is available for execution on the device error in ms_deformable_im2col_cuda: no kernel image is available for execution on the device

9p15p commented 2 years ago

It seems that the training is still running, but the error pops up constantly.
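If it helps with debugging: this error usually means the compiled extension contains no kernel for the GPU's compute capability. A quick way to check (just a sketch; the .so path below is a guess based on the build log above, adjust it to your install) is to compare the device's capability with the SM architectures embedded in the installed extension:

# GPU actually used on the cluster; an RTX 3090 should report capability (8, 6), i.e. sm_86.
python -c "import torch; print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))"

# SM architectures baked into the installed extension (path guessed from the build log; adjust it).
cuobjdump --list-elf /root/conda/lib/python3.7/site-packages/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so

If sm_86 does not appear in the cuobjdump output while the device reports (8, 6), the "no kernel image" error is expected.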

9p15p commented 2 years ago

I build the Docker image on my own computer and push it to the GPU cluster. It works well on my computer, but raises this error on the GPU cluster.

And this is my training shell script:

#!/bin/bash

# Activate the conda environment and dump the toolchain / CUDA environment for debugging.
source /root/conda/etc/profile.d/conda.sh
conda activate base
which python

nvcc -V
nvidia-smi
echo $CUDA_HOME
echo $TORCH_CUDA_ARCH_LIST
echo $FORCE_CUDA
python -m detectron2.utils.collect_env

# Rebuild the MultiScaleDeformableAttention CUDA extension from a clean state.
cd mask2former/modeling/pixel_decoder/ops
rm -rf build
rm -rf dist
rm -rf MultiScaleDeformableAttention.egg-info
TORCH_CUDA_ARCH_LIST='6.1;6.2;7.0;7.5;8.0;8.6' FORCE_CUDA=1 python setup.py build install
cd /workspace

#python scripts/train_net_video.py --num-gpus 8 --config-file configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml
#mv /summary/model_final.pth /summary/model_final_2019.pth
#python scripts/train_net_video.py --num-gpus 8 --config-file configs/youtubevis_2021/video_maskformer2_R50_bs16_8ep.yaml MODEL.WEIGHTS /summary/model_final_2019.pth
#mv /summary/model_final.pth /summary/model_final_2021.pth

# Smoke test: one iteration on YouTubeVIS 2019, then one on 2021 starting from the 2019 weights.
python scripts/train_net_video.py --num-gpus 2 --config-file configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml SOLVER.MAX_ITER 1
mv /summary/model_final.pth /summary/model_final_2019.pth
python scripts/train_net_video.py --num-gpus 2 --config-file configs/youtubevis_2021/video_maskformer2_R50_bs16_8ep.yaml MODEL.WEIGHTS /summary/model_final_2019.pth SOLVER.MAX_ITER 1
mv /summary/model_final.pth /summary/model_final_2021.pth
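
One thing worth changing in the script above (a sketch only, and it assumes the extension is built on the same machine that runs training, which is not the case in the Docker workflow I described): derive TORCH_CUDA_ARCH_LIST from the GPU that is actually present instead of hard-coding the list, so a new GPU model cannot silently fall outside it.

# Build the extension for exactly the GPU found on this machine (assumes a visible CUDA device).
GPU_ARCH=$(python -c "import torch; cc = torch.cuda.get_device_capability(0); print(f'{cc[0]}.{cc[1]}')")
echo "Building MultiScaleDeformableAttention for compute capability ${GPU_ARCH}"

cd mask2former/modeling/pixel_decoder/ops
rm -rf build dist MultiScaleDeformableAttention.egg-info
TORCH_CUDA_ARCH_LIST="${GPU_ARCH}" FORCE_CUDA=1 python setup.py build install

When the image is built on a machine without the target GPU, the hard-coded list must include the cluster GPU's architecture (8.6 for an RTX 3090), which the script above already does, so a stale previously-installed egg on the cluster is another thing worth ruling out.
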
bowenc0221 commented 2 years ago

#54 seems to have found the problem.

liuzhihui2046 commented 2 years ago

I have the same problem! NVIDIA GTX 3090 Ti, CUDA 11.1, PyTorch versions 1.8, 1.9, 1.10, and 1.11.
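
One more thing worth checking on an sm_86 card (3090 / 3090 Ti), not confirmed in this thread but a common cause: the PyTorch wheel itself has to ship sm_86 kernels, which in practice means a build against CUDA 11.1 or newer. A quick check (sketch):

# Should print a CUDA version >= 11.1 and an arch list containing 'sm_86' for a 3090 / 3090 Ti.
python -c "import torch; print(torch.__version__, torch.version.cuda); print(torch.cuda.get_arch_list())"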

kunjing96 commented 2 years ago

Has the problem been solved? I have the same problem.

Robotatron commented 1 year ago

What is the status of this? Does anyone have the docker file/image for Mask2Former? @9p15p

Robotatron commented 1 year ago

@9p15p would it be possible for you to share the docker file?

lix19937 commented 2 months ago

ref https://discuss.pytorch.org/t/ms-deformable-im2col-cuda-no-kernel-image-is-available-for-execution/179995/3
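
For anyone landing here later: the usual resolution for this class of error is to rebuild MultiScaleDeformableAttention for the machine that will actually run it, with TORCH_CUDA_ARCH_LIST covering that GPU, and then exercise the op once before launching a full run. A minimal sketch, assuming a standard Mask2Former checkout (the test.py gradient-check script is inherited from Deformable-DETR and may not exist in every copy of the ops folder):

cd mask2former/modeling/pixel_decoder/ops
rm -rf build dist MultiScaleDeformableAttention.egg-info
TORCH_CUDA_ARCH_LIST="8.6" FORCE_CUDA=1 python setup.py build install   # 8.6 = RTX 3090 / 3090 Ti

# If the Deformable-DETR check script is present, it runs the CUDA kernel end-to-end and
# fails immediately on an architecture mismatch instead of partway into training.
[ -f test.py ] && python test.py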