Closed: haoran1062 closed this issue 4 years ago
@haoran1062 It seems you have modified the config file. Can you run the ABCNet demo successfully (without changing anything)?
I ran the CTW1500 training and it worked. But when I train on my own dataset, a CUDA error always occurs. I only modified the config like this:
```yaml
_BASE_: "../Base-BAText.yaml"
MODEL:
  BATEXT:
    POOLER_RESOLUTION: (8,128)
    NUM_CHARS: 6900
  FCOS:
    INFERENCE_TH_TEST: 0.6
DATASETS:
  TRAIN: ("my_train",)
  TEST: ("my_test",)
INPUT:
  MIN_SIZE_TEST: 1024
  MAX_SIZE_TEST: 2240
```
and I printed out my data; it looks all right...
[{'file_name': 'datasets/my/images/1104.jpg', 'height': 4029, 'width': 3021, 'image_id': 1104, 'image': tensor([[[184., 184., 185., ..., 204., 201., 200.],
[182., 184., 186., ..., 204., 200., 200.],
[185., 185., 185., ..., 202., 202., 202.],
...,
[182., 183., 185., ..., 188., 189., 189.],
[181., 183., 186., ..., 188., 189., 190.],
[179., 182., 184., ..., 190., 190., 190.]],
[[191., 191., 190., ..., 219., 217., 217.],
[189., 191., 191., ..., 219., 217., 217.],
[190., 191., 190., ..., 217., 219., 219.],
...,
[181., 181., 181., ..., 197., 200., 200.],
[180., 180., 182., ..., 197., 199., 200.],
[178., 181., 181., ..., 199., 199., 199.]],
[[200., 200., 199., ..., 228., 226., 226.],
[198., 200., 200., ..., 228., 226., 226.],
[199., 200., 199., ..., 226., 228., 228.],
...,
[183., 181., 183., ..., 206., 208., 208.],
[182., 182., 184., ..., 206., 207., 208.],
[180., 183., 183., ..., 209., 208., 208.]]]), 'instances': Instances(num_instances=47, image_height=704, image_width=967, fields=[gt_boxes: Boxes(tensor([[173.7682, 364.5951, 249.3772, 384.8136],
[ 25.2030, 452.0979, 321.3385, 476.2938],
[162.1615, 97.1149, 258.9942, 118.9906],
[169.4571, 234.9981, 458.9602, 259.5254],
[ 27.5243, 494.8550, 286.1869, 519.0508],
[ 79.5885, 0.0000, 354.1687, 34.1394],
[ 16.5809, 298.9680, 97.4959, 323.1638],
[ 26.5295, 472.3164, 263.6368, 497.5066],
[153.5394, 298.9680, 214.2256, 319.1864],
[ 17.5758, 320.8437, 109.1025, 345.3710],
[164.8145, 130.9228, 316.6958, 154.1243],
[540.8701, 362.9379, 646.6564, 385.1450],
[ 25.2030, 408.3465, 179.7373, 432.5424],
[809.1495, 357.6346, 948.0977, 381.1676],
[ 10.6118, 234.0038, 134.9688, 261.8456],
[165.1461, 164.7307, 318.3539, 188.5951],
[ 23.8765, 430.8851, 286.1869, 455.0810],
[ 76.9355, 27.8418, 349.8577, 50.0490],
[698.0573, 490.2147, 749.7898, 507.7816],
[440.0580, 408.6780, 521.9678, 426.2448],
[678.1602, 361.9435, 766.3707, 384.1507],
[441.3844, 429.5593, 521.9678, 447.4576],
[ 11.2750, 200.8588, 136.6269, 225.3861],
[ 9.6169, 129.5970, 132.3158, 154.7872],
[413.8601, 363.6007, 521.9678, 385.8079],
[839.9901, 487.2316, 927.8690, 506.4557],
[ 9.2853, 94.7947, 129.6629, 121.9736],
[ 12.2699, 163.0734, 131.9842, 189.5894],
[458.9602, 493.5292, 524.2891, 512.7533],
[441.3844, 450.7721, 522.9626, 469.9962],
[446.3587, 472.3164, 524.2891, 490.2147],
[582.6540, 428.5650, 607.1938, 447.4576],
[151.8813, 320.8437, 220.1948, 342.3879],
[400.2637, 43.7514, 598.5717, 83.8569],
[162.8248, 197.8757, 259.9890, 224.3917],
[582.9856, 408.3465, 605.8673, 424.9190],
[584.3121, 448.4520, 605.8673, 467.6761],
[585.3069, 469.6648, 607.1938, 490.2147],
[586.6334, 493.5292, 609.5151, 510.1017],
[689.4352, 407.0207, 752.4427, 424.9190],
[692.0881, 427.2392, 753.7692, 445.1375],
[694.4095, 447.1262, 749.7898, 466.3503],
[695.7360, 469.6648, 749.7898, 486.2373],
[859.8871, 400.7232, 897.6917, 419.9473],
[862.5401, 424.5876, 902.6660, 442.4859],
[862.5401, 444.8060, 900.3447, 462.3729],
[862.5401, 464.6930, 900.3447, 484.9115]])), gt_classes: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), beziers: tensor([[173.7682, 364.5951, 198.8618, 364.5951, 223.9520, 364.5951, 249.0456,
364.5951, 249.0456, 384.4821, 223.9520, 384.4821, 198.8618, 384.4821,
173.7682, 384.4821],
[ 25.2030, 454.7495, 122.9111, 453.2381, 220.6392, 453.1055, 318.3539,
452.0979, 321.0069, 472.3164, 222.4034, 473.4135, 123.7999, 474.3515,
25.2030, 475.9623],
[164.1512, 97.4463, 194.6535, 96.7105, 225.1691, 97.3502, 255.6780,
97.1149, 258.6625, 118.6591, 226.4955, 118.6591, 194.3285, 118.6591,
162.1615, 118.6591],
[169.4571, 234.9981, 265.8487, 234.9981, 362.2370, 234.9981, 458.6286,
234.9981, 458.6286, 259.1940, 362.2370, 259.1940, 265.8487, 259.1940,
169.4571, 259.1940],
[ 27.5243, 497.1751, 113.6358, 496.3995, 199.7439, 495.6273, 285.8553,
494.8550, 285.8553, 514.7420, 200.6260, 515.8590, 115.4000, 517.0853,
30.1773, 518.7194],
[ 79.5885, 0.0000, 171.0058, 1.6539, 262.4297, 2.8372, 353.8371,
4.9718, 350.8525, 33.8079, 260.5528, 31.3187, 170.2298, 29.8570,
79.9201, 27.8418],
[ 16.5809, 300.6252, 42.9944, 299.6441, 69.4144, 299.4619, 95.8378,
298.9680, 97.1643, 321.8380, 70.2965, 321.6557, 43.4454, 322.5275,
16.5809, 322.8324],
[ 26.5295, 477.2881, 104.5560, 474.9945, 182.6158, 474.0996, 260.6523,
472.3164, 263.3052, 492.2034, 185.1526, 493.8573, 106.9901, 495.0340,
28.8508, 497.1751],
[153.5394, 298.9680, 173.6587, 298.9680, 193.7747, 298.9680, 213.8940,
298.9680, 213.8940, 318.8550, 193.7747, 318.8550, 173.6587, 318.8550,
153.5394, 318.8550],
[ 17.5758, 320.8437, 47.5375, 320.6614, 77.4860, 321.5431, 107.4444,
321.8380, 108.7709, 342.0565, 78.9087, 342.6100, 49.0796, 343.9922,
19.2339, 345.0396],
[164.8145, 130.9228, 215.3299, 130.9228, 265.8487, 130.9228, 316.3642,
130.9228, 316.3642, 153.7928, 265.8487, 153.7928, 215.3299, 153.7928,
164.8145, 153.7928],
[540.8701, 362.9379, 576.0216, 362.9379, 611.1732, 362.9379, 646.3248,
362.9379, 646.3248, 384.8136, 611.1732, 384.8136, 576.0216, 384.8136,
540.8701, 384.8136],
[ 25.2030, 411.9925, 76.1496, 410.2523, 127.1194, 409.5928, 178.0792,
408.3465, 179.4057, 429.5593, 128.0048, 430.4410, 76.5940, 430.8354,
25.2030, 432.2109],
[809.1495, 357.6346, 855.3539, 357.6346, 901.5617, 357.6346, 947.7661,
357.6346, 947.7661, 380.1732, 902.5532, 379.9711, 857.3470, 380.5577,
812.1341, 380.8362],
[ 12.9331, 234.9981, 53.0590, 234.5440, 93.1815, 233.9474, 133.3107,
234.0038, 134.6372, 258.5311, 93.2843, 259.1012, 51.9547, 260.4535,
10.6118, 261.5141],
[165.1461, 164.7307, 216.1059, 164.7307, 267.0624, 164.7307, 318.0223,
164.7307, 318.0223, 188.2637, 267.0624, 188.2637, 216.1059, 188.2637,
165.1461, 188.2637],
[ 23.8765, 433.5367, 111.1951, 432.0220, 198.5301, 431.8961, 285.8553,
430.8851, 285.8553, 449.7778, 198.5301, 451.4317, 111.1917, 452.5984,
23.8765, 454.7495],
[ 78.5936, 27.8418, 168.9033, 29.3930, 259.2164, 30.9409, 349.5261,
32.4821, 349.5261, 49.7175, 258.6824, 46.6682, 167.7957, 44.9877,
76.9355, 42.4256],
[698.0573, 490.2147, 715.1920, 490.2147, 732.3234, 490.2147, 749.4582,
490.2147, 749.4582, 506.7872, 732.7579, 506.5784, 716.0807, 507.1717,
699.3837, 507.4501],
[440.0580, 409.6723, 466.4747, 409.2216, 492.8914, 408.6150, 519.3148,
408.6780, 521.6362, 425.9134, 494.8845, 425.9134, 468.1361, 425.9134,
441.3844, 425.9134],
[679.4866, 361.9435, 708.3375, 361.9435, 737.1883, 361.9435, 766.0391,
361.9435, 766.0391, 383.8192, 736.7472, 383.3983, 707.4587, 383.0701,
678.1602, 383.1563],
[441.3844, 429.8908, 468.1262, 429.1549, 494.8878, 429.7946, 521.6362,
429.5593, 521.6362, 447.1262, 494.8845, 447.1262, 468.1361, 447.1262,
441.3844, 447.1262],
[ 13.5964, 200.8588, 53.9444, 200.8588, 94.2891, 200.8588, 134.6372,
200.8588, 136.2953, 224.0603, 94.6174, 223.8846, 52.9496, 224.7431,
[ 9.6169, 129.5970, 50.4060, 129.5970, 91.1951, 129.5970, 131.9842,
129.5970, 131.9842, 154.4557, 91.1951, 154.4557, 50.4060, 154.4557,
9.6169, 154.4557],
[413.8601, 363.6007, 449.7843, 363.6007, 485.7119, 363.6007, 521.6362,
363.6007, 521.6362, 385.4765, 485.7119, 385.4765, 449.7843, 385.4765,
413.8601, 385.4765],
[839.9901, 487.2316, 868.7314, 487.2316, 897.4695, 487.2316, 926.2109,
487.2316, 927.5374, 504.7985, 898.6866, 505.1200, 869.8291, 505.2924,
840.9849, 506.1243],
[ 12.2699, 95.7891, 51.2881, 95.3383, 90.3064, 94.7351, 129.3313,
94.7947, 129.3313, 119.3220, 89.3148, 120.0877, 49.3017, 120.8666,
9.2853, 121.6422],
[ 12.2699, 166.3880, 51.5070, 165.1616, 90.7408, 163.7794, 129.9945,
163.0734, 131.6526, 187.9322, 92.2994, 188.2570, 52.9429, 188.4227,
13.5964, 189.2580],
[459.9551, 494.1921, 481.2848, 493.7711, 502.6178, 493.4364, 523.9575,
493.5292, 521.6362, 512.4219, 500.7442, 512.4219, 479.8522, 512.4219,
458.9602, 512.4219],
[443.3741, 451.1036, 469.4526, 450.3678, 495.5510, 451.0108, 521.6362,
450.7721, 522.6310, 468.3390, 495.5477, 468.6638, 468.4611, 468.8329,
441.3844, 469.6648],
[446.3587, 473.3107, 471.4490, 472.8567, 496.5392, 472.2567, 521.6362,
472.3164, 523.9575, 489.8832, 498.8639, 489.8832, 473.7736, 489.8832,
448.6801, 489.8832],
[583.6488, 428.5650, 589.9496, 428.5650, 596.2504, 428.5650, 602.5511,
428.5650, 606.8621, 447.1262, 598.8204, 446.3970, 590.7189, 447.0201,
582.6540, 446.7947],
[153.8711, 320.8437, 174.9852, 320.8437, 196.0961, 320.8437, 217.2102,
320.8437, 219.8632, 342.0565, 197.2037, 342.0565, 174.5408, 342.0565,
151.8813, 342.0565],
[400.2637, 43.7514, 466.2558, 43.7514, 532.2479, 43.7514, 598.2401,
43.7514, 598.2401, 83.5254, 532.2479, 83.5254, 466.2558, 83.5254,
400.2637, 83.5254],
[162.8248, 197.8757, 195.1012, 197.8757, 227.3810, 197.8757, 259.6574,
197.8757, 259.6574, 224.0603, 227.3810, 224.0603, 195.1012, 224.0603,
162.8248, 224.0603],
[582.9856, 408.3465, 590.5034, 408.3465, 598.0179, 408.3465, 605.5356,
408.3465, 605.5356, 424.5876, 598.0179, 424.5876, 590.5034, 424.5876,
582.9856, 424.5876],
[584.3121, 448.4520, 591.3855, 448.4520, 598.4622, 448.4520, 605.5356,
448.4520, 605.5356, 467.3446, 598.4622, 467.3446, 591.3855, 467.3446,
584.3121, 467.3446],
[585.3069, 469.6648, 592.4931, 469.6648, 599.6760, 469.6648, 606.8621,
469.6648, 606.8621, 489.8832, 599.6760, 489.8832, 592.4931, 489.8832,
585.3069, 489.8832],
[586.6334, 493.5292, 594.1512, 493.5292, 601.6656, 493.5292, 609.1835,
493.5292, 609.1835, 509.7702, 601.6656, 509.7702, 594.1512, 509.7702,
586.6334, 509.7702],
[689.4352, 407.0207, 710.3271, 407.0207, 731.2191, 407.0207, 752.1111,
407.0207, 752.1111, 424.5876, 731.2191, 424.5876, 710.3271, 424.5876,
689.4352, 424.5876],
[692.0881, 427.2392, 712.5391, 427.2392, 732.9866, 427.2392, 753.4376,
427.2392, 753.4376, 444.8060, 732.9866, 444.8060, 712.5391, 444.8060,
692.0881, 444.8060],
[694.4095, 447.1262, 712.7579, 447.1262, 731.1097, 447.1262, 749.4582,
447.1262, 749.4582, 466.0188, 731.1097, 466.0188, 712.7579, 466.0188,
694.4095, 466.0188],
[695.7360, 469.6648, 713.6434, 469.6648, 731.5508, 469.6648, 749.4582,
469.6648, 749.4582, 485.9059, 731.5508, 485.9059, 713.6434, 485.9059,
695.7360, 485.9059],
[859.8871, 400.7232, 872.3793, 400.7232, 884.8680, 400.7232, 897.3601,
400.7232, 897.3601, 419.6158, 884.8680, 419.6158, 872.3793, 419.6158,
859.8871, 419.6158],
[862.5401, 424.5876, 875.8049, 424.5876, 889.0696, 424.5876, 902.3344,
424.5876, 902.3344, 442.1544, 889.0696, 442.1544, 875.8049, 442.1544,
862.5401, 442.1544],
[862.5401, 444.8060, 875.0322, 444.8060, 887.5209, 444.8060, 900.0131,
444.8060, 900.0131, 462.0414, 887.5209, 462.0414, 875.0322, 462.0414,
862.5401, 462.0414],
[862.5401, 464.6930, 875.0322, 464.6930, 887.5209, 464.6930, 900.0131,
464.6930, 900.0131, 484.5800, 887.5209, 484.5800, 875.0322, 484.5800,
862.5401, 484.5800]]), text: tensor([[ 359, 6307, 5360, ..., 6830, 6830, 6830],
[ 143, 1970, 6288, ..., 6830, 6830, 6830],
[ 15, 16, 18, ..., 6830, 6830, 6830],
...,
[4305, 5701, 6830, ..., 6830, 6830, 6830],
[4305, 5701, 6830, ..., 6830, 6830, 6830],
[4305, 5701, 6830, ..., 6830, 6830, 6830]], dtype=torch.int32)])}]
and this error occurred:
/opt/conda/conda-bld/pytorch_1587428398394/work/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of nonzero is deprecated:
nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
nonzero(Tensor input, *, bool as_tuple)
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered
cuda runtime error (700) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/THC/THCCachingHostAllocator.cpp:278
Traceback (most recent call last):
File "tools/train_net.py", line 243, in <module>
args=(args,),
File "/data/projects/detectron2/detectron2/engine/launch.py", line 57, in launch
main_func(*args)
File "tools/train_net.py", line 231, in main
return trainer.train()
File "tools/train_net.py", line 113, in train
self.train_loop(self.start_iter, self.max_iter)
File "tools/train_net.py", line 102, in train_loop
self.run_step()
File "/data/projects/detectron2/detectron2/engine/train_loop.py", line 217, in run_step
print(loss_dict)
File "/opt/conda/lib/python3.7/site-packages/torch/tensor.py", line 162, in __repr__
return torch._tensor_str._str(self)
File "/opt/conda/lib/python3.7/site-packages/torch/_tensor_str.py", line 315, in _str
tensor_str = _tensor_str(self, indent)
File "/opt/conda/lib/python3.7/site-packages/torch/_tensor_str.py", line 213, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/opt/conda/lib/python3.7/site-packages/torch/_tensor_str.py", line 88, in __init__
nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: an illegal memory access was encountered
We haven't tested it on Chinese datasets. Please make sure you have enough GPU memory.
But it was never CUDA out of memory. I use a single 1080Ti with batch size = 1, and the GPU memory usage was only about 800 MB when the error RuntimeError: CUDA error: an illegal memory access was encountered occurred.
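As a side note, illegal-memory-access errors are raised asynchronously, so the line in the stack trace (here the `print(loss_dict)` inside `run_step`) is often not the operation that actually faulted. A common first debugging step, sketched below, is to force synchronous kernel launches via the standard `CUDA_LAUNCH_BLOCKING` environment variable before anything initializes CUDA:

```python
import os

# Must be set before the first CUDA call (i.e. before importing code that
# initializes CUDA), so kernels launch synchronously and the Python stack
# trace points at the op that actually faulted.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"])  # -> 1
```

With this set, re-running `tools/train_net.py` usually moves the reported error from an unrelated line to the real faulting operation (at the cost of slower training, so it is for debugging only).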
@haoran1062 NUM_CHARS represents the maximum text length that can exist in one instance, which can hardly be 6900.
If you want to change the number of classes, change MODEL.BATEXT.VOC_SIZE. Also pay attention to the class index of the "EOF" symbol.
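Out-of-range label indices are a classic cause of illegal memory access on the GPU (an out-of-bounds gather in a CUDA kernel). In the dump above, the `text` tensors are padded with 6830, which looks like the padding/"EOF" index here, so every character index should lie within the vocabulary. A quick sanity check along these lines (a hypothetical helper in plain Python, not part of AdelaiDet) can catch bad annotations before training:

```python
def check_text_labels(text_rows, voc_size):
    """Sanity-check character labels: every index must lie in
    [0, voc_size], where index voc_size is the padding/"EOF" class.
    Returns (row, value) pairs that fall out of range."""
    bad = []
    for i, row in enumerate(text_rows):
        for v in row:
            if not (0 <= v <= voc_size):
                bad.append((i, v))
    return bad

# Toy example: vocabulary of 6830 characters, padding index 6830.
rows = [
    [359, 6307, 5360, 6830, 6830],   # in range
    [4305, 5701, 6900, 6830, 6830],  # 6900 is out of range
]
print(check_text_labels(rows, 6830))  # -> [(1, 6900)]
```

Running such a check over the full dataset before training is much cheaper than chasing an asynchronous CUDA fault afterwards.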
Thanks, that's my bad... now this error is fixed, but another error occurred:
Traceback (most recent call last):
File "tools/train_net.py", line 243, in <module>
args=(args,),
File "/data/projects/detectron2/detectron2/engine/launch.py", line 57, in launch
main_func(*args)
File "tools/train_net.py", line 231, in main
return trainer.train()
File "tools/train_net.py", line 113, in train
self.train_loop(self.start_iter, self.max_iter)
File "tools/train_net.py", line 102, in train_loop
self.run_step()
File "/data/projects/detectron2/detectron2/engine/train_loop.py", line 209, in run_step
data = next(self._data_loader_iter)
File "/data/projects/detectron2/detectron2/data/common.py", line 142, in __iter__
for d in self.dataset:
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/data/projects/detectron2/detectron2/data/common.py", line 41, in __getitem__
data = self._map_func(self._dataset[cur_idx])
File "/data/projects/detectron2/detectron2/utils/serialize.py", line 23, in __call__
return self._obj(*args, **kwargs)
File "/data/projects/AdelaiDet-master/adet/data/dataset_mapper.py", line 94, in __call__
raise e
File "/data/projects/AdelaiDet-master/adet/data/dataset_mapper.py", line 91, in __call__
image, transforms = T.apply_transform_gens(self.tfm_gens, image)
File "/data/projects/detectron2/detectron2/data/transforms/transform_gen.py", line 535, in apply_transform_gens
tfm = g.get_transform(img) if isinstance(g, TransformGen) else g
File "/data/projects/detectron2/detectron2/data/transforms/transform_gen.py", line 251, in get_transform
newh = int(newh + 0.5)
ValueError: cannot convert float NaN to integer
But I checked the images; width and height are not zero.
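The failing line comes from detectron2's shortest-edge resize, which scales the image so its shorter side matches a target size and then rounds. A NaN can appear there even when the stored width/height are fine, e.g. if an earlier transform (such as a crop built from a degenerate box) produced a zero-sized array. A minimal sketch of the arithmetic (not detectron2's actual code) shows where the NaN comes from:

```python
import math

def short_edge_resize_dims(h, w, short_edge, max_size):
    """Sketch of shortest-edge resize: scale so the shorter side equals
    short_edge, capped so the longer side does not exceed max_size."""
    scale = short_edge * 1.0 / min(h, w)
    if h < w:
        newh, neww = short_edge, scale * w
    else:
        newh, neww = scale * h, short_edge
    if max(newh, neww) > max_size:
        r = max_size * 1.0 / max(newh, neww)
        newh, neww = newh * r, neww * r
    return int(newh + 0.5), int(neww + 0.5)

# Normal case, using the image size from the dump above:
print(short_edge_resize_dims(4029, 3021, 1024, 2240))  # -> (1366, 1024)

# Degenerate case: a zero-sized side makes the scale infinite, and
# inf * 0 is NaN, which is what int(newh + 0.5) then chokes on.
h, w = 0, 3021
scale = 1024 * 1.0 / min(h, w) if min(h, w) else float("inf")
print(math.isnan(scale * h))  # -> True
```

So the NaN points at the image fed into the resize transform being degenerate at that moment, not at the annotated width/height in the dataset dict.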
@haoran1062 This issue might happen if you have illegal annotations. Also make sure you use the latest version of this project.
I found this error is caused by the crop function. After setting crop_gen = False, I could continue training the model. Also, 6900 classes is too large and causes CUDA OOM. But I found that with a single card, training could continue after the OOM, while multi-GPU training just hangs.
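For reference, the crop augmentation behind `crop_gen` is normally controlled from the config rather than by editing the mapper code; in the standard detectron2 config schema (key names assumed, check your version) it can be disabled with:

```yaml
INPUT:
  CROP:
    ENABLED: False
```

This keeps the change in the same yaml override file as the rest of the setup, so it is easy to re-enable once the offending annotations are found.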
@haoran1062 Glad to hear that you can train now.
Just a reminder: we have tested the crop_gen function before, and it should work. There must be something wrong somewhere.
I don't know how to solve this bug; I'm not sure what happened...