Closed ardofski closed 2 years ago
Thanks for your great work!
I am trying to run the test on the Kinetics-400 dataset, but I get the following error:
```
[02/28 13:01:03][INFO] test_net.py: 158: Test with config:
[02/28 13:01:03][INFO] test_net.py: 159:
AVA: ANNOTATION_DIR: /mnt/vol/gfsai-flash3-east/ai-group/users/haoqifan/ava/frame_list/ BGR: False DETECTION_SCORE_THRESH: 0.9 EXCLUSION_FILE: ava_val_excluded_timestamps_v2.2.csv FRAME_DIR: /mnt/fair-flash3-east/ava_trainval_frames.img/ FRAME_LIST_DIR: /mnt/vol/gfsai-flash3-east/ai-group/users/haoqifan/ava/frame_list/ FULL_TEST_ON_VAL: False GROUNDTRUTH_FILE: ava_val_v2.2.csv IMG_PROC_BACKEND: cv2 LABEL_MAP_FILE: ava_action_list_v2.2_for_activitynet_2019.pbtxt TEST_FORCE_FLIP: False TEST_LISTS: ['val.csv'] TEST_PREDICT_BOX_LISTS: ['ava_val_predicted_boxes.csv'] TRAIN_GT_BOX_LISTS: ['ava_train_v2.2.csv'] TRAIN_LISTS: ['train.csv'] TRAIN_PCA_EIGVAL: [0.225, 0.224, 0.229] TRAIN_PCA_EIGVEC: [[-0.5675, 0.7192, 0.4009], [-0.5808, -0.0045, -0.814], [-0.5836, -0.6948, 0.4203]] TRAIN_PCA_JITTER_ONLY: True TRAIN_PREDICT_BOX_LISTS: [] TRAIN_USE_COLOR_AUGMENTATION: False
BENCHMARK: LOG_PERIOD: 100 NUM_EPOCHS: 5 SHUFFLE: True
BN: NORM_TYPE: batchnorm NUM_BATCHES_PRECISE: 200 NUM_SPLITS: 1 NUM_SYNC_DEVICES: 1 USE_PRECISE_STATS: False WEIGHT_DECAY: 0.0
DATA: AUTO_AUGMENT: COLOR_JITTER: 0.0 CROP_SIZE: 224 DECODING_BACKEND: pyav DEIT_TRANSFORMS: False ENSEMBLE_METHOD: sum INPUT_CHANNEL_NUM: [3] INV_UNIFORM_SAMPLE: False MEAN: [0.45, 0.45, 0.45] MULTI_LABEL: False NUM_FRAMES: 8 PATH_LABEL_SEPARATOR: PATH_PREFIX: PATH_TO_DATA_DIR: /home/users/agoktogan/Desktop/datasets/kinetics400-tiny RANDOM_FLIP: True REVERSE_INPUT_CHANNEL: False RE_PROB: 0.0 SAMPLING_RATE: 32 STD: [0.225, 0.225, 0.225] TARGET_FPS: 30 TEMPORAL_EXTENT: 8 TEST_CROP_SIZE: 224 TRAIN_CROP_SIZE: 224 TRAIN_JITTER_SCALES: [256, 320]
DATA_LOADER: ENABLE_MULTI_THREAD_DECODE: False NUM_WORKERS: 8 PIN_MEMORY: True
DEMO: BUFFER_SIZE: 0 CLIP_VIS_SIZE: 10 COMMON_CLASS_NAMES: ['watch (a person)', 'talk to (e.g., self, a person, a group)', 'listen to (a person)', 'touch (an object)', 'carry/hold (an object)', 'walk', 'sit', 'lie/sleep', 'bend/bow (at the waist)'] COMMON_CLASS_THRES: 0.7 DETECTRON2_CFG: COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml DETECTRON2_THRESH: 0.9 DETECTRON2_WEIGHTS: detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl DISPLAY_HEIGHT: 0 DISPLAY_WIDTH: 0 ENABLE: False FPS: 30 GT_BOXES: INPUT_FORMAT: BGR INPUT_VIDEO: LABEL_FILE_PATH: NUM_CLIPS_SKIP: 0 NUM_VIS_INSTANCES: 2 OUTPUT_FILE: OUTPUT_FPS: -1 PREDS_BOXES: SLOWMO: 1 STARTING_SECOND: 900 THREAD_ENABLE: False UNCOMMON_CLASS_THRES: 0.3 VIS_MODE: thres WEBCAM: -1
DETECTION: ALIGNED: True ENABLE: False ROI_XFORM_RESOLUTION: 7 SPATIAL_SCALE_FACTOR: 16
DIST_BACKEND: nccl
EMA: ENABLED: False
GLOBAL_BATCH_SIZE: 64
LOG_MODEL_INFO: False
LOG_PERIOD: 10
MIXUP: ALPHA: 0.8 CUTMIX_ALPHA: 1.0 CUTMIX_MINMAX: None ENABLED: False MODE: batch PROB: 1.0 SWITCH_PROB: 0.5
MODEL: ARCH: vit DROPCONNECT_RATE: 0.0 DROPOUT_RATE: 0.5 FC_INIT_STD: 0.01 HEAD_ACT: softmax LOSS_FUNC: cross_entropy MODEL_NAME: vit_base_patch16_224 MULTI_PATHWAY_ARCH: ['slowfast'] NUM_CLASSES: 400 SINGLE_PATHWAY_ARCH: ['c2d', 'i3d', 'slow', 'x3d']
MULTIGRID: BN_BASE_SIZE: 8 DEFAULT_B: 0 DEFAULT_S: 0 DEFAULT_T: 0 EPOCH_FACTOR: 1.5 EVAL_FREQ: 3 LONG_CYCLE: False LONG_CYCLE_FACTORS: [(0.25, 0.7071067811865476), (0.5, 0.7071067811865476), (0.5, 1), (1, 1)] LONG_CYCLE_SAMPLING_RATE: 0 SHORT_CYCLE: False SHORT_CYCLE_FACTORS: [0.5, 0.7071067811865476]
NONLOCAL: GROUP: [[1], [1], [1], [1]] INSTANTIATION: dot_product LOCATION: [[[]], [[]], [[]], [[]]] POOL: [[[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]]]
NUM_GPUS: 1
NUM_SHARDS: 1
OUTPUT_DIR: ./timesformer_kinetics400_tiny_workdir
RESNET: DEPTH: 50 INPLACE_RELU: True NUM_BLOCK_TEMP_KERNEL: [[3], [4], [6], [3]] NUM_GROUPS: 1 SPATIAL_DILATIONS: [[1], [1], [1], [1]] SPATIAL_STRIDES: [[1], [2], [2], [2]] STRIDE_1X1: False TRANS_FUNC: bottleneck_transform WIDTH_PER_GROUP: 64 ZERO_INIT_FINAL_BN: False
RNG_SEED: 0
SHARD_ID: 0
SLOWFAST: ALPHA: 8 BETA_INV: 8 FUSION_CONV_CHANNEL_RATIO: 2 FUSION_KERNEL_SZ: 5
SOLVER: BASE_LR: 0.005 BASE_LR_SCALE_NUM_SHARDS: False COSINE_END_LR: 0.0 DAMPENING: 0.0 GAMMA: 0.1 LRS: [1, 0.1, 0.01] LR_POLICY: steps_with_relative_lrs MAX_EPOCH: 15 MOMENTUM: 0.9 NESTEROV: True OPTIMIZING_METHOD: sgd STEPS: [0, 11, 14] STEP_SIZE: 1 WARMUP_EPOCHS: 0.0 WARMUP_FACTOR: 0.1 WARMUP_START_LR: 0.01 WEIGHT_DECAY: 0.0001
TENSORBOARD: CATEGORIES_PATH: CLASS_NAMES_PATH: CONFUSION_MATRIX: ENABLE: False FIGSIZE: [8, 8] SUBSET_PATH: ENABLE: False HISTOGRAM: ENABLE: False FIGSIZE: [8, 8] SUBSET_PATH: TOPK: 10 LOG_DIR: MODEL_VIS: ACTIVATIONS: False COLORMAP: Pastel2 ENABLE: False GRAD_CAM: COLORMAP: viridis ENABLE: True LAYER_LIST: [] USE_TRUE_LABEL: False INPUT_VIDEO: False LAYER_LIST: [] MODEL_WEIGHTS: False TOPK_PREDS: 1 PREDICTIONS_PATH: WRONG_PRED_VIS: ENABLE: False SUBSET_PATH: TAG: Incorrectly classified videos.
TEST: BATCH_SIZE: 8 CHECKPOINT_FILE_PATH: /home/users/agoktogan/Desktop/transformer-fodc/TimeSformer/weights/TimeSformer_divST_8x32_224_K400.pyth CHECKPOINT_TYPE: pytorch DATASET: kinetics ENABLE: True NUM_ENSEMBLE_VIEWS: 1 NUM_SPATIAL_CROPS: 3 SAVE_RESULTS_PATH:
TIMESFORMER: ATTENTION_TYPE: divided_space_time PRETRAINED_MODEL:
TRAIN: AUTO_RESUME: True BATCH_SIZE: 8 CHECKPOINT_CLEAR_NAME_PATTERN: () CHECKPOINT_EPOCH_RESET: False CHECKPOINT_FILE_PATH: CHECKPOINT_INFLATE: False CHECKPOINT_PERIOD: 5 CHECKPOINT_TYPE: pytorch DATASET: kinetics ENABLE: False EVAL_PERIOD: 5 FINETUNE: False
X3D: BN_LIN5: False BOTTLENECK_FACTOR: 1.0 CHANNELWISE_3x3x3: True DEPTH_FACTOR: 1.0 DIM_C1: 12 DIM_C5: 2048 SCALE_RES2: False WIDTH_FACTOR: 1.0
[02/28 13:01:22][INFO] checkpoint.py: 221: Loading network weights from /home/users/agoktogan/Desktop/transformer-fodc/TimeSformer/weights/TimeSformer_divST_8x32_224_K400.pyth.
[02/28 13:01:33][INFO] kinetics.py: 71: Constructing Kinetics test...
```
```
[02/28 13:01:33][INFO] kinetics.py: 111: Constructing kinetics dataloader (size: 12) from /home/users/agoktogan/Desktop/datasets/kinetics400-tiny/test.csv
[02/28 13:01:33][INFO] test_net.py: 170: Testing model for 2 iterations
Traceback (most recent call last):
  File "tools/run_net.py", line 44, in <module>
    main()
  File "tools/run_net.py", line 33, in main
    launch_job(cfg=cfg, init_method=args.init_method, func=test)
  File "/home/users/agoktogan/Desktop/transformer-fodc/TimeSformer/timesformer/utils/misc.py", line 300, in launch_job
    func(cfg=cfg)
  File "/home/users/agoktogan/Desktop/transformer-fodc/TimeSformer/tools/test_net.py", line 197, in test
    test_meter = perform_test(test_loader, model, test_meter, cfg, writer)
  File "/home/users/agoktogan/anaconda3/envs/timesformer/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/users/agoktogan/Desktop/transformer-fodc/TimeSformer/tools/test_net.py", line 96, in perform_test
    preds = model(inputs)
  File "/home/users/agoktogan/anaconda3/envs/timesformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/users/agoktogan/Desktop/transformer-fodc/TimeSformer/timesformer/models/vit.py", line 334, in forward
    x = self.model(x)
  File "/home/users/agoktogan/anaconda3/envs/timesformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/users/agoktogan/Desktop/transformer-fodc/TimeSformer/timesformer/models/vit.py", line 303, in forward
    x = self.forward_features(x)
  File "/home/users/agoktogan/Desktop/transformer-fodc/TimeSformer/timesformer/models/vit.py", line 292, in forward_features
    x = blk(x, B, T, W)
  File "/home/users/agoktogan/anaconda3/envs/timesformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/users/agoktogan/Desktop/transformer-fodc/TimeSformer/timesformer/models/vit.py", line 139, in forward
    res_spatial = self.drop_path(self.attn(self.norm1(xs)))
  File "/home/users/agoktogan/anaconda3/envs/timesformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/users/agoktogan/Desktop/transformer-fodc/TimeSformer/timesformer/models/vit.py", line 79, in forward
    attn = (q @ k.transpose(-2, -1)) * self.scale
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
```
The content of `test.csv` is as follows:
```
/home/users/agoktogan/Desktop/datasets/kinetics400-tiny/videos/---q6ElFyVq0.mp4 0
/home/users/agoktogan/Desktop/datasets/kinetics400-tiny/videos/---q6ElFyVq0.mp4 1
/home/users/agoktogan/Desktop/datasets/kinetics400-tiny/videos/---q6ElFyVq0.mp4 2
/home/users/agoktogan/Desktop/datasets/kinetics400-tiny/videos/---q6ElFyVq0.mp4 3
```
My GPU is a GeForce GTX 1080 Ti and the NVIDIA driver version is 470.63.01.
How can I solve this CUBLAS_STATUS_EXECUTION_FAILED problem?
Thanks in advance!
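For what it's worth, this error pattern on Pascal cards often points to a PyTorch binary that was not built with kernels for the GPU's compute capability (a GTX 1080 Ti is capability 6.1, i.e. `sm_61`). A quick sketch to check this in the failing environment, using only public `torch` APIs:

```python
import torch

# Show which CUDA toolkit the installed wheel was built against and which
# GPU architectures it ships kernels for. For a GTX 1080 Ti, 'sm_61' (or a
# lower arch with bundled PTX) should appear in the compiled arch list.
print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
print("compiled arches:", torch.cuda.get_arch_list())  # [] on CPU-only builds
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0),
          "| capability:", torch.cuda.get_device_capability(0))
```

If `sm_61` is missing from the arch list, the wheel simply cannot run correctly on this GPU and a build for an older CUDA toolkit is needed.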
Installing an older PyTorch built against CUDA 10.2 solved the problem:

```
conda install pytorch=1.8.1 torchvision cudatoolkit=10.2 -c pytorch
```