Open saicharithpasula opened 2 years ago
This could be a PyTorch version-related issue. Which version of PyTorch are you using? Another possibility is that your labels are not in the correct shape.
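A quick way to check both points is sketched below. The tensors are random stand-ins rather than real AVA data, and the names (`logits`, `index_labels`, `column_labels`) are made up for illustration. It assumes class-index labels, which is what `F.cross_entropy` expects by default.

```python
import torch
import torch.nn.functional as F

print("PyTorch version:", torch.__version__)

num_classes = 7                       # matches NUM_CLASSES in the posted config
logits = torch.randn(4, num_classes)  # stand-in for model output, shape (N, C)

# Class-index labels: shape (N,), integer dtype, values in [0, C).
index_labels = torch.tensor([0, 3, 6, 1])
loss_idx = F.cross_entropy(logits, index_labels)

# Labels that arrive with a trailing singleton dimension, shape (N, 1),
# can be flattened to (N,) before the loss call.
column_labels = index_labels.unsqueeze(1)
flat_labels = column_labels.squeeze(1)
loss_flat = F.cross_entropy(logits, flat_labels)

print(tuple(logits.shape), tuple(index_labels.shape), tuple(flat_labels.shape))
```

Printing the shapes of `preds` and `labels` right before the loss call in the training loop would give the same information for the real data.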
You need to connect the pre-trained network to a head. Check out head_helper.py and this thread https://github.com/facebookresearch/SlowFast/issues/371 where @yurinishikawa proposes an insightful head. Also, check the paper to see what output to expect from this pre-trained network.
Hello,
I am getting this error when I try to train an X3D model on the AVA dataset.
  File "tools/run_net.py", line 45, in <module>
    main()
  File "tools/run_net.py", line 26, in main
    launch_job(cfg=cfg, init_method=args.init_method, func=train)
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/daemon/code/Users/saicharithreddy.pasula/SCOUT-%20Behavior%20Anomaly%20Detection/trainer/SlowFast/slowfast/utils/misc.py", line 296, in launch_job
    torch.multiprocessing.spawn(
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/daemon/code/Users/saicharithreddy.pasula/SCOUT-%20Behavior%20Anomaly%20Detection/trainer/SlowFast/slowfast/utils/multiprocessing.py", line 60, in run
    ret = func(cfg)
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/daemon/code/Users/saicharithreddy.pasula/SCOUT-%20Behavior%20Anomaly%20Detection/trainer/SlowFast/tools/train_net.py", line 711, in train
    train_epoch(
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/daemon/code/Users/saicharithreddy.pasula/SCOUT-%20Behavior%20Anomaly%20Detection/trainer/SlowFast/tools/train_net.py", line 156, in train_epoch
    loss = loss_fun(preds, labels)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1164, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/functional.py", line 3014, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
TypeError: cross_entropy_loss(): argument 'input' (position 1) must be Tensor, not list
It looks like the model is returning its predictions as a Python list instead of a PyTorch tensor. Has anyone else encountered this error?
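To narrow this down, it may help to inspect what the model returns just before the loss call. The snippet below is only a sketch: `to_logits` is a hypothetical helper (it is not part of SlowFast), and the list-wrapped tensor is a stand-in for the real model output. It assumes the list holds exactly one tensor; if it holds several, the helper raises and reports their shapes instead of guessing which one to use.

```python
import torch
import torch.nn.functional as F


def to_logits(preds):
    """Return a single tensor from a model output that may be wrapped in a list.

    Hypothetical helper for debugging; not part of SlowFast.
    """
    if isinstance(preds, torch.Tensor):
        return preds
    if isinstance(preds, (list, tuple)):
        if len(preds) == 1 and isinstance(preds[0], torch.Tensor):
            return preds[0]
        # More than one output: report what is there rather than pick one.
        shapes = [
            tuple(p.shape) if isinstance(p, torch.Tensor) else type(p).__name__
            for p in preds
        ]
        raise TypeError(f"expected one tensor, got {len(preds)} outputs: {shapes}")
    raise TypeError(f"unsupported model output type: {type(preds).__name__}")


# Stand-in for a model that returns its output inside a list.
fake_preds = [torch.randn(4, 7)]     # batch of 4, 7 classes (NUM_CLASSES: 7)
labels = torch.tensor([0, 3, 6, 1])  # class-index labels, shape (4,)

logits = to_logits(fake_preds)
loss = F.cross_entropy(logits, labels)
print(type(fake_preds).__name__, "->", tuple(logits.shape))
```

If the helper raises because the list contains several tensors, that points to a mismatch between the model's head and the loss rather than something that unwrapping alone would solve.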
P.S. The config I am using is:
TRAIN:
  ENABLE: True
  DATASET: ava
  BATCH_SIZE: 16
  EVAL_PERIOD: 1
  CHECKPOINT_PERIOD: 1
  AUTO_RESUME: True
  CHECKPOINT_FILE_PATH: 'checkpoints/x3d_l.pyth'
  CHECKPOINT_TYPE: pytorch
  CHECKPOINT_EPOCH_RESET: True
  CHECKPOINT_INFLATE: False
  MIXED_PRECISION: False
DATA:
  NUM_FRAMES: 15
  SAMPLING_RATE: 6
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 224
  INPUT_CHANNEL_NUM: [3]
  DECODING_BACKEND: torchvision
DETECTION:
  ENABLE: True
  ALIGNED: True
AVA:
  FRAME_DIR: 'path/frames'
  FRAME_LIST_DIR: 'path/frames_list'
  ANNOTATION_DIR: 'path/annotations'
  DETECTION_SCORE_THRESH: 0.8
  TRAIN_PREDICT_BOX_LISTS: [
    "ava_train_v2.2.csv",
    "person_box_67091280_iou90/ava_detection_train_boxes_and_labels_include_negative_v2.2.csv",
  ]
  TEST_PREDICT_BOX_LISTS: ["person_box_67091280_iou90/ava_detection_val_boxes_and_labels.csv"]
X3D:
  WIDTH_FACTOR: 2.0
  DEPTH_FACTOR: 2.2
  BOTTLENECK_FACTOR: 2.25
  DIM_C5: 2048
  DIM_C1: 12
RESNET:
  ZERO_INIT_FINAL_BN: True
  TRANS_FUNC: x3d_transform
  STRIDE_1X1: False
BN:
  USE_PRECISE_STATS: False
  NUM_BATCHES_PRECISE: 200
SOLVER:
  BASE_LR: 0.1
  BASE_LR_SCALE_NUM_SHARDS: True
  LR_POLICY: steps_with_relative_lrs
  STEPS: [0, 10, 15, 20]
  LRS: [1, 0.1, 0.01, 0.001]
  MAX_EPOCH: 1
  WEIGHT_DECAY: 1e-7
  WARMUP_EPOCHS: 5.0
  WARMUP_START_LR: 0.000125
  OPTIMIZING_METHOD: sgd
MODEL:
  NUM_CLASSES: 7
  ARCH: x3d
  MODEL_NAME: X3D
  LOSS_FUNC: cross_entropy
  DROPOUT_RATE: 0.5
  HEAD_ACT: sigmoid
TEST:
  ENABLE: False
  DATASET: ava
  BATCH_SIZE: 1
DATA_LOADER:
  NUM_WORKERS: 5
  PIN_MEMORY: True
NUM_GPUS: 2
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: ./x3d