eladhoffer / convNet.pytorch

ConvNet training using pytorch
MIT License

RuntimeError: view size is not compatible with input tensor's size and stride #22

Closed gakadam closed 3 years ago

gakadam commented 3 years ago

I have the following config file:

{
    "adapt_grad_norm": null,
    "autoaugment": false,
    "batch_size": 256,
    "chunk_batch": 1,
    "config_file": null,
    "cutmix": null,
    "cutout": false,
    "dataset": "imagenet",
    "datasets_dir": "~/data/",
    "device": "cuda",
    "device_ids": [
        0
    ],
    "dist_backend": "nccl",
    "dist_init": "env://",
    "distributed": false,
    "drop_optim_state": false,
    "dtype": "float",
    "duplicates": 1,
    "epochs": 90,
    "eval_batch_size": -1,
    "evaluate": null,
    "grad_clip": -1,
    "input_size": null,
    "label_smoothing": 0,
    "local_rank": -1,
    "loss_scale": 1,
    "lr": 0.1,
    "mixup": null,
    "model": "alexnet",
    "model_config": "",
    "momentum": 0.9,
    "optimizer": "SGD",
    "print_freq": 10,
    "results_dir": "./results",
    "resume": "",
    "save": "alexnet_unquant",
    "save_all": false,
    "seed": 123,
    "start_epoch": -1,
    "sync_bn": false,
    "tensorwatch": false,
    "tensorwatch_port": 0,
    "weight_decay": 0,
    "workers": 8,
    "world_size": -1
}

I get the following error:

Starting Epoch: 1

Traceback (most recent call last):
  File "main.py", line 364, in <module>
    main()
  File "main.py", line 130, in main
    main_worker(args)
  File "main.py", line 306, in main_worker
    train_results = trainer.train(train_data.get_loader(),
  File "/MyPath/convNet.pytorch/trainer.py", line 269, in train
    return self.forward(data_loader, training=True, average_output=average_output, chunk_batch=chunk_batch)
  File "/MyPath/convNet.pytorch/trainer.py", line 224, in forward
    prec1, prec5 = accuracy(output, target, topk=(1, 5))
  File "/MyPath/convNet.pytorch/utils/meters.py", line 70, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

I get the same error when I run the following command from the README:

python main.py --model resnet --model-config "{'depth': 18, 'quantize':True}" --save resnet18_8bit -b 64

How can I fix this?

Thanks.

gakadam commented 3 years ago

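Solved.

I replaced view with reshape on the failing line in utils/meters.py (line 70 in the traceback above), i.e. correct[:k].reshape(-1) instead of correct[:k].view(-1).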
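
For anyone hitting the same error, here is a minimal sketch of the fix. Only the line correct[:k].view(-1).float().sum(0) is confirmed by the traceback; the rest of the accuracy function below is an assumption, modeled on the top-k helper commonly used in PyTorch training scripts, and may differ from the actual code in utils/meters.py. The likely cause is that the transposed prediction tensor is not contiguous, so a slice with k > 1 cannot be flattened by view, whereas reshape falls back to a copy when it has to.

```python
import torch


def accuracy(output, target, topk=(1,)):
    """Top-k precision in percent for each k in topk (assumed reconstruction)."""
    maxk = max(topk)
    batch_size = target.size(0)

    # Indices of the maxk highest scores per sample, shape (batch, maxk).
    _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
    # Transposing gives shape (maxk, batch); the result is not contiguous.
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        # reshape copies when needed, so it also works on non-contiguous
        # slices; view(-1) here is what raised the RuntimeError.
        correct_k = correct[:k].reshape(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res


if __name__ == "__main__":
    # Hypothetical toy batch: 4 samples, 3 classes.
    output = torch.tensor([[0.1, 0.9, 0.0],
                           [0.8, 0.1, 0.1],
                           [0.2, 0.3, 0.5],
                           [0.6, 0.3, 0.1]])
    target = torch.tensor([1, 0, 1, 2])
    prec1, prec2 = accuracy(output, target, topk=(1, 2))
    print(prec1.item(), prec2.item())
```

Calling .contiguous().view(-1) on the slice should work as well; reshape is simply the shorter spelling and is what the error message itself suggests.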