IndexError while trying to train

ajayrfhp commented 5 years ago

python train.py --name clean_10000 --dataroot ./datasets/clean_10000/ --no_instance --batchSize 5 --label_nc 0 --resize_or_crop none

Error

File "train.py", line 87, in <dictcomp>
errors = {k: v.data[0] if not isinstance(v, int) else v for k, v in loss_dict.items()}
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

ecidon commented 5 years ago

We are getting the same issue when trying to retrain from a checkpoint:

command:

export CUDA_VISIBLE_DEVICES=1,2,3 && python train.py --name label2city_1024p --netG local --ngf 32 --load_pretrain checkpoints/label2city_1024p/ --resize_or_crop scale_width --loadSize 1024 --fineSize 1024 --label_nc 0 --dataroot ./datasets/CamVid/ --gpu_ids 0,1,2 --batchSize 3 --no_instance

error

Traceback (most recent call last):
File "train.py", line 87, in <module>
    errors = {k: v.data[0] if not isinstance(v, int) else v for k, v in loss_dict.items()}
File "train.py", line 87, in <dictcomp>
    errors = {k: v.data[0] if not isinstance(v, int) else v for k, v in loss_dict.items()}
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

ecidon commented 5 years ago

We changed v.data[0] on line 87 of train.py to v.item(). That seems to have solved the problem; we will post an update when training is over.

ajayrfhp commented 5 years ago

That's what I did too, and it worked. I suspect it's a PyTorch version difference, but I'm not sure.
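
For anyone landing here, the one-line change described above might look roughly like the sketch below. It assumes the values in loss_dict are either plain ints or objects with an .item() method (as 0-dim PyTorch tensors have, per the error message). The helper name scalar_errors is hypothetical and is not part of train.py; it only wraps the patched dict comprehension so it can be read on its own.

```python
def scalar_errors(loss_dict):
    """Convert each loss value to a plain Python number.

    Hypothetical helper wrapping the patched comprehension from line 87
    of train.py. Assumes every non-int value exposes .item(), which a
    0-dim tensor does.
    """
    # Original line used v.data[0], which fails on a 0-dim tensor.
    return {k: v.item() if not isinstance(v, int) else v
            for k, v in loss_dict.items()}
```

In train.py itself the equivalent edit is simply replacing v.data[0] with v.item() inside the existing comprehension; the int check is kept as in the original line.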

NVIDIA / pix2pixHD

IndexError while trying to train #97