gidariss / FewShotWithoutForgetting

MIT License

Error when training #4

Open xwjabc opened 6 years ago

xwjabc commented 6 years ago

When I run the command below:

CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv128CosineClassifier

it fails with:

Exception KeyError: KeyError(<weakref at 0x7f619db132b8; to 'tqdm' at 0x7f619db23090>,) in <bound method tqdm.__del__ of
  0%|                                                                 | 0/2000 [00:00<?, ?it/s]> ignored
Traceback (most recent call last):
  File "train.py", line 110, in <module>
    algorithm.solve(dloader_train, dloader_test)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/Algorithm.py", line 286, in solve
    eval_stats = self.evaluate(data_loader_test)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/Algorithm.py", line 330, in evaluate
    eval_stats_this = self.evaluation_step(batch)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/FewShot.py", line 84, in evaluation_step
    return self.process_batch(batch, do_train=False)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/FewShot.py", line 87, in process_batch
    process_type = self.set_tensors(batch)
  File "/teamscratch/msravcshare/v-weijxu/code/few-shot/DynamicFewShot/algorithms/FewShot.py", line 60, in set_tensors
    nKnovel = 1 + labels_train.max() - self.nKbase
RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.LongTensor for argument #3 'other'

Environment: Python 2.7, PyTorch 0.4, CUDA 9.1

caiqi commented 6 years ago

@xwjabc I ran into the same problem. I'm not familiar with PyTorch, but changing this line https://github.com/gidariss/FewShotWithoutForgetting/blob/master/algorithms/FewShot.py#L55 to self.nKbase = nKbase.squeeze()[0].cuda() fixes the problem.

xwjabc commented 6 years ago

@caiqi Thx! Will take a look. I am also a newbie to PyTorch and am trying to trace the cause of that error.

xwjabc commented 6 years ago

Got the reason. In PyTorch 0.4, x.squeeze()[0] will not return a scalar, but a tensor. It will cause several compatibility problems (e.g. nKbase errors, DAverageMeter errors). Will post a patch list later.

jin-s13 commented 6 years ago

@xwjabc I ran into what may be the same DAverageMeter error (AccuracyNovel is missing). Could you please tell me how to fix it?

xwjabc commented 6 years ago

@jin-s13 Could you add some more details about the error?

bugrabaran commented 5 years ago

@jin-s13 If you are still interested, my suggestion is to add .item() to the result of the top1accuracy() function wherever you calculate the accuracies for Novel, Base, or Both. This turns the loss_record entries for those accuracies into scalars.
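
The .item() suggestion can be sketched independently of the repository code. The snippet below is only an illustration, not the repo's DAverageMeter: to_scalar and ScalarMeter are hypothetical names, and the one assumption is that the value passed in exposes an .item() method, as a one-element PyTorch tensor does in 0.4 and later.

```python
def to_scalar(value):
    """Return a plain Python number.

    If the value has an .item() method (assumed here to behave like
    torch.Tensor.item() on a one-element tensor), call it; otherwise
    return the value unchanged.
    """
    item = getattr(value, "item", None)
    return item() if callable(item) else value


class ScalarMeter(object):
    """Hypothetical running-average meter; NOT the repo's DAverageMeter."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        # Convert first, so the meter only ever stores plain numbers.
        self.total += to_scalar(value)
        self.count += 1

    def average(self):
        # Avoid dividing by zero before the first update.
        return self.total / self.count if self.count else 0.0
```

Converting at the point where a value is recorded keeps the statistics as plain floats, so whatever consumes them later does not have to distinguish between tensors and numbers.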

Franklin-Yao commented 5 years ago

Here is my solution:

#labels_train = self.tensors['labels_train']

nKnovel = 1 + labels_train.max() - self.nKbase

labels_train_1hot_size = list(labels_train.size()) + [nKnovel,]
labels_train_unsqueeze = labels_train.unsqueeze(dim=labels_train.dim())
self.tensors['labels_train_1hot'].resize_(labels_train_1hot_size).fill_(0).scatter_(
    len(labels_train_1hot_size) - 1, (labels_train_unsqueeze - self.nKbase).cuda(), 1)