facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

RuntimeError: Input type (c10::Half) and bias type (float) should be the same #319

Open zshn25 opened 11 months ago

zshn25 commented 11 months ago

I get the following error when trying to run training:

  File "/dinov2/dinov2/layers/patch_embed.py", line 75, in forward
    x = self.proj(x)  # B C H W
  File "/dinov2/dinov2/models/vision_transformer.py", line 211, in prepare_tokens_with_masks
    x = self.patch_embed(x)
  File "/dinov2/dinov2/models/vision_transformer.py", line 254, in forward_features
    x = self.prepare_tokens_with_masks(x, masks)
  File "/dinov2/dinov2/models/vision_transformer.py", line 321, in forward
    ret = self.forward_features(*args, **kwargs)
  File "/dinov2/dinov2/train/ssl_meta_arch.py", line 160, in get_teacher_output
    teacher_backbone_output_dict = self.teacher.backbone(x, is_training=True)
  File "/dinov2/dinov2/train/ssl_meta_arch.py", line 229, in forward_backward
    teacher_dino_softmaxed_centered_list, masked_teacher_ibot_softmaxed_centered = get_teacher_output()
  File "/dinov2/dinov2/train/train.py", line 246, in do_train
    loss_dict = model.forward_backward(data, teacher_temp=teacher_temp)
  File "/dinov2/dinov2/train/train.py", line 314, in main
    do_train(cfg, model, resume=not args.no_resume)
  File "/dinov2/dinov2/run/train/train.py", line 29, in __call__
    train_main(self.args)
  File "/dinov2/dinov2/run/train/train.py", line 60, in main
    t()
  File "/dinov2/dinov2/run/train/train.py", line 65, in <module>
    sys.exit(main())
RuntimeError: Input type (c10::Half) and bias type (float) should be the same

My setup follows the README exactly.

zshn25 commented 11 months ago

self.proj had a different dtype from the input. I added self.proj.to(x) at this line, https://github.com/facebookresearch/dinov2/blob/da4b3825f0ed64b7398ace00c5062503811d0cff/dinov2/layers/patch_embed.py#L75, which resolved the error above, but I now get another RuntimeError:

Expected output.scalar_type() == at::ScalarType::Half to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
  File "/home/zsuri/prototyping_dinov2/dinov2/layers/block.py", line 181, in get_attn_bias_and_cat
    cat_tensors = index_select_cat([x.flatten(1) for x in x_list], branges).view(1, -1, x_list[0].shape[-1])
  File "/home/zsuri/prototyping_dinov2/dinov2/layers/block.py", line 201, in drop_add_residual_stochastic_depth_list
    attn_bias, x_cat = get_attn_bias_and_cat(x_list, branges)
  File "/home/zsuri/prototyping_dinov2/dinov2/layers/block.py", line 227, in forward_nested
    x_list = drop_add_residual_stochastic_depth_list(
  File "/home/zsuri/prototyping_dinov2/dinov2/layers/block.py", line 259, in forward
    return self.forward_nested(x_or_x_list)
  File "/home/zsuri/prototyping_dinov2/dinov2/models/vision_transformer.py", line 40, in forward
    x = b(x)
  File "/home/zsuri/prototyping_dinov2/dinov2/models/vision_transformer.py", line 241, in forward_features_list
    x = blk(x)
  File "/home/zsuri/prototyping_dinov2/dinov2/models/vision_transformer.py", line 260, in forward_features
    return self.forward_features_list(x, masks)
  File "/home/zsuri/prototyping_dinov2/dinov2/models/vision_transformer.py", line 329, in forward
    ret = self.forward_features(*args, **kwargs)
  File "/home/zsuri/prototyping_dinov2/dinov2/train/ssl_meta_arch.py", line 235, in forward_backward
    student_global_backbone_output_dict, student_local_backbone_output_dict = self.student.backbone(
  File "/home/zsuri/prototyping_dinov2/dinov2/train/train.py", line 246, in do_train
    loss_dict = model.forward_backward(data, teacher_temp=teacher_temp)
  File "/home/zsuri/prototyping_dinov2/dinov2/train/train.py", line 314, in main
    do_train(cfg, model, resume=not args.no_resume)
  File "/home/zsuri/prototyping_dinov2/dinov2/run/train/train.py", line 29, in __call__
    train_main(self.args)
  File "/home/zsuri/prototyping_dinov2/dinov2/run/train/train.py", line 60, in main
    t()
  File "/home/zsuri/prototyping_dinov2/dinov2/run/train/train.py", line 65, in <module>
    sys.exit(main())
RuntimeError: Expected output.scalar_type() == at::ScalarType::Half to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
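
For readers hitting the first error, here is a minimal sketch of the mismatch and of the self.proj.to(x) workaround described in this comment. The Conv2d layer is a hypothetical stand-in for the patch projection, not the actual DINOv2 PatchEmbed class, and the shapes are made up for illustration. It is written to run on CPU, so it only exercises the dtype check and the cast, not a half-precision forward pass.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the patch projection (not the DINOv2 class).
# Parameters are created in float32 by default.
proj = nn.Conv2d(3, 8, kernel_size=14, stride=14)

# Half-precision input, as in the traceback above.
x = torch.randn(1, 3, 28, 28).to(torch.float16)


def try_forward(module, inp):
    """Return the error text if the forward pass raises, else None."""
    try:
        module(inp)
    except RuntimeError as err:
        return str(err)
    return None


# Input is float16 while weight and bias are float32, so the call is rejected.
error_before = try_forward(proj, x)
print(error_before)

# Workaround from the comment: Module.to(tensor) casts all parameters and
# buffers of the module, in place, to the dtype and device of that tensor.
proj.to(x)
print(proj.weight.dtype, proj.bias.dtype)
```

Note that Module.to changes the parameters in place, so the layer no longer holds float32 weights afterwards. Whether that is acceptable during training depends on the mixed-precision setup, which this thread does not settle.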

qasfb commented 11 months ago

Did you cast the model to .half()?

zshn25 commented 11 months ago

@qasfb, I had to manually cast particular layers to the same dtype as the input in multiple places.
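
Needing casts in multiple places suggests that some parameters are still in a different dtype from the activations. As a rough diagnostic sketch, the helper below lists which parameters and buffers are not in the expected dtype. The function name params_not_in_dtype and the toy model are assumptions for illustration; neither is part of DINOv2.

```python
import torch
import torch.nn as nn


def params_not_in_dtype(model, dtype):
    """List names of floating-point parameters/buffers whose dtype is not `dtype`."""
    mismatched = []
    for name, param in model.named_parameters():
        if param.is_floating_point() and param.dtype != dtype:
            mismatched.append(name)
    for name, buf in model.named_buffers():
        if buf.is_floating_point() and buf.dtype != dtype:
            mismatched.append(name)
    return mismatched


# Toy model (not the DINOv2 architecture): only the last layer is cast to half,
# simulating a model where some layers were cast manually and others were not.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=14, stride=14),
    nn.Flatten(),
    nn.Linear(32, 4),
)
model[2].half()

partial = params_not_in_dtype(model, torch.float16)
print(partial)  # the Conv2d parameters are still float32

# Casting the whole model, as asked about above, leaves nothing mismatched.
model.half()
after_half = params_not_in_dtype(model, torch.float16)
print(after_half)
```

Running such a check right before the failing forward pass would show whether the remaining float32 tensors are confined to a few layers or spread across the whole model.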