ActiveVisionLab / DFNet

DFNet: Enhance Absolute Pose Regression with Direct Feature Matching (ECCV 2022)
https://dfnet.active.vision
MIT License

Question about testing a single image #18

wappints closed this issue 8 months ago.

wappints commented 8 months ago

I have a custom dataset, and I want to test on a single monocular image.

Is this possible?

chenusc11 commented 8 months ago

Hi, yes. Absolutely. As long as your train set is not a single image :)

wappints commented 8 months ago

Ok got it!

While I was trying to set up some inference code, I got this error:

```python
import torch
from feature.dfnet import DFNet_s as PoseNet3

model = PoseNet3()
device = torch.device('cpu')
model.load_state_dict(torch.load(
    '/Users/.../DFNetInference/pretrain_models/kings/dfnetdm/checkpoint-0267-17.1446.pt',
    map_location=device))
```


```
RuntimeError                              Traceback (most recent call last)
---> model.load_state_dict(torch.load('/Users/.../DFNetInference/pretrain_models/kings/dfnetdm/checkpoint-0267-17.1446.pt', map_location=device))

RuntimeError: Error(s) in loading state_dict for DFNet_s:
    Unexpected key(s) in state_dict: "adaptation_layers.adapt_layer_1.0.weight", "adaptation_layers.adapt_layer_1.0.bias", "adaptation_layers.adapt_layer_1.2.weight", "adaptation_layers.adapt_layer_1.2.bias", "adaptation_layers.adapt_layer_1.3.weight", "adaptation_layers.adapt_layer_1.3.bias" ...
```

and so on. Apologies, I may have misunderstood. Is this the right model, or do I really have to train first regardless of the pretrained .pt model? Initially, I thought I could just directly plug in one of the models from the README link. May I have access to the already-trained models, if they are available? Big thanks!

chenusc11 commented 8 months ago

Hi, sorry for the confusion. I think our released model is using

```python
from feature.dfnet import DFNet as PoseNet
```

`DFNet_s` is from my legacy experiments. If you want to use it, you need to retrain a new model :)
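
For anyone hitting the same state_dict error, here is a minimal loading sketch under this fix, assuming the default `DFNet` constructor matches the released checkpoint (the path is the one from the snippet above):

```python
import torch
from feature.dfnet import DFNet as PoseNet  # released checkpoints expect DFNet, not DFNet_s

device = torch.device('cpu')
model = PoseNet()
model.load_state_dict(torch.load(
    '/Users/.../DFNetInference/pretrain_models/kings/dfnetdm/checkpoint-0267-17.1446.pt',
    map_location=device))
model.eval()  # inference mode: fixes dropout/batch-norm behavior
```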

wappints commented 8 months ago

Thank you, that fixed it :) I encountered another error:


```
File c:\Users\anaconda3\envs\DFNet\lib\site-packages\torch\nn\modules\module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
...
    780 if stride is None:
    781     stride = torch.jit.annotate(List[int], [])
--> 782 return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)

RuntimeError: CUDA out of memory. Tried to allocate 2.86 GiB (GPU 0; 8.00 GiB total capacity; 19.19 GiB already allocated; 0 bytes free; 19.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
```


Is it because I am using an RTX 2080 instead of an RTX 3080?
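
One contributing factor worth ruling out, though it is not confirmed in the thread: running the forward pass with autograd enabled keeps every intermediate activation in GPU memory, which can exhaust 8 GiB by itself. A minimal gradient-free inference sketch, with a tiny hypothetical model and input standing in for DFNet and a real image:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a tiny conv net and a random batch, just to show the
# pattern; substitute the loaded DFNet model and a real preprocessed image.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.MaxPool2d(2))
img = torch.rand(1, 3, 240, 320)

model.eval()           # inference mode for dropout/batch norm
with torch.no_grad():  # skip building the autograd graph, freeing activation memory
    out = model(img)
print(out.shape)       # torch.Size([1, 8, 120, 160])
```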

chenusc11 commented 8 months ago

It looks like a GPU OOM issue. Are you trying to run the direct feature matching?

I think our model can run on a 1080 Ti GPU. As described in our paper, we render a smaller image (e.g. 60x80) and then upsample it to the full size (e.g. 240x320), so it can fit on a smaller GPU. A minimal sketch of that idea is below the linked code.

https://github.com/ActiveVisionLab/DFNet/blob/2c8fa7e324f8d17352ed469a8b793e0167e4c592/script/feature/direct_feature_matching.py#L341-L349
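
For illustration, here is a self-contained sketch of the render-small-then-upsample idea; the random tensor is a hypothetical stand-in for the low-resolution rendered view, and the sizes are the examples from the comment above:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for a low-resolution rendered view (e.g. 60x80),
# which is cheap enough to fit on a smaller GPU.
small = torch.rand(1, 3, 60, 80)

# Bilinearly upsample to the full resolution (e.g. 240x320) before
# comparing features against the query image.
full = F.interpolate(small, size=(240, 320), mode='bilinear', align_corners=False)
print(full.shape)  # torch.Size([1, 3, 240, 320])
```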

wappints commented 8 months ago

Okay, I was able to run inference; I had to downsample my input image size.

```
tensor([[[ 51.3828,   8.2054,  61.8599, 432.9546],
         [ -5.0028,  65.6912, -28.1830,  79.2922],
         [-40.4008,  17.4475,  81.9363, 203.4155]]], device='cuda:0', grad_fn=<...>)
```

Just to clarify, this is a 3x4 projection matrix (rotation and translation), correct?

chenusc11 commented 8 months ago

Sure, it looks like it. Why is the translation part so large?
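
One way to probe that question, assuming the layout really is [R | t]: the rotation block of a valid pose is orthonormal, so R Rᵀ should be near the identity and det(R) near +1. A sketch using the values printed above:

```python
import torch

# The tensor printed above (values copied from the thread), as a plain tensor.
pose = torch.tensor([[ 51.3828,   8.2054,  61.8599, 432.9546],
                     [ -5.0028,  65.6912, -28.1830,  79.2922],
                     [-40.4008,  17.4475,  81.9363, 203.4155]])

R, t = pose[:, :3], pose[:, 3]  # assumed [R | t] split
print(R @ R.T)                  # near-identity for a true rotation
print(torch.linalg.det(R))      # near +1 for a true rotation
print(t)                        # translation, in the scene's world units
```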