facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

ONNX error with DINOv2 with registers #288

Open isaacperez opened 1 year ago

isaacperez commented 1 year ago

I get the following error when I try to export the model to ONNX:

torch.onnx.symbolic_registry.UnsupportedOperatorError: Exporting the operator ::_upsample_bicubic2d_aa to ONNX opset version 16 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.

This is the code:

import torch 

class TempModel(torch.nn.Module): 
    def __init__(self, model): 
        super().__init__() 
        self.model = model  

    def forward(self, tensor): 
        features_dict = self.model.forward_features(tensor)
        return features_dict['x_norm_patchtokens']

model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitl14_reg')
temp_model = TempModel(model).to('cpu') 
temp_model.eval() 

with torch.no_grad():
    input_data = torch.randn(1, 3, 420, 420).to('cpu') 
    output = temp_model(input_data) 

    torch.onnx.export(temp_model, input_data, 'model.onnx', input_names=['input'], opset_version=16)

isaacperez commented 1 year ago

It works with the original model: dinov2_vitl14

patricklabatut commented 1 year ago

@isaacperez This error should only happen with inputs that require interpolation (i.e. images whose size is not 518 x 518). It is due to the forced use of antialias=True in torch.nn.functional.interpolate(), which is not supported in ONNX export (see this PyTorch issue). There is a (very) hacky work-around in the same issue; alternatively, we could also allow disabling this forced antialiasing.
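
To make the "requires interpolation" condition concrete, here is a minimal sketch of the patch-grid arithmetic. It assumes a patch size of 14 (taken from the model name dinov2_vitl14) and the 518 x 518 resolution mentioned above; the helper names are hypothetical and the actual check in the repository may be written differently.

```python
from typing import Tuple

# Illustrative helpers, not part of the dinov2 code base.
PATCH_SIZE = 14       # assumed from the "14" in dinov2_vitl14
PRETRAIN_SIZE = 518   # resolution at which no interpolation is needed (see above)


def patch_grid(height: int, width: int, patch_size: int = PATCH_SIZE) -> Tuple[int, int]:
    """Return the number of patches along each axis for a given input size."""
    return height // patch_size, width // patch_size


def needs_interpolation(height: int, width: int) -> bool:
    """True if the input's patch grid differs from the 518 x 518 patch grid."""
    return patch_grid(height, width) != patch_grid(PRETRAIN_SIZE, PRETRAIN_SIZE)


print(patch_grid(518, 518))           # (37, 37)
print(patch_grid(420, 420))           # (30, 30)
print(needs_interpolation(420, 420))  # True
```

Under these assumptions, the 420 x 420 input from the report maps to a 30 x 30 grid, while the stored position embeddings cover 37 x 37, so they have to be resized on the fly.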

patricklabatut commented 1 year ago

@isaacperez Meanwhile, if not using the PyTorch Hub entry points, one can also directly call the internal _make_dinov2_model() function from dinov2.hub.backbones with interpolate_antialias=False...

isaacperez commented 1 year ago

If I modify the antialias value in ~/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/models/vision_transformer.py:

        patch_pos_embed = nn.functional.interpolate(
            patch_pos_embed.reshape(1, int(sqrt_N), int(sqrt_N), dim).permute(0, 3, 1, 2),
            scale_factor=(sx, sy),
            mode="bicubic",
            antialias=False,  # was: self.interpolate_antialias
        )

Then the results are not the same as when it is True. The difference is significant.
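
A rough way to see that the two settings are not interchangeable when downsampling is to resize the same tensor both ways and compare. This is only a sketch: the tensor is random noise standing in for the position embeddings, the channel count and the 37 to 30 grid sizes are illustrative assumptions, and it passes `size=` instead of the `scale_factor=` used in the snippet above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Random stand-in for the patch position embeddings: (batch, dim, 37, 37).
# Real embeddings are smoother than noise, so the gap there may be smaller.
dim = 8
grid = torch.randn(1, dim, 37, 37)


def resize(x: torch.Tensor, antialias: bool) -> torch.Tensor:
    """Bicubic resize to a 30 x 30 grid, with or without antialiasing."""
    return F.interpolate(x, size=(30, 30), mode="bicubic", antialias=antialias)


with_aa = resize(grid, antialias=True)
without_aa = resize(grid, antialias=False)

# Largest element-wise difference between the two resized grids.
max_abs_diff = (with_aa - without_aa).abs().max().item()
print(with_aa.shape, without_aa.shape, max_abs_diff)
```

Because antialiasing widens the filter when the grid shrinks, the two outputs differ, and that difference then propagates through every transformer block.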

qasfb commented 1 year ago

If you want to export to ONNX for inference at a fixed resolution, I think you can interpolate the position embeddings in advance, with antialiasing, and skip the interpolation at run time.
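
A minimal sketch of what interpolating in advance could look like. It assumes the position embeddings have shape (1, 1 + N, dim) with a single class-token embedding first and a square patch grid; `resize_pos_embed` and `num_prefix_tokens` are hypothetical names, and a dummy tensor stands in for the model's real embeddings.

```python
import math
from typing import Tuple

import torch
import torch.nn.functional as F


def resize_pos_embed(pos_embed: torch.Tensor, new_grid: Tuple[int, int],
                     num_prefix_tokens: int = 1) -> torch.Tensor:
    """Resize patch position embeddings of shape (1, prefix + N, dim) offline."""
    prefix = pos_embed[:, :num_prefix_tokens]    # e.g. class-token embedding, kept as-is
    patches = pos_embed[:, num_prefix_tokens:]   # (1, N, dim)
    n, dim = patches.shape[1], patches.shape[2]
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("expected a square patch grid")
    # (1, N, dim) -> (1, dim, side, side) so that interpolate sees a 2D grid.
    patches = patches.reshape(1, side, side, dim).permute(0, 3, 1, 2)
    patches = F.interpolate(patches, size=new_grid, mode="bicubic", antialias=True)
    # Back to (1, new_h * new_w, dim).
    patches = patches.permute(0, 2, 3, 1).reshape(1, new_grid[0] * new_grid[1], dim)
    return torch.cat([prefix, patches], dim=1)


# Dummy tensor standing in for the model's position embeddings (37 x 37 grid).
pos_embed = torch.randn(1, 1 + 37 * 37, 16)
resized = resize_pos_embed(pos_embed, (30, 30))
print(resized.shape)  # torch.Size([1, 901, 16])
```

With a real model, the resized tensor would presumably be assigned back as the position-embedding parameter before export. Whether the model then skips its own run-time interpolation, and whether these values match what the run-time path produces (which uses scale_factor rather than size), is not verified here.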

isaacperez commented 1 year ago

I have tried it but the results are completely different (I don't understand why).