ViCCo-Group / thingsvision

Python package for extracting representations from state-of-the-art computer vision models
https://vicco-group.github.io/thingsvision/
MIT License
157 stars 21 forks source link

Dinov2 #149

Closed LukasMut closed 1 year ago

Alxmrphi commented 1 year ago

I had a bit of trouble getting this to work, but I'm not sure how connected it is to this specific commit or not. I cloned the master branch as this had already been merged and tried to load some of the new dinov2-vit* models. I think because I had to step away from thingsvision for a bit, coming back gives me the experience of a more inexperienced user and the issues one might face (useful for testing).

First of all, noticed DINOV2 wasn't on the ReadMe and wasn't sure what the source was, so copying example on ReadMe was default torchvision and had to look in extractors.py to see it was the hub. Then I realised I didn't know the keyword for that as it's not specified in the ReadMe (and perhaps should be, comment in the example for what to use to reference different model sources).

As torchvision is used in the ReadMe example, I left that in accidentally, had errors of missing libraries that I had to install manually (keras_applications and validators) then the error saying dinov2-vit* models were not in torchvision and another one saying hub didn't exist as a source. This is confusing in the installation instructions. I had a peek in helpers.py and saw the options I could give there and realised it was ssl and it was a dumb mistake. So, I think we don't need to reference vssl or torch.hub in those instructions and leave the only option to a new user that the keyword is ssl. Come to think of it, I don't think there is ever a case where a model would be possible to select from different sources so I don't know why the user should specify it. It seems only to introduce a chance for things to go wrong. The mapping in a dictionary could be implemented internally within the code.

The other models seem to be working, but I am getting errors saying it can't find any of the dino-v2 models in the SSLExtractor even as I'm copying the keys from MODELS dictionary directly and it seems it should work.

I am leaving for the weekend shortly and can't dig deeper, but just wanted to provide this feedback before I left in case it was useful.

LukasMut commented 1 year ago

I had a bit of trouble getting this to work, but I'm not sure how connected it is to this specific commit or not. I cloned the master branch as this had already been merged and tried to load some of the new dinov2-vit* models. I think because I had to step away from thingsvision for a bit, coming back gives me the experience of a more inexperienced user and the issues one might face (useful for testing).

First of all, noticed DINOV2 wasn't on the ReadMe and wasn't sure what the source was, so copying example on ReadMe was default torchvision and had to look in extractors.py to see it was the hub. Then I realised I didn't know the keyword for that as it's not specified in the ReadMe (and perhaps should be, comment in the example for what to use to reference different model sources).

As torchvision is used in the ReadMe example, I left that in accidentally, had errors of missing libraries that I had to install manually (keras_applications and validators) then the error saying dinov2-vit* models were not in torchvision and another one saying hub didn't exist as a source. This is confusing in the installation instructions. I had a peek in helpers.py and saw the options I could give there and realised it was ssl and it was a dumb mistake. So, I think we don't need to reference vssl or torch.hub in those instructions and leave the only option to a new user that the keyword is ssl. Come to think of it, I don't think there is ever a case where a model would be possible to select from different sources so I don't know why the user should specify it. It seems only to introduce a chance for things to go wrong. The mapping in a dictionary could be implemented internally within the code.

The other models seem to be working, but I am getting errors saying it can't find any of the dino-v2 models in the SSLExtractor even as I'm copying the keys from MODELS dictionary directly and it seems it should work.

I am leaving for the weekend shortly and can't dig deeper, but just wanted to provide this feedback before I left in case it was useful.

@Alxmrphi, I am not sure I follow. Could you be more specific/concise about the issue?