facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

Extend to 3D images #28

Closed NimaMojtahedi closed 1 year ago

NimaMojtahedi commented 1 year ago

I would like to ask if it is possible to modify the code to feed in 3D images. If not, do you have a plan to extend the code to 3D images? Thanks, Nima

ccharest93 commented 1 year ago

You could slice your 3D image into 2D images and feed those in. Out of the box you wouldn't have information flowing between the slices, so the features wouldn't be continuous across the volume, but that is definitely something that could be handled in post-processing.
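A minimal sketch of that slicing idea, assuming an axial single-channel stack and the small DINOv2 backbone (the `torch.hub` call is shown commented out; only the reshaping is exercised here):

```python
import torch

# Hypothetical single-channel 3D volume (depth, height, width),
# e.g. a 32-slice stack of 224x224 images.
volume = torch.randn(32, 224, 224)

# Treat each slice as an independent 2D image. The pretrained backbone
# expects 3-channel inputs, so replicate the channel dimension.
slices = volume.unsqueeze(1).repeat(1, 3, 1, 1)  # (32, 3, 224, 224)

# backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
# features = backbone(slices)  # one embedding per slice, e.g. (32, 384)
print(slices.shape)
```

Any continuity between slices then has to be recovered afterwards, as noted above.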

woctezuma commented 1 year ago

Not sure if good features for 2D natural images would be good features for 3D medical images.

ccharest93 commented 1 year ago

Oh, I didn't see that it was medical images. Since ImageNet-22k doesn't have medical images to my knowledge, I doubt they made it into the curated dataset.

I guess a solution for that would be to fine-tune the model on a dataset of 2D medical images. Even then, I'm not sure the embeddings (between natural and medical images) are similar enough for this to work.

woctezuma commented 1 year ago

I am not sure if it is medical. I assumed it based on the bio of the profile which mentions "bioinformatician" and "neuroscientist".

NimaMojtahedi commented 1 year ago

Thank you for all the replies. Yes, my intention is to use it on medical images. I will try it on 2D slices and do post-processing. Is it possible to train this model on custom data?

pierrefdz commented 1 year ago

> I would like to ask if it is possible to modify the code to feed in 3D images. If not, do you have a plan to extend the code to 3D images? Thanks, Nima

If you want to do that, you can adapt code from video recognition tasks (a video is treated as T x 2D images). Usually they only change the patch embedding part, turning 2D patches into 3D patches, like in https://github.com/baaivision/EVA/blob/master/EVA-01/video/models/clip_mae.py#L238
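A rough sketch of that patch-embedding change, assuming the usual ViT layout (the class name and default sizes here are illustrative, not DINOv2's actual API):

```python
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Hypothetical 3D ('tubelet') patch embedding: a Conv3d replaces
    the Conv2d used for 2D patches, as in the linked video models."""
    def __init__(self, patch=(2, 14, 14), in_chans=3, embed_dim=384):
        super().__init__()
        self.proj = nn.Conv3d(in_chans, embed_dim,
                              kernel_size=patch, stride=patch)

    def forward(self, x):  # x: (B, C, T, H, W)
        x = self.proj(x)                     # (B, D, T', H', W')
        return x.flatten(2).transpose(1, 2)  # (B, T'*H'*W', D)

tokens = PatchEmbed3D()(torch.randn(1, 3, 8, 224, 224))
print(tokens.shape)  # (1, 4 * 16 * 16, 384) = (1, 1024, 384)
```

The rest of the transformer stays the same; the positional embeddings would also need to cover the larger token sequence.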

Otherwise you could take all 2D embeddings, concatenate or average them, add a linear layer or a more complex head, and it should already give good results.
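A sketch of that pooling approach; the backbone call is omitted and the slice embeddings are stand-ins (shapes assume 384-d features, as produced by the small DINOv2 model):

```python
import torch
import torch.nn as nn

num_slices, dim, num_classes = 32, 384, 10

# Stand-in for per-slice embeddings from a frozen 2D backbone.
slice_embeddings = torch.randn(num_slices, dim)

# Average over slices into one volume-level feature, then classify
# with a linear head (a deeper head could replace this).
pooled = slice_embeddings.mean(dim=0)  # (384,)
head = nn.Linear(dim, num_classes)
logits = head(pooled)
print(logits.shape)
```

Concatenating instead of averaging keeps per-slice information but ties the head to a fixed number of slices.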

ketil-malde commented 1 year ago

I doubt the pretrained models will be useful for medical images; the features they know seem heavily based on real-world photos. I played with the demo a bit, and segmentation works well (incredibly well, in fact) on random photos, but fails spectacularly on underwater images - nothing very special, just a photo containing seabed, seaweed, fish. Training DINO v1 on whole video frames didn't get me anywhere, but I suspect the augmentation methods assume there's a main object in each image. I've used DINO v1 to classify plankton images with moderate success, and will try to retrain DINO v2 as well. Feel free to get in touch if you have ideas on how to best retrain, or if you want more details.

NimaMojtahedi commented 1 year ago

> If you want to do that, you can adapt code from video recognition tasks (a video is treated as T x 2D images). Usually they only change the patch embedding part, turning 2D patches into 3D patches, like in https://github.com/baaivision/EVA/blob/master/EVA-01/video/models/clip_mae.py#L238

Thanks for this great hint.

> Otherwise you could take all 2D embeddings, concatenate or average them, add a linear layer or a more complex head, and it should already give good results.

I will try this idea. Thank you!

patricklabatut commented 1 year ago

Closing as answered, thanks for your interest!

Yuki-Suprise commented 6 months ago

May I ask if you have managed to do this? Can you tell me how? Thank you very much.