Closed NimaMojtahedi closed 1 year ago
You could slice your 3D image into 2D images and feed those in. Out of the box you wouldn't have information flowing between the slices, so the features wouldn't be continuous along the third dimension. That is definitely something that could be handled in post-processing, though.
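A minimal sketch of that slicing approach, assuming a single-channel volume and a 2D backbone (like DINOv2) that expects 3-channel input; the shapes are illustrative assumptions:

```python
import torch

def volume_to_slices(volume: torch.Tensor) -> torch.Tensor:
    """Split a 3D volume of shape (D, H, W) into a batch of 2D slices.

    A 2D backbone expects 3-channel images, so each single-channel
    slice is repeated across the channel dimension.
    """
    slices = volume.unsqueeze(1)      # (D, 1, H, W)
    return slices.repeat(1, 3, 1, 1)  # (D, 3, H, W)

vol = torch.randn(16, 224, 224)       # hypothetical 16-slice volume
batch = volume_to_slices(vol)         # (16, 3, 224, 224)
# Each slice can now go through the 2D model independently;
# fusing per-slice features happens in post-processing.
```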
I'm not sure whether features that are good for 2D natural images would also be good features for 3D medical images.
Oh, I didn't see that it was medical images. Since ImageNet-22k doesn't contain medical images to my knowledge, I doubt any made it into the curated dataset.
I guess a solution for that would be to fine-tune the model on a dataset of 2D medical images. Even then, I'm not sure the embeddings (between natural and medical images) are similar enough for this to work.
I am not sure if it is medical. I assumed it based on the bio of the profile which mentions "bioinformatician" and "neuroscientist".
Thank you for all the replies. Yes, my intention is to use it on medical images. I will try it on 2D slices and do post-processing. Is there a possibility to train this model on custom data?
I would like to ask you if there is a possibility to modify the code to feed in 3D images? If not, do you have a plan to extend the code to 3D images? Thanks, Nima
If you want to do that, you can adapt code from video recognition tasks (video is treated as T x 2D images). Usually they only change the patch embedding part, turning 2D patches into 3D patches, like in https://github.com/baaivision/EVA/blob/master/EVA-01/video/models/clip_mae.py#L238
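A minimal sketch of such a 3D (tubelet) patch embedding, in the spirit of the EVA video code linked above; the patch size, channel count, and embedding dimension here are illustrative assumptions, not the values from any released model:

```python
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Sketch of a 3D patch embedding: a single Conv3d replaces the
    usual Conv2d patchifier of a 2D ViT."""

    def __init__(self, patch=(4, 16, 16), in_chans=1, embed_dim=384):
        super().__init__()
        self.proj = nn.Conv3d(in_chans, embed_dim,
                              kernel_size=patch, stride=patch)

    def forward(self, x):                    # x: (B, C, D, H, W)
        x = self.proj(x)                     # (B, E, D', H', W')
        return x.flatten(2).transpose(1, 2)  # (B, N, E) token sequence

emb = PatchEmbed3D()
tokens = emb(torch.randn(2, 1, 16, 224, 224))
# (16/4) * (224/16) * (224/16) = 4 * 14 * 14 = 784 tokens per volume
```

The rest of the transformer can stay unchanged, since it only sees a token sequence; position embeddings would also need to be extended to 3D.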
Otherwise you could take all the 2D embeddings, concatenate or average them, add a linear layer or a more complex head, and that should already give good results.
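The second suggestion can be sketched as follows; this is a hypothetical example, and the embedding dimension and number of classes are assumptions:

```python
import torch
import torch.nn as nn

class SliceAggregator(nn.Module):
    """Average the per-slice 2D embeddings of a volume and
    classify the pooled feature with a linear head."""

    def __init__(self, embed_dim=384, num_classes=10):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, slice_feats):        # (B, num_slices, embed_dim)
        pooled = slice_feats.mean(dim=1)   # average over the slices
        return self.head(pooled)           # (B, num_classes)

agg = SliceAggregator()
logits = agg(torch.randn(2, 16, 384))      # 2 volumes, 16 slices each
```

Concatenation instead of averaging would preserve slice order at the cost of fixing the number of slices; averaging keeps the head independent of volume depth.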
I doubt the pretrained models will be useful for medical images; the features they know seem heavily based on real-world photos. I played with the demo a bit, and segmentation works well (incredibly well, in fact) on random photos, but fails spectacularly on underwater images, nothing very special, just a photo containing seabed, seaweed, and fish. Training DINO v1 on whole video frames didn't get me anywhere, but I suspect the augmentation methods assume there's a main object in each image. I've used DINO v1 to classify plankton images with moderate success, and will try to retrain DINO v2 as well. Feel free to get in touch if you have ideas on how to best retrain, or if you want more details.
> If you want to do that, you can adapt code from video recognition tasks (video is treated as T x 2d images). Usually they only change the patch embedding part from changing 2D patches into 3D patches, like in https://github.com/baaivision/EVA/blob/master/EVA-01/video/models/clip_mae.py#L238

Thanks for this great hint.

> Otherwise you could take all 2D embeddings, concatenate or average them, add a linear layer or a more complex head, and it should already give good results.

I will try this idea. Thank you!
Closing as answered, thanks for your interest!
May I ask if you have already done this? Can you tell me how? Thank you very much.