Closed aaiguy closed 1 year ago
Any thoughts on this, @woctezuma?
This is due to the center-cropping operation:
I don't know how it would affect accuracy to resize to 224 instead of resizing to 256 and then center-cropping to 224 resolution.
However, depending on the task, it could make sense. If you look at DINO (the first version):
See:
Yes, it's affecting results, especially when the objects in the image are big or inverted. Is it possible to resize the image to 224 without losing any portion of the object or any quality? Just resizing to 224 will shrink the object.
I haven't experimented with this yet, but the embedding interpolation should allow for non-square images, so the center crop is not necessary. Given an initial image of H × W and a patch size of 14, you could crop to (H - H%14) × (W - W%14) and feed that as input to the network. (You could also resize before doing that crop if you want fewer patches.)
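The crop described above can be sketched as follows. This is an illustrative snippet, not code from the DINOv2 repo; the names and example frame size are assumptions:

```python
import torch

# Crop an H x W image down to the nearest multiple of the ViT patch size
# (14 for DINOv2) so the whole frame can be fed in without a square
# center crop. A random tensor stands in for a real image here.
patch_size = 14
H, W = 360, 640                 # example frame size (assumed)
new_H = H - H % patch_size      # 350
new_W = W - W % patch_size      # 630

img = torch.rand(3, H, W)
top = (H - new_H) // 2
left = (W - new_W) // 2
cropped = img[:, top:top + new_H, left:left + new_W]

# Number of patch tokens the ViT would see for this input
n_patches = (new_H // patch_size) * (new_W // patch_size)
```

At most one patch-width strip is lost along each edge, instead of the large borders a square center crop would discard.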
As you can see from the output of the preprocessed image, a portion of the image gets cropped, which affects the output result since some of the image information is lost. Is there a way to avoid this, i.e. to preprocess without losing any image information and without hurting accuracy?
You are correct, the preprocessing does do that. The effect of this can be positive or negative depending on many factors; we did not really tweak this, honestly. My guess would be that it's good only on object-centric datasets such as ImageNet.
You can make the crop less aggressive by doing the resize to 224, as @woctezuma suggested
If you want to keep all of the image, you can resize directly to a square:
T.Resize((224, 224))
losing the aspect ratio of the original image in the process. If the aspect ratio change is too aggressive, it might impair performance. It's hard to say which option would be best; you'd have to test to see what works for you. In any case, I don't think it should affect performance that much.
Closing as answered, please re-open if you need to discuss this more.
@TimDarcet I have similar issues as OP. All of my images are rectangular, and I found, after visualizing the features via PCA, that rectangular images don't work well; even resizing the images explicitly does not work well. I tried padding to a square image and it also does pretty poorly (you can barely see any objects under these conditions). What consistently works well seems to be what is used with ImageNet (resize with bicubic interpolation, center crop, and ImageNet normalization).
However, my issue is that I need all of the image (even the details at the edges) because my images are medical images, so they have lots of important information all around the frame. Do you have a recommended preprocessing method for rectangular images that still preserves edge details?
Let me know any recommendations cc @woctezuma
Hi, could you give a bit more detail? A few questions:
A few insights:
All my images are of size (640, 368), mainly because they are video frames taken from 360p videos of cells, protein structures, etc.
I use this transform:
import torchvision.transforms as T

H = 640
W = 368
patch_size = 14
# round down to the nearest multiple of the patch size
newH = H - H % patch_size
newW = W - W % patch_size
print(f"New width and height are {newW, newH}")
transform = T.Compose([
    T.Resize((newH, newW), interpolation=T.InterpolationMode.BICUBIC),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
And I get this visualization like this with PCA (n=3):
When I use this transform (center crop, i.e. a square image):
import torchvision.transforms as T

H = 640
W = 368
patch_size = 14
# round down to the nearest multiple of the patch size
newH = H - H % patch_size
newW = W - W % patch_size
print(f"New width and height are {newW, newH}")
transform = T.Compose([
    T.Resize((newH, newW), interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(newW),  # square crop with side length newW
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
I get something like this:
With the center-crop transform, I can clearly see the objects. In fact, I'm pretty impressed, because it's able to get features for basically all the objects. But with the original rectangular image, it is a bit blurry and there is some weird color distortion going on (the bottom half is very light).
When I visualize an attention head I can see that DINOv2 is paying attention to various features (I can see objects clearly in some images) with those rectangular images. Here is an example of one attention map:
Even though rectangular images are out of distribution, the attention maps suggest the model's internal representations are still valuable.
This is the root cause of my confusion. PCA visualization does not give good results, attention results seem decent, so I am not sure if the model is giving useful representations overall or not for my data.
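For reference, one common way to inspect attention maps like the one above is a forward hook on an attention module. The sketch below uses a toy `nn.MultiheadAttention` layer; with a real DINOv2 checkpoint you would register the hook on the last block's attention module instead, and the grid size here (25 × 45, matching a 350 × 630 input at patch size 14) is only an assumption:

```python
import torch
import torch.nn as nn

captured = {}

def hook(module, inputs, output):
    # nn.MultiheadAttention returns (attn_output, attn_weights)
    captured["attn"] = output[1]

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
attn.register_forward_hook(hook)

# [CLS] token + 25 x 45 patch tokens (illustrative shapes)
tokens = torch.rand(1, 1 + 25 * 45, 64)
attn(tokens, tokens, tokens, need_weights=True, average_attn_weights=False)

# CLS-to-patch attention, one map per head, reshaped to the patch grid
cls_attn = captured["attn"][0, :, 0, 1:].reshape(4, 25, 45)
```

Each of the four resulting maps can be plotted as an image to see where that head attends.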
Center cropping does not make sense for my use case because valuable information is contained everywhere in the frame. My end goal is cell-based segmentation.
And yeah, I tested padding and it does not do well at all. I've narrowed it down to using ImageNet normalization (which actually performs decently for medical images) and just resizing with no cropping.
But with the original rectangular image, it is a bit blurry and there is some weird color distortion going on (bottom half is very light)
The PCA you have seems okay to me; it's just that you are only visualizing the first components here. Showing a few other components separately (e.g. the first 10) should reveal the structure you are looking for.
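The suggestion above can be sketched as follows: project the patch features onto the first k principal components and reshape each back to the patch grid for separate visualization. Random features stand in for real DINOv2 outputs, and the grid and feature dimensions are illustrative assumptions:

```python
import torch

# Illustrative shapes: a 25 x 45 patch grid of 384-d features
grid_h, grid_w, dim, k = 25, 45, 384, 10
feats = torch.rand(grid_h * grid_w, dim)

# Center the features, then get principal directions via low-rank PCA
centered = feats - feats.mean(dim=0, keepdim=True)
U, S, V = torch.pca_lowrank(centered, q=k)

# Project onto the first k components: (n_patches, k)
components = centered @ V[:, :k]

# One 2-D map per component, each viewable as a grayscale image
maps = components.T.reshape(k, grid_h, grid_w)
```

Plotting each of the k maps separately (rather than only the first one or three mapped to RGB) makes structure in the later components visible.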
I am not sure if the model is giving useful representations overall or not for my data.
As you said, the attmap looks okay, so the model is doing something. I think the representation should have some value.
I'm using the DINOv2 model to extract features from images before passing them to my model. For preprocessing the images, I'm following the same procedure as used during training of the DINOv2 pretrained model on the ImageNet dataset, as shown in the code below.
output of the preprocessed image