Open mranzinger opened 8 months ago
Hi there,
Thanks for your interest in our work! I had also noticed your RADIO work before, and I like it very much!
Currently, we don't have a dedicated script for visualization. The original feature maps of Figure 1 are cropped from our visualization logs, similar to this image: https://github.com/Jiawei-Yang/Denoising-ViT/raw/main/demo/demo_outputs/dinov2_base_cat.jpg
To visualize this you can refer to `sample_scripts/stage1_denoising.sh`, but you have to modify the checkpoint loading part, which is at https://github.com/Jiawei-Yang/Denoising-ViT/blob/adeff838169152a6e55bd8e3d7f1f1befe006ff2/DenoisingViT/vit_wrapper.py#L104
Another reference for visualizing PCA maps is at: https://github.com/Jiawei-Yang/Denoising-ViT/blob/adeff838169152a6e55bd8e3d7f1f1befe006ff2/denoise_single_image.py#L105
The last reference will be the most useful one: it is independent of most of the denoising code, so it is easy to copy-paste into your own codebase.
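As a rough sketch of that kind of standalone PCA visualization (numpy only; the shapes and helper name are illustrative, not the actual script):

```python
import numpy as np

def pca_rgb(features: np.ndarray) -> np.ndarray:
    """Project an (H, W, C) patch-feature grid onto its top-3 principal
    components and rescale each channel to [0, 1] for display as RGB."""
    h, w, c = features.shape
    flat = features.reshape(-1, c).astype(np.float64)
    flat = flat - flat.mean(axis=0, keepdims=True)
    # Top-3 principal directions via SVD of the centered feature matrix.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T                                # (H*W, 3)
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    proj = (proj - lo) / np.maximum(hi - lo, 1e-8)        # rescale to [0, 1]
    return proj.reshape(h, w, 3)

# Example: a fake 16x16 patch grid with 768-dim features (ViT-B-like).
rgb = pca_rgb(np.random.randn(16, 16, 768))
print(rgb.shape)  # (16, 16, 3)
```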
Best, Jiawei
Thank you Jiawei! I will try to give this a go this week.
I finally got around to implementing this based on your code. I ran it on the following models:
- DFN CLIP at 378px
- DINOv2 at 224, 378, 518px
- RADIOv1 at 378px
- RADIOv2 at 432, 512, 1024px
Results in subsequent messages.
Looks like GitHub isn't allowing me to upload more. All of the visualizations can be found here: https://drive.google.com/drive/folders/1xsmcT515n78LALV0mm1kA4hGT63zZM12?usp=sharing
I think the visualizations at 1024px are rather fascinating: it appears that RADIOv2 switches to "SAM mode" at that resolution, paying close attention to contours, and object parts are more clearly encoded.
Amazing visualizations! Thanks for providing these!
Re SAM --- Yes! At 1024px, I found the patterns to be exactly the SAM patterns. I visualized SAM at the very beginning of the project, using 518px resolution. Here is what I got at that time:
But we didn't include SAM in our final released codebase and paper because it's not a standard ViT and requires more hacks to the `timm` package to make it compatible with other functionalities.
RADIOv1 seems to be very noisy, and v2 is much cleaner. I will have a detailed look through them in a week. BTW, the PCA visualizations look over-blurred to me. I guess you first upsample the features and then do the PCA? I think doing the PCA at low resolution and upsampling the color map will give you crisper results.
So I would take an image and interpolate it to the model resolution (e.g. 378, 432, etc.). From the model, we get $(H/p, W/p)$ spatial features, with $p$ being the patch size, and $H$, $W$ the interpolated resolution. I then computed the PCA on the $(H/p, W/p)$ maps (i.e. I didn't upsample first). Finally, I upsample the PCA maps back to the model's input resolution (i.e. upsample by a factor of $p$).
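Concretely, the last step looks like this (a minimal sketch; the resolution and patch size are just example values, and a random array stands in for the PCA color map computed at grid resolution):

```python
import numpy as np

H = W = 432            # interpolated model input resolution (example)
p = 16                 # assumed patch size -> 432 / 16 = 27x27 patch grid
gh, gw = H // p, W // p

# Stand-in for a PCA color map already computed at the (H/p, W/p) grid.
pca_map = np.random.rand(gh, gw, 3)

# Final step: upsample the color map back to the input resolution by the
# patch-size factor (nearest-neighbor here, via repeat along both axes).
full = pca_map.repeat(p, axis=0).repeat(p, axis=1)
print(full.shape)  # (432, 432, 3)
```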
Is that the process you're recommending, or is there a better algorithm?
Ah, then upsampling the color map using nearest interpolation becomes the key? Bilinear interpolation will result in a blurry map?
Yep. I suppose so. I'll have to spend some more time with this.
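A quick sketch of the difference (numpy only; the tiny 2x2 "color map" and the hand-rolled bilinear helper are just for illustration):

```python
import numpy as np

def bilinear_upsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Simple separable bilinear upsampling for a single-channel (H, W) map."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    # Interpolate along rows, then along columns.
    rows = np.array([np.interp(ys, np.arange(h), img[:, j]) for j in range(w)]).T
    return np.array([np.interp(xs, np.arange(w), rows[i]) for i in range(len(ys))])

# A 2x2 map with hard patch boundaries.
patches = np.array([[0.0, 1.0],
                    [1.0, 0.0]])

nearest = patches.repeat(4, axis=0).repeat(4, axis=1)
blurry = bilinear_upsample(patches, 4)

# Nearest keeps only the original two values, so patch boundaries stay
# crisp; bilinear introduces intermediate shades, which reads as blur.
print(len(np.unique(nearest)), len(np.unique(blurry)))
```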
Some more fun with these visualizations. Top-left is the original image, top-right is RADIO's backbone representation, bottom-left is RADIO's SAM head, and bottom-right is SAM.
I'm learning quite a bit through your work (thanks!). In particular, I think RADIOv2 has less noise than the other encoders; however, when looking at images with large regions of roughly uniform color (e.g. the gymnast), the position-encoding noise becomes quite apparent. So next up is to get your denoiser working on RADIO to see how much further the output features can be refined.
Hello and thank you for your excellent work!
My group recently released AM-RADIO, and I'd really like to run the same set of experiments to see if distilling from multiple ViTs with different training regimes amplifies, suppresses, or changes the artifacts. So, the first step would be to generate your Figure 1 Original images, and then to explore training the denoiser on top of it.
Could you point me to where/how to run your visualization scripts? I poked around in a couple of places and couldn't find the magic command.
Thanks!