Ujjawal-K-Panchal / coshnet


Porting to RGB images and beyond. #4

Closed alexjc closed 1 year ago

alexjc commented 2 years ago

I saw a link to your paper on Twitter and have been digging into the code! I noticed there are placeholders for RGB and Lab support in various places already, but it doesn't seem complete.

What would be necessary to support 3 input channels and beyond? Also, what if the images are more than 32x32 — do you recommend using a patch-based representation like ViT?

(I guess this is a likely future path for the research, so if you're working on it that's very interesting!)

Ujjawal-K-Panchal commented 2 years ago

Hi @alexjc! I'm glad you noticed the placeholders for colorspaces. Yes, we are investigating how best to use CVNNs to classify natural images, since most natural images are in color. Preliminary results do suggest that choosing a good colorspace is very beneficial.
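To make the colorspace point concrete, here is a minimal sketch (not the repo's actual pipeline; the normalization constants and image size are illustrative assumptions) of converting an RGB image to Lab with scikit-image before handing it to a 3-channel front end:

```python
# Illustrative only: RGB uint8 image -> normalized Lab tensor for a 3-channel input layer.
import numpy as np
import torch
from skimage.color import rgb2lab

def to_lab_tensor(rgb_uint8: np.ndarray) -> torch.Tensor:
    """Convert an HxWx3 uint8 RGB image to a 3xHxW float tensor in Lab space."""
    lab = rgb2lab(rgb_uint8)                      # L in [0, 100], a/b roughly in [-128, 127]
    lab = lab / np.array([100.0, 128.0, 128.0])   # rough per-channel normalization (assumed)
    return torch.from_numpy(lab).permute(2, 0, 1).float()

x = to_lab_tensor(np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8))
print(x.shape)  # torch.Size([3, 32, 32])
```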

We are big fans of patch-based approaches, which is how almost all medical imaging and image processing is done. ViT is very interesting work, but its main focus is using attention to replace or learn convolution. We are not such big fans of that aspect, since it has to spend a lot of model capacity and tons of data to eventually learn a kernel with locality, which is basically what a convolution already gives you. That runs opposite to our main thesis of using a 'hybrid network'. Naturally, attention will be of great help, and we have been looking at how to do attention in the complex domain.
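For reference, the patch tokenization step we mean is just the following (an illustrative PyTorch snippet, not CoShNet code; the 64x64 image and 16x16 patch size are assumptions):

```python
# Illustrative only: split an image into non-overlapping ViT-style patches ("tokens").
import torch
import torch.nn.functional as F

img = torch.randn(1, 3, 64, 64)                      # (batch, channels, H, W)
patches = F.unfold(img, kernel_size=16, stride=16)   # (1, 3*16*16, num_patches)
patches = patches.transpose(1, 2)                    # (1, num_patches, patch_dim)
print(patches.shape)                                 # torch.Size([1, 16, 768])
```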

To me, the purest insight for using patches is 'Deep Image Prior'. Another is a paper we cite, 'An Image Patch is a Wave: Phase-Aware Vision MLP'. The latter is especially intriguing since CoShNet and CoShREM work in phase space too.

We have always felt that attention is really just a form of non-local means (denoising). The 'Non-Local Neural Networks' paper might be of interest to you.
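As a rough illustration of that view (in the spirit of the non-local block, not the paper's exact implementation; layer widths are assumptions), every position becomes a softmax-weighted average of all other positions:

```python
# Illustrative only: a simplified non-local / attention block over spatial positions.
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 2, 1)  # query embedding
        self.phi   = nn.Conv2d(channels, channels // 2, 1)  # key embedding
        self.g     = nn.Conv2d(channels, channels // 2, 1)  # value embedding
        self.out   = nn.Conv2d(channels // 2, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, c/2)
        k = self.phi(x).flatten(2)                    # (b, c/2, hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, c/2)
        attn = torch.softmax(q @ k, dim=-1)           # pairwise similarity weights
        y = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        return x + self.out(y)                        # residual connection

block = NonLocalBlock(64)
print(block(torch.randn(2, 64, 8, 8)).shape)  # torch.Size([2, 64, 8, 8])
```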

Ujjawal-K-Panchal commented 1 year ago

Closing the issue @alexjc. We can reopen if needed.