edongdongchen / REI

Robust Equivariant Imaging (REI), CVPR'2022 Oral

A question about the operator "A" and noise. #3

Closed XiaoqiangZhou closed 2 years ago

XiaoqiangZhou commented 2 years ago

Thanks for sharing this great work. Equivariant Imaging is a great idea. However, I have a few questions about the mechanism/motivation of this idea.

Yes, clean signals are hard to obtain in many cases, so it is meaningful to perform unsupervised learning from noisy and partial measurements. However, in this paper, the forward operator "A" and the characteristics of the noise are assumed to be known. My questions are about this assumption.

1. Isn't this assumption difficult to satisfy in many real-world imaging tasks? For example, the downsampling strategy and blur type are unknown and hard to predict in real-world image super-resolution tasks.
2. If the forward operator "A" and the characteristics of the noise are known, it is easy to synthesize or create training pairs through the forward process. For example, we can easily collect a large set of high-resolution, clean images and generate abundant training pairs for inpainting and SR. In such a situation, why is unsupervised learning necessary (instead of supervised learning), given that I can obtain many desirable training pairs with the known forward operator "A" and noise characteristics?

Could you please help me with the above questions?

Thanks.

edongdongchen commented 2 years ago

Hello, please see my comments below.

> Thanks for sharing this great work. Equivariant Imaging is a great idea. However, I have a few questions about the mechanism/motivation of this idea.
>
> Yes, clean signals are hard to obtain in many cases, so it is meaningful to perform unsupervised learning from noisy and partial measurements. However, in this paper, the forward operator "A" and the characteristics of the noise are assumed to be known. My questions are about this assumption.
>
> 1. Isn't this assumption difficult to satisfy in many real-world imaging tasks? For example, the downsampling strategy and blur type are unknown and hard to predict in real-world image super-resolution tasks.

COMMENTS: This is an interesting question. In many real scientific imaging tasks, the forward process is generally known and ill-posed (e.g. an accelerated MRI scanner). However, even when the ill-posed forward model is known, solving the inverse problem by directly learning to image from the measurements alone is still impossible, because the measurements carry no information about the nullspace of A. EI/REI mainly aims to solve this challenge.
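To make this concrete, here is a minimal PyTorch-style sketch of an EI-style objective (not the exact code of this repository; `f`, `A`, `A_dagger` and `T` are placeholder callables). The measurement-consistency term only constrains the reconstruction in the range space of A, while the equivariance term, obtained by re-measuring a transformed reconstruction, is what provides information about the nullspace:

```python
import torch
import torch.nn.functional as F

def ei_loss(f, A, A_dagger, T, y, alpha=1.0):
    """Hypothetical sketch of an EI-style training objective.

    f        : network mapping a coarse estimate A_dagger(y) to an image
    A        : known forward operator, image -> measurement
    A_dagger : pseudo-inverse / back-projection, measurement -> image
    T        : applies a random group transformation (e.g. rotation, shift)
    y        : raw measurements only -- no ground-truth images are used
    """
    x1 = f(A_dagger(y))                 # reconstruction from the measurement alone
    loss_mc = F.mse_loss(A(x1), y)      # measurement consistency (range space of A)

    x2 = T(x1)                          # transformed reconstruction = "virtual" ground truth
    x3 = f(A_dagger(A(x2)))             # re-measure and reconstruct the transformed image
    loss_eq = F.mse_loss(x3, x2)        # equivariance term: probes the nullspace of A

    return loss_mc + alpha * loss_eq
```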

If the forward process is only partially known or totally unknown, then to make the problem identifiable it is necessary (and possible) to have some additional information about the forward process. In the case of blind SR or blind deconvolution, we usually assume that the forward process is a kind of convolution, and we must impose priors on the blur kernel and the downsampling so that the problem remains identifiable by unsupervised learning.
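As an illustration only, a hedged sketch of the forward process that is typically assumed in blind SR (blur with an unknown kernel, downsample, add noise); the function name, defaults, and the kernel normalization used as a toy prior are assumptions for this example, not part of this repository:

```python
import torch
import torch.nn.functional as F

def blind_sr_forward(x, k, scale=4, sigma=0.01):
    """Assumed blind-SR forward process: blur, downsample by `scale`, add noise.
    x: clean image batch (B, C, H, W); k: 2D blur kernel (odd size assumed).
    Real blind-SR methods impose stronger priors (e.g. a parametric kernel family).
    """
    k = k / k.sum()                                            # toy prior: kernel sums to one
    weight = k.expand(x.shape[1], 1, *k.shape).contiguous()    # one copy of k per channel
    x_blur = F.conv2d(x, weight, padding=k.shape[-1] // 2, groups=x.shape[1])
    y = x_blur[..., ::scale, ::scale]                          # downsampling
    return y + sigma * torch.randn_like(y)                     # measurement noise
```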

> 2. If the forward operator "A" and the characteristics of the noise are known, it is easy to synthesize or create training pairs through the forward process. For example, we can easily collect a large set of high-resolution, clean images and generate abundant training pairs for inpainting and SR. In such a situation, why is unsupervised learning necessary (instead of supervised learning), given that I can obtain many desirable training pairs with the known forward operator "A" and noise characteristics?

COMMENTS: This is a great question. For natural image restoration tasks (e.g. SR, inpainting, or deblurring), you are absolutely right that we can use the forward model to synthesize as many training pairs {(x_syn, y_syn)} as we like, and use supervised training to build a near-perfect mapping f_syn between {x_syn} and {y_syn} -- that is relatively easy and straightforward to do. The key issue is that training on the synthetic pairs does not learn the true signal model of the real measurements of interest, and the problem gets worse the more the synthetic data distribution differs from the real measurement distribution.

For example, given a known 4x downsampling SR forward model A, we can easily collect a large set of high-resolution natural images (digits, buildings, etc.) and train a supervised net f that performs the 4x SR reconstruction task very well. This works great if the collected raw measurements Y_raw are also low-resolution natural images (e.g. buildings). However, it is not reasonable to use such an f to reconstruct low-resolution images of an unknown protein molecule (e.g. the COVID virus, whose signal model is unknown, so it is impossible to synthesize or create a large set of ground-truth COVID images -- they are definitely not natural images like buildings or faces), even if both low-resolution image sets are generated by the same 4x downsampling operator A.
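As a concrete (hypothetical) illustration of the point above: with a known A and noise level, synthesizing supervised pairs is trivial, but the pairs are only as informative as the clean images you feed in, which must come from the right distribution; the names `synthesize_pairs`, `f`, and `A_dagger` are placeholders, not this repository's API:

```python
import torch

def synthesize_pairs(x_clean, A, sigma=0.01):
    """With a known operator A (and known noise level), supervised training
    pairs are trivial to generate -- but only for signals x_clean we already
    have, i.e. from the synthetic distribution P_syn, not from P_raw."""
    y = A(x_clean)
    y = y + sigma * torch.randn_like(y)   # add Gaussian noise with known level
    return x_clean, y

# A supervised step would then look like:
#   x_syn, y_syn = synthesize_pairs(x_clean, A)
#   loss = torch.nn.functional.mse_loss(f(A_dagger(y_syn)), x_syn)
# which never sees the distribution of the real measurements Y_raw.
```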

We would like to emphasize again that the distribution of the synthetic data P_syn and that of the raw measurements P_raw are different. Training with synthetic data pairs only builds a data-driven prior for the synthetic signal model; it does not learn the true signal model of the raw signals, i.e. a function learned on synthetic data cannot reconstruct structures or patterns that are not present in the synthetic training pairs. However, if unsupervised learning (or 'learning to image from raw measurements') is possible, such unseen structures/patterns embedded in the raw measurements can be recovered.

'Learning to image from raw measurements' is therefore necessary and important. This requirement of 'learning without ground truth' is crucial in many scientific inverse imaging scenarios beyond SR, deblurring or inpainting on natural images.

In many scientific imaging scenarios (such as astronomy, bioimaging, and signal processing), scientists only have the real measurement data Y_raw, and the goal is precisely to figure out the true structures/patterns behind those raw measurements. For example, when the COVID pandemic started, although the instrument A (e.g. cryo-EM) was always there, we did not have the ground-truth image of the COVID protein structure (i.e. x) before biologists figured it out (i.e. solved the inverse problem without supervision). Again, it is unreasonable to synthesize an 'abundant' set of non-COVID training pairs, do supervised learning, and then use that learned model to reconstruct the new, unknown COVID structure image. It is also impossible to do the 'desirable' data augmentation you mentioned by applying A to the ground-truth x (we don't have it; it is unknown). Finally, it is meaningless to do data augmentation on Y_raw, which cannot bring any new information beyond the range space. In reality, biologists first used the cryo-EM to measure the virus protein and then reconstructed the virus's 3D structure from the collected 2D raw measurements y_raw -- this reconstruction process, i.e. solving the cryo-EM inverse imaging problem, is fully unsupervised.

The above points (and there are more) make 'learning without ground truth' not only necessary but also important. For more details, you can also check our latest theory paper on EI and unsupervised learning for inverse imaging problems.

Let me know if you still have doubts about why 'learning to image from raw measurements' is important.

All the best.

XiaoqiangZhou commented 2 years ago

Thank you very much for your detailed reply. I'd like to share my thoughts here.

My initial motivation for the first question was that the degradations in real-world image restoration tasks (e.g., SR, deblurring, deraining) are often unknown and may include diverse and difficult patterns. The classical SR task assumes low-resolution images are obtained directly by bicubic downsampling, while the blind SR task extends the degradations to several categories. I agree with you that we must impose priors on the blur kernel and the downsampling.

As for the second question, there is indeed a domain gap between synthetic data and real measurements, which is also the reason why many SR models fail on real-world low-resolution images (they are trained on synthetic data). Unsupervised learning and domain adaptation/generalization are topics that may help with this problem. I learned from your explanation that there are many scenarios where only raw measurement data Y_raw are available, which further confirms the importance of the 'learning without ground truth' task.

I read through the theory paper and got some inspiration from its high-quality content. I need more time to fully digest the ideas in the paper, and I'm looking forward to your future work in this research field.

Best regards.