End-of-Eternity / vs-ccd

Vapoursynth Port of the Camcorder Color Denoise VirtualDub filter
GNU General Public License v3.0

Make CCD good for higher resolutions #2

Open DomBito opened 3 years ago

DomBito commented 3 years ago

Hello, there.

One clear flaw of CCD is that it only works great for SD footage. There's a reason it's done using only 16 pixels within a 25x25 window: the kernel is sparse for efficiency, since a conditional convolution is more costly than a regular one. But the "radius" of the matrix being 12 is probably something Sergei originally obtained after testing things out, I'd assume. Chroma noise, whether from digital artifacts or film grain, has a size relative to the resolution of the video. If this radius is too small relative to the resolution, the convolution happens entirely inside a chroma splotch, and the chroma wouldn't be denoised properly. The same holds true for radii that are too big, since the kernel would sample areas with completely different colors, which would then be excluded from the average with the central pixel.
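
For reference, the conditional-convolution idea looks roughly like this (a simplified sketch, not the actual plugin code: it averages full RGB and ignores borders, whereas the real filter only alters the chroma):

```cpp
struct RGB { float r, g, b; };

// Sparse conditional average: only the 16 sample points (offsets -12..+12,
// step 8, i.e. inside a 25x25 window) whose RGB distance to the centre pixel
// is below the threshold contribute to the average.
RGB ccd_pixel(const RGB *frame, int width, int x, int y, float threshold)
{
    const RGB c = frame[y * width + x];
    float sum_r = c.r, sum_g = c.g, sum_b = c.b;
    int count = 1;

    for (int dy = -12; dy <= 12; dy += 8) {
        for (int dx = -12; dx <= 12; dx += 8) {
            const RGB p = frame[(y + dy) * width + (x + dx)];
            const float dr = p.r - c.r, dg = p.g - c.g, db = p.b - c.b;
            if (dr * dr + dg * dg + db * db < threshold * threshold) {
                sum_r += p.r; sum_g += p.g; sum_b += p.b;
                ++count;
            }
        }
    }
    return { sum_r / count, sum_g / count, sum_b / count };
}
```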

Today we don't have as much chroma noise from digital artifacts, but we sure do have a lot of film remastering being done. Hence a CCD adapted to also work on higher resolutions would be nice. I believe the choice should be left to the user, since there are inherent differences in noise size depending on whether the footage comes from, for example, 8 mm, 16 mm, or 35 mm film.

With that being said, my idea is to add a scale variable. A nice way of doing this is to allow it to take any positive integer. The scale for the original CCD would be 4 (set as the default), and your main for loop would look like this: for (int dy = y - 3*scale; dy <= y + 3*scale; dy += 2*scale). So, to get similar-to-SD results on Full HD and 4K versions of the same footage, one would choose scale=9 and scale=18, respectively.
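
For concreteness, the scaled sampling loops would look something like this (a sketch only, ignoring bounds handling; scale is my proposed parameter, not an existing option):

```cpp
// scale = 4 reproduces the original offsets -12..+12 in steps of 8; larger
// values spread the same 16 samples over a wider window, so the per-pixel
// cost stays the same regardless of resolution.
int count_samples(int x, int y, int scale)
{
    int samples = 0;
    for (int dy = y - 3 * scale; dy <= y + 3 * scale; dy += 2 * scale)
        for (int dx = x - 3 * scale; dx <= x + 3 * scale; dx += 2 * scale)
            ++samples;  // (dx, dy) would be tested against the RGB threshold
    return samples;     // always 16, independent of scale
}
```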

End-of-Eternity commented 3 years ago

Hey @DomBito, Scrad and I had actually already been discussing how best to implement this.

The issue behind the v0.2 release was caused mainly by the python wrapper not upscaling subsampled chroma to 444, but rather downscaling to 420 (i.e. to chroma resolution). This was great for CPU time, since it quarters the total amount of work without significantly altering the algorithm - however, it also means that the matrix is effectively twice as large as it is meant to be. An easy fix, therefore, would be to reduce the matrix size by exactly 1/2 for 420 subsampled input. It would also make sense to support odder subsamplings such as 411 and 410 in this way, since I've seen a few older VHS captures in those formats.
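
Roughly, something like this (names are illustrative, not the actual wrapper/plugin API):

```cpp
// Derive horizontal and vertical kernel steps from the clip's chroma
// subsampling, so a kernel designed for 444 covers roughly the same picture
// area when run at chroma resolution. VapourSynth-style log2 subsampling:
// 444 -> 0/0, 420 -> 1/1, 422 -> 1/0, 411 -> 2/0, 410 -> 2/1.
void kernel_steps(int base_step, int ss_w, int ss_h, int *step_x, int *step_y)
{
    *step_x = base_step >> ss_w;  // e.g. 420: 8 -> 4 (matrix halved)
    *step_y = base_step >> ss_h;
}
```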

Therefore, the plan is currently to allow the user to specify their own matrix sizes, and the sample positions within that matrix. This would be quite flexible, and great for testing what kinds of matrices work with what sorts of content. An initial implementation of this is available on the variable-matrices branch. It currently seems to segfault - probably something very simple that I've missed - but it should be working soon.
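
As a rough sketch of what I mean (not necessarily how the branch actually does it):

```cpp
#include <vector>

// User-specified sample positions: the default list reproduces the original
// 16-point pattern; a user could supply any other set of offsets to
// experiment with.
struct Offset { int dx, dy; };

std::vector<Offset> default_offsets()
{
    std::vector<Offset> offsets;
    for (int dy = -12; dy <= 12; dy += 8)
        for (int dx = -12; dx <= 12; dx += 8)
            offsets.push_back({dx, dy});
    return offsets;
}
// The per-pixel loop then just walks the list:
//   for (const Offset &o : offsets)
//       test frame[(y + o.dy) * width + (x + o.dx)] against the threshold
```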

Do let me know if you think of any other ideas. I also tried a very basic random-sampling method, instead of fixed points, to try to mitigate "ghosting"-type errors, though unfortunately it hasn't yet output any frames. Maybe I'll try that again later.

DomBito commented 3 years ago

Subsampling

I think you mean downscaling to 444 at the chroma resolution instead of upscaling to 444 at the luma resolution, right? Since RGB is equivalent to YUV444 in terms of the amount of information.

But, see, I really think it's best to leave it at luma resolution before applying the filter, because downscaling does alter the algorithm. Even though you are only changing the chroma, you are using luma information to tell what is noise and what is not. That's why it uses RGB instead of chroma distance alone. By downscaling the luma, you are making the RGB image softer and allowing more pixels into the average. This can cause chroma bleed, or what you're calling ghosting.

Also, as you know, by making everything 444 at luma resolution, you only have to worry about downscaling it back to the original subsampling afterwards. So there's no need to work on ultra-special kernels for VHS captures, and everything works as expected out of the box.

Variable Matrices

The idea of variable matrices is quite nice. I actually thought of this as well, though with half the variables: since I believe luma resolution is the way to go, I'd only use square matrices. But the way you guys are doing it gives users enough freedom to do some useless things if they don't understand what they're doing, such as taking the average between the central pixel and a single other one within a big matrix when the offset is too big. Addressing this with errors would not be user-friendly either.

So here's another way of doing this. It assumes keeping the luma resolution and gives a little more freedom than just the scale option I gave above, but not as much as your original idea of variable matrices.

Add two more positive integer variables: radius and scale. (The scale variable here can be interpreted as an offset too.) The variable radius is the 'relative radius' of the active pixels: it is exactly half the dimension of the square matrix formed by the kernel without the central pixel, and that dimension is always an even number in this case. The variable scale works similarly to what I described above, making the whole matrix both bigger and sparser in equal measure. The for loop would turn out like this: for (int dy = y - (2*radius-1)*scale; dy <= y + (2*radius-1)*scale; dy += 2*scale). The original CCD would have radius=2 and scale=4. A nice choice of defaults would be radius=2 and scale=int(clip.height/120), which would scale things up according to the resolution without changing the cost of each convolution.
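
In code, the sampling would look something like this (a sketch, ignoring bounds handling):

```cpp
// radius/scale parameterisation, matching the loop shape above (the final
// implementation may differ). radius = 2, scale = 4 reproduces the original
// kernel: offsets -12..+12 in steps of 8.
// Suggested default: radius = 2, scale = clip.height / 120
// (480p -> 4, 1080p -> 9, 2160p -> 18).
void sample_grid(int x, int y, int radius, int scale)
{
    const int reach = (2 * radius - 1) * scale;
    for (int dy = y - reach; dy <= y + reach; dy += 2 * scale)
        for (int dx = x - reach; dx <= x + reach; dx += 2 * scale)
        {
            // (dx, dy) is one of (2*radius)^2 candidate sample positions
            // to test against the RGB-distance threshold.
        }
}
```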

Random sampling

I don't think this would help much. You won't have the same uniform chroma bleed, but the expected amount of bleeding is statistically the same, so you can expect some areas with less chroma bleed but others with more than with non-random sampling. This can create its own kind of artifact. The same applies to the efficiency of the denoising itself: some pixels would get a better convolution by chance, but others would do worse. I could be wrong, though, and maybe this problem is negligible compared to the benefits. I'd appreciate it if you let me know when you get a proper output and some comparisons.

Another big issue that is hard to tackle but could improve this filter a lot

CCD is also known to destroy reds, meaning more bleed on those colors. The problem is that RGB is not a perceptually uniform color space, and VS doesn't natively support such spaces. A solution would be to not use the plain Euclidean distance, but an adaptation of it that takes into account the conversion from RGB into one of these perceptually uniform spaces. The problem is that this isn't easy to simplify, and the new code would definitely be much less efficient. An ugly workaround is to find the hues for which the bleed/ghosting is strongest (like the reds) and lower the threshold accordingly.
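
As one illustration of the first approach, a weighted RGB distance could stand in for the plain Euclidean one (this is just a sketch using the cheap "redmean" approximation, not a claim about what the final code should use):

```cpp
#include <cmath>

// Perceptually weighted RGB distance ("redmean" approximation), shown only
// to illustrate the idea; a real fix might instead convert to a uniform
// space like CIELAB, or keep the Euclidean distance and lower the threshold
// for the problem hues. Values assumed to be 8-bit (0-255).
float weighted_rgb_distance(float r1, float g1, float b1,
                            float r2, float g2, float b2)
{
    const float rmean = 0.5f * (r1 + r2);
    const float dr = r1 - r2, dg = g1 - g2, db = b1 - b2;
    return std::sqrt((2.0f + rmean / 256.0f) * dr * dr
                     + 4.0f * dg * dg
                     + (2.0f + (255.0f - rmean) / 256.0f) * db * db);
}
```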