bjin / mpv-prescalers

prescalers for mpv, as user shaders
GNU Lesser General Public License v3.0

[Suggestion] Use MAE instead of MSE to train RAVU #56

Closed Artoriuz closed 1 year ago

Artoriuz commented 1 year ago

I saw that you've switched to MSE in an attempt to let the kernels focus on reconstructing high-frequency information, which naturally ends up in the outliers that MSE is supposed to handle better. In practice, however, this has been shown empirically to be generally wrong: https://arxiv.org/pdf/1511.08861.pdf

The network the authors trained with MAE beat the one they trained with MSE even on MSE itself, which illustrates my point.

I'm not sure whether RAVU technically not being a CNN is supposed to make a difference here, but it probably doesn't. MAE is simply more stable.

There are a few other tricks that are generally worth trying, like pairing MAE with DSSIM, or adding another component to the loss function computed on the images after a Sobel filter (i.e. their edges), but MSE is probably not the answer.
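As a rough illustration (not code from this repo), an edge-aware loss of the kind described above might look like the following sketch. The `sobel` helper, the `edge_aware_mae` name, and the `edge_weight` knob are all hypothetical:

```python
import numpy as np

def sobel(img):
    """Apply horizontal and vertical Sobel kernels (valid region only)
    and return the gradient magnitude. `img` is a 2-D float array."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            patch = img[i:h - 2 + i, j:w - 2 + j]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

def edge_aware_mae(pred, target, edge_weight=0.5):
    """MAE on raw pixels plus MAE on Sobel edge maps.
    `edge_weight` is a made-up default; it would need tuning."""
    pixel_term = np.abs(pred - target).mean()
    edge_term = np.abs(sobel(pred) - sobel(target)).mean()
    return pixel_term + edge_weight * edge_term
```

A DSSIM term could be added the same way; SSIM requires a windowed implementation, so it is omitted from this sketch.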

Just my two cents on the topic, if you have a good reason to be using MSE that I'm just unaware of please feel free to disregard.

Congrats on the new shaders by the way, they look awesome =)

bjin commented 1 year ago

RAVU uses a simple linear regression model. Unlike a CNN, it has no activation function, so things are pretty straightforward: training for MSE minimizes MSE, and training for MAE minimizes MAE. I actually inspected the visualized trained kernels, and there is only a difference in strength; the shape is basically the same. The reason I chose MSE is that it trains faster and looks sharper, not counting the ringing (which will be fixed by the new anti-ringing filter).
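To make the point concrete, here is a minimal sketch (not RAVU's actual training code; the data, noise model, and step size are made up) showing that for a plain linear model both losses are convex and can each be minimized directly. MSE has a closed-form least-squares solution, while MAE can be handled with subgradient descent, and both recover essentially the same weights:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))          # synthetic input patches
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true + rng.laplace(scale=0.1, size=200)  # noisy targets

# MSE: closed-form least-squares solution.
w_mse = np.linalg.lstsq(X, y, rcond=None)[0]

# MAE: subgradient descent on mean |Xw - y|.
w_mae = np.zeros(4)
lr = 0.01
for _ in range(5000):
    residual = X @ w_mae - y
    w_mae -= lr * (np.sign(residual) @ X) / len(y)
```

With no activation in the model, both objectives are convex in the weights, so each optimizer simply finds its own loss's minimum; here the two solutions land close to each other and to `w_true`.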

While tweaking the target cost function may bring some improvement, most of it will be erased by the new anti-ringing filter anyway. The sad truth is that the upper limit of RAVU is quite clear, and in 2023 I see no reason to put a lot of effort into improving it further rather than training an actual CNN model with PyTorch from scratch.