Open arianaa30 opened 3 months ago
Thanks for opening the issue! I was busy last week. I'm not sure how we should compare the PyTorch model with the WebGPU shaders, since both are derived from the GLSL shaders. I'd recommend rendering an anime image in MPV with the stock GLSL shaders and treating it as the baseline. I will look into how to do that.
Also, since the linked issue mentions more than one conversion script, which one did you use to generate the PyTorch model? Would you mind sharing the notebook with the model? We might be able to compare the weights, etc.
I used the PyTorch models that folks contributed here: bloc97/Anime4K#220
To make sure the WebGPU version reproduces the original GLSL shaders, I first compared the upscaled results of MPV and WebGPU, using this image and the Upscale CNN 2x UL shader. The steps are:
Put this line in input.conf so that the window can be dumped as an image:
CTRL+s screenshot-to-file mpv_2x.png window
Then view the image with:
mpv --image-display-duration=inf --no-hidpi-window-scale --glsl-shaders="~~/shaders/Anime4K_Upscale_CNN_x2_UL.glsl" --geometry=1280x720 Magia_360p.png
Press CTRL+s to dump the output.
The two images are compared using this script. They are visually indistinguishable, with SSIM > 0.9995 and MSE < 0.05. Allowing for precision differences, the WebGPU implementation should be largely aligned with the original GLSL shader.
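For reference, the two metrics quoted above can be sketched in a few lines. This is not the linked comparison script, only a minimal illustration under stated assumptions: both images are already decoded into flat, equal-length sequences of 8-bit grayscale values (loading, e.g. with an image library, is left out), and the SSIM here is the simple global single-window form, whereas common implementations average SSIM over local windows and will report somewhat different numbers. The function names `mse` and `global_ssim` are hypothetical.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length pixel sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def global_ssim(a: Sequence[float], b: Sequence[float],
                data_range: float = 255.0) -> float:
    """Single-window SSIM computed over the whole image at once.

    Assumption: this is the global form of the SSIM formula, not the
    locally windowed average that most image libraries compute.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) * (x - mu_a) for x in a) / n
    var_b = sum((y - mu_b) * (y - mu_b) for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Standard stabilising constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2)
    return numerator / denominator


if __name__ == "__main__":
    # Toy data standing in for two decoded screenshots.
    reference = [0, 64, 128, 255]
    candidate = [1, 64, 128, 255]
    print("MSE:", mse(reference, candidate))
    print("SSIM (global):", global_ssim(reference, candidate))
```

Because the global form ignores local structure, it is best used as a quick sanity check; for numbers comparable to the thresholds above, the same screenshots should go through whichever windowed SSIM implementation the comparison script uses.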
I also took a look at the PyTorch models here, and they incorporate more processing stages such as AutoDownscalePre and ClampHighlight. These stages have a significant impact on the output, as you may have seen if you tried the predefined shader combos in Anime4K's MPV release. You may try running inference on the anime4k module alone instead of the whole Anime4KPipeline to see if the result differs.
Interesting repo. I measured the SSIM of your CNNx2UL (by downloading the canvas from the web demo) and found it is much lower (0.77) than that of the PyTorch-converted model (0.97) in https://github.com/bloc97/Anime4K/issues/220 . I'd expect the two to be nearly equivalent. I actually measured Upscale-VL on PyTorch, but I'd expect UL to score even higher.
What could explain the discrepancy here? The SSIM difference is too large.