AN3223 / dotfiles


Documentation on scaling factor #5

Closed: Trinity3e closed this issue 1 year ago

Trinity3e commented 1 year ago

Hi, I tried setting SF to various values but apparently nothing happens; scale still kicks in. Do I need to intervene in some other way?

//

If it works with chroma, can it be made aware of the color format so it automatically doubles the scaling versus luma for most 4:2:2 formats? Because KrigBilateral is broken in gpu-next, and nlmeans could replace it this way (even without the bilateral luma-influenced scaling).

//

Also, off topic: can RF=guided disable itself if the frame rate is high (since it does take a performance hit then)?

LE2: The patch donut seems to work a bit in temporal, don't remove it yet :-)

AN3223 commented 1 year ago

SF only works well if it matches the scaling factor of the shader (i.e., the WIDTH and HEIGHT directives in the shader's HOOK block). For instance you could take nlmeans_2x.glsl and change WIDTH/HEIGHT to upscale 4x and set SF to 4. Sorry if this is convoluted, it's really the only way to implement it as far as I can tell.
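
For reference, an untested sketch of what that edit would look like (the real header in nlmeans_2x.glsl has more directives, and the assumption here is that SF is exposed as a #define like the other user options):

    //!HOOK LUMA
    //!BIND HOOKED
    //!WIDTH HOOKED.w 4 *
    //!HEIGHT HOOKED.h 4 *

    #define SF 4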

For chroma un-subsampling I think you could remove the LUMA hooks and then set WIDTH & HEIGHT to LUMA.w & LUMA.h respectively, and then set the CHROMA SF to 2. This would have the downside of having an incorrect SF for other pixel formats though. You might want to see how KrigBilateral does it, maybe it has a better solution.
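
Sketched out, the chroma pass header would end up looking roughly like this (untested, and similar in spirit to how KrigBilateral outputs chroma at luma resolution; again assuming SF is a #define):

    //!HOOK CHROMA
    //!BIND CHROMA
    //!BIND LUMA
    //!WIDTH LUMA.w
    //!HEIGHT LUMA.h

    #define SF 2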

BTW you'll want to use nlmeans_2x.glsl for any upscaling stuff you want to experiment with. It has FSRCNNX injected into it and it uses its output as a guide.

Either way, the quality of the upscaling is really bad right now. It's much better to run NLM normally and then run an upscaler like FSRCNNX afterwards.

You can automatically pick shaders based on video characteristics using mpv's conditional auto profiles. For instance you could auto select nlmeans_lq.glsl whenever the FPS is above 30.
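
A minimal mpv.conf sketch of that (the shader path is just a placeholder; adjust it to wherever you keep the shaders):

    [lq-above-30fps]
    profile-cond=(p['container-fps'] or 0) > 30
    profile-restore=copy
    glsl-shaders="~~/shaders/nlmeans_lq.glsl"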


Trinity3e commented 1 year ago

Thanks for the clarifications. So nlmeans_2x is using FSRCNNX as RF instead of guided, is that right? How do I "compile" it, or will you add it as a standalone file later? I will overlook its worse quality, as I intend to use it as a second-stage upscaler for low-res content. Speaking of RF: when using RF=guided, which is the default in a few variants, RF > 1 won't matter and will just be treated as enabled?

//

I will try to study the code of KrigBilateral to see at least why it broke in gpu-next, but with my (very) limited knowledge of GLSL it will take a while. Maybe when you have the time you can check whether its chroma bounds can help nlmeans in any way in taking hints from luma in the pre-downscale stage.

AN3223 commented 1 year ago

So nlmeans_2x is using FSRCNNX as RF instead of guided, is that right? How do I "compile" it, or will you add it as a standalone file later?

No need to compile, every shader is ready to go!

I will overlook its worse quality, as I intend to use it as a second-stage upscaler for low-res content.

I still don't think you will be impressed. The quality is worse than nlmeans+FSRCNNX and the performance is worse too since NLM is running at 2x the resolution. If you aren't unimpressed by the quality then I would like to know why lol.

Speaking of RF: when using RF=guided, which is the default in a few variants, RF > 1 won't matter and will just be treated as enabled?

Correct. The shader code itself only cares if RF is 0 or not (this has always been true). You're not the first person to be confused by this, I should probably clarify the documentation around RF.

I will try to study the code of KrigBilateral to see at least why it broke in gpu-next, but with my (very) limited knowledge of GLSL it will take a while.

You should leave a comment on the KrigBilateral gist about it; maybe igv or someone else there can be more helpful than either of us.

Maybe when you have the time you can check whether its chroma bounds can help nlmeans in any way in taking hints from luma in the pre-downscale stage.

Maybe using luma as a guide image for chroma denoising in NLM could achieve something similar to KrigBilateral. I will try that out!

Trinity3e commented 1 year ago

I still don't think you will be impressed. The quality is worse than nlmeans+FSRCNNX and the performance is worse too since NLM is running at 2x the resolution. If you aren't unimpressed by the quality then I would like to know why lol.

Ooh, so since the scaler is injected as RF it sits at the beginning of the chain, so nlmeans works mostly with the upscaled image rather than the original. Now I get it, my bad :D

Correct. The shader code itself only cares if RF is 0 or not (this has always been true). You're not the first person to be confused by this, I should probably clarify the documentation around RF.

Yeah, please, and on EP too; I still don't fully understand it. And if you decide to keep the patch donut, document that further as well :-)

You should leave a comment on the KrigBilateral gist about it; maybe igv or someone else there can be more helpful than either of us.

I did, maybe one day he will answer

Maybe using luma as a guide image for chroma denoising in NLM could achieve something similar to KrigBilateral. I will try that out!

Exactly, I think it can make a really beautiful picture with no performance hit. Even at the original resolution with no extra processing, it should correct the weird noisy chroma of cheap HD/old SD videos.

All right man, thanks a lot. I'll close this one so we don't go too far off topic; I'll ask you about temporal a bit later. Have a good evening!

AN3223 commented 1 year ago

Correct. The shader code itself only cares if RF is 0 or not (this has always been true). You're not the first person to be confused by this, I should probably clarify the documentation around RF.

Yeah, please, and on EP too; I still don't fully understand it. And if you decide to keep the patch donut, document that further as well :-)

I'll keep this in mind, I have seen confusion around EP too. Patch donuts are just a weird thing I put in, I don't think they really have any use.

Maybe using luma as a guide image for chroma denoising in NLM could achieve something similar to KrigBilateral. I will try that out!

Exactly, I think it can make a really beautiful picture with no performance hit. Even at the original resolution with no extra processing, it should correct the weird noisy chroma of cheap HD/old SD videos.

I just now added nlmeans_lgc.glsl for this. I tested it on the test image from the description on the KrigBilateral gist, I just compressed it with ffmpeg -i image.png -vf format=yuv420p image.jpg

It kinda works but it's pretty blurry compared to Krig. Sharpening kinda helps, but amplifies the noise. Maybe some balance could be struck with Sharpen+Denoise? And it's probably bugged for non-yuv420p content. I'll play around with it some more tomorrow.

For chroma noise/speckles like you are describing I think we could already handle it by turning SW down, but this approach seems to work for that too.


Trinity3e commented 1 year ago

Oh wow, the colors look more natural and defined right away!

The noise seems to be drawn from the luma's noise, since I can see part of the macroblocking artifacts; bumping up the denoising on the luma nlmeans seems to allow for more sharpening on it. I use denoise+sharpen for all my uses. :-)

And good news, I tried it with 4:2:0, 4:2:2, and 4:4:4 content and it's not bugged. For 4:4:4 it could probably easily take a check to skip processing if the chroma resolution is the same.

LE: I stuck nlmeans_lgc at the bottom, with nlmeans_temporal + FSRCNNX at the front; it works, and the chroma is perfect, no noise at all and sharp!

I will try it more myself tomorrow and report back. Now I gotta go, it's a bit late. Really appreciate it.

Trinity3e commented 1 year ago

I tested it a bit more now and it seems to just work fine. Can the amount of luma taken into consideration be tweaked (like ASP does with the sharpened result)?

AN3223 commented 1 year ago

Update and try again, it should be much better now. It was applying maximum blur to one of the chroma channels before, that's why the denoising factor had to be turned down so much. I also implemented gather optimizations for it and enabled rotations+reflections. It's upscaling now too.

For 4:4:4 it could probably easily take a check to skip processing if the chroma resolution is the same.

I would rather this be up to the user to do in mpv.conf via conditional auto profiles, especially since the effect may still be desirable on yuv444p content (it's doing more than just upscaling, it's denoising too).
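
If you do want to skip it on 4:4:4 yourself, something along these lines in mpv.conf should work (untested, and the path is a placeholder):

    [lgc-subsampled-only]
    profile-cond=p['video-params/pixelformat'] ~= 'yuv444p'
    profile-restore=copy
    glsl-shaders-append="~~/shaders/nlmeans_lgc.glsl"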

I tested it a bit more now and it seems to just work fine. Can the amount of luma taken into consideration be tweaked (like ASP does with the sharpened result)?

Not really, luma is the only guide and there currently isn't any system to have multiple guides.

AN3223 commented 1 year ago

Testing it on another video, it looks like it's cranked up too much. I'll play with it again tomorrow.

Trinity3e commented 1 year ago

Sorry if I'm looking in the wrong place, but I see no commit today. The color contrast was a tiny bit much, like saturation +3 at most, but nothing too annoying.

It was applying maximum blur to one of the chroma channels before, that's why the denoising factor had to be turned down so much. I also implemented gather optimizations for it and enabled rotations+reflections. It's upscaling now too.

Got it, reflections especially should make it really good, I think.

FSRCNNX on chroma beats Krig by a lot :)

AN3223 commented 1 year ago

Doh, I forgot to push. Check again!

Trinity3e commented 1 year ago

:-)

Yeah, it does seem more accurate, almost correctly saturated (maybe a little left over in highlights around edges), but it's near perfect now. It still works fine with different formats. I will pretty much include it in my config this evening already; the difference is again striking compared to plain nlmeans and cscale :)

Isn't S=10 a bit much? I set it to 4.0 (double the value on medium quality nlmeans) and a tiny bit of sharpen.

Do any of the luma values matter here or should I ignore them?

I think EP should be on for chroma; I enabled it and have seen no bad effect so far.

LE: tiny bug at lines 371-375 it repeats a string :)

LE2: I will test the scaling last, as it will possibly need a very powerful GPU. It will still have a use in an mpv converter where realtime rendering is not necessary; I am trying to build one in my scripts when Dudemanguy has some free time.

AN3223 commented 1 year ago

Isn't S=10 a bit much? I set it to 4.0 (double the value on medium quality nlmeans) and a tiny bit of sharpen.

Yes, definitely. The best value seems to vary a lot depending on what you're looking at. I'm testing on low-res animation now, and it seems like a very low S value works best, like 0.125. For real-world video the difference seems very hard to see.

Do any of the luma values matter here or should I ignore them?

Are you talking about the luma settings? Yeah they are unused in LGC since we're only hooking chroma.

I think EP should be on for chroma; I enabled it and have seen no bad effect so far.

EP is bugged for chroma right now, it should be using the downscaled luma plane instead of the chroma plane. I think it should be an easy fix though; I'll work on that now.

LE: tiny bug at lines 371-375 it repeats a string :)

It's repeated for emphasis, half of those options are for internal use only and the other half are generally not useful.

LE2: I will test the scaling last, as it will possibly need a very powerful GPU. It will still have a use in an mpv converter where realtime rendering is not necessary; I am trying to build one in my scripts when Dudemanguy has some free time.

If you have ffmpeg compiled against libplacebo then you can run shaders in there if that's what you're talking about. The syntax is like ffmpeg -i your_input_file.ext -init_hw_device vulkan -vf hwupload,libplacebo=your_shader.glsl,hwdownload,format=yuv420p your_output_file.ext

AN3223 commented 1 year ago

Latest commit fixes EP for chroma. I only really tested to make sure luma EP is still the same. I briefly tried chroma EP but couldn't see a difference and I can't think of any test images that I have where chroma EP would be useful.

Trinity3e commented 1 year ago

Isn't S=10 a bit much? I set it to 4.0 (double the value on medium quality nlmeans) and a tiny bit of sharpen.

Yes, definitely. The best value seems to vary a lot depending on what you're looking at. I'm testing on low-res animation now, and it seems like a very low S value works best, like 0.125. For real-world video the difference seems very hard to see.

Do any of the luma values matter here or should I ignore them?

Are you talking about the luma settings? Yeah they are unused in LGC since we're only hooking chroma.

Got it

If you have ffmpeg compiled against libplacebo then you can run shaders in there if that's what you're talking about. The syntax is like ffmpeg -i your_input_file.ext -init_hw_device vulkan -vf hwupload,libplacebo=your_shader.glsl,hwdownload,format=yuv420p your_output_file.ext

I'll certainly try it soon, thanks. mpv is built around ffmpeg, so any shader should hopefully just work. I checked the ffmpeg binary on my Arch-based distro now and it doesn't have libplacebo, so I'll have to compile it. Still useful for personal use.

Latest commit fixes EP for chroma. I only really tested to make sure luma EP is still the same. I briefly tried chroma EP but couldn't see a difference and I can't think of any test images that I have where chroma EP would be useful.

Nice, glad it was an easy fix. I think that with chroma EP it should make a difference in stuff with weird color contrast: low-bitrate animation, low-bitrate and noisy HD video with overexposed bits, and poorly mastered HDR stuff, which is the majority of streaming 4K rips. I'll test it and see if it makes a difference.

//

Now it seems that lgc has kinda high usage with higher-res video; I have to tune some parameters down. Do you think lowering WD to 1 will worsen the quality too much?

BTW the duplicate 1st weight option seems interesting, what does it do?

AN3223 commented 1 year ago

I'll certainly try it soon, thanks. mpv is built around ffmpeg, so any shader should hopefully just work. I checked the ffmpeg binary on my Arch-based distro now and it doesn't have libplacebo, so I'll have to compile it. Still useful for personal use.

Only shaders that work with gpu-next will work in ffmpeg's libplacebo filter, since gpu-next is a wrapper around libplacebo.

Nice, glad it was an easy fix. I think that with chroma EP it should make a difference in stuff with weird color contrast: low-bitrate animation, low-bitrate and noisy HD video with overexposed bits, and poorly mastered HDR stuff, which is the majority of streaming 4K rips. I'll test it and see if it makes a difference.

I'm not sure it would be useful for any of that. EP reduces denoising based on local brightness/darkness, so brighter/darker areas will receive less blur from NLM than areas in the middle. The idea behind it is that noise can be slightly more difficult to perceive in brighter areas (while blur may be easier to perceive), and noise might be rare in darker areas (either because of how the video was shot or how it was processed; I'm really not sure what causes that effect).

I think chroma EP might be useful for chroma speckles from shooting digital video in dim lighting. So you can crank up the denoising factor to blur the darker regions, but preserve sharpness of the brighter regions. This is a niche use case though, enabling chroma EP by default would clash with chroma denoising on JPEGs for instance.

Now it seems that lgc has kinda high usage with higher-res video; I have to tune some parameters down. Do you think lowering WD to 1 will worsen the quality too much?

WD=1 is perfectly fine, it used to be the default for a while and it's still used in nlmeans_lq.glsl. It's slightly lower quality and it introduces some directional blur, but on chroma that's probably nearly imperceptible. I'll look into improving the performance of LGC, there are probably some corners that can be cut there.

BTW the duplicate 1st weight option seems interesting, what does it do?

It just fixes the issue I mentioned earlier where one of the chroma channels was being fully blurred, and it forces gather optimizations to be turned on. There's no way for NLM to automatically detect that it's using luma as a guide for chroma denoising, so that option needs to be there to make it aware.

AN3223 commented 1 year ago

I just pushed a commit with much leaner settings for LGC, the performance should be about on par with KrigBilateral now (slightly faster on my system) and the quality should actually be better than before (smaller research size focuses on more relevant pixels, and there was a typo so reflections weren't actually enabled before).

AN3223 commented 1 year ago

I just pushed some new commits:

  1. There is now a guided_lgc.glsl that offers a similar effect as nlmeans_lgc.glsl!
  2. guided_rf.glsl was renamed to guided.glsl and a silly mixup was fixed. So now the quality is much better than the self-guided implementation. It will likely replace the self-guided implementation that is injected into most of the NLM shaders now.
  3. Made big changes to the tooling that builds these shaders which will enable cool stuff like customized guided filter settings for each NLM shader.

I'll clean up the docs and test out injecting the new guided.glsl, and then I will tackle some other minor issues, and then I will work on cool stuff. Thank you for getting me excited about my project again :)

Trinity3e commented 1 year ago

Yeah man, since they appeared I've been checking every few days for updates :)

Was a bit busy, gotta catch up

There is now a guided_lgc.glsl that offers a similar effect as nlmeans_lgc.glsl!

Hehe, so guided RF for lgc too

guided_rf.glsl was renamed to guided.glsl and a silly mixup was fixed. So now the quality is much better than the self-guided implementation. It will likely replace the self-guided implementation that is injected into most of the NLM shaders now.

Made big changes to the tooling that builds these shaders which will enable cool stuff like customized guided filter settings for each NLM shader.

Neat, even better quality :) So if some shader variants get customized guided RF (tuned for medium, temporal, lgc, etc.), then it can be tuned for lower usage, right? This way the guided RF could also be put into LQ. Last week I tried a profile with guided RF but with LQ parameters and it was too heavy on 2160p video. Weirdly enough, at 1080p it had, as you described, almost zero extra usage.

I just pushed a commit with much leaner settings for LGC, the performance should be about on par with KrigBilateral now (slightly faster on my system) and the quality should actually be better than before (smaller research size focuses on more relevant pixels, and there was a typo so reflections weren't actually enabled before).

Yeah, I noticed; I turned reflections on at that time in my config, in my case set to 1 (I think rotations/reflections set to maximum can burst usage in some scenes and cause 1-2 dropped frames, not sure in exactly what context).

Should PS set to 5 increase quality? How about going even higher, like RS 4 and PS 7? I have that in my LQ config for 2160p video.

I think chroma EP might be useful for chroma speckles from shooting digital video in dim lighting. So you can crank up the denoising factor to blur the darker regions, but preserve sharpness of the brighter regions. This is a niche use case though, enabling chroma EP by default would clash with chroma denoising on JPEGs for instance.

So it should only work for chroma-noise video (e.g. too high ISO), and not necessarily for low bitrates? Still, if it improves things slightly with no extra usage I'm happy.

Btw, speaking of clashing, for now lgc is basically a second nlmeans stage on chroma, because the normal nlmeans shader in front of it also does chroma. Does this reduce detail? I have denoise+sharpen on in everything (luma and chroma for all variants). Maybe in the higher-power profiles (medium, temporal, hq) lgc could be added straight into the shader; is that a good idea?

LE: Excuse the weird formatting at the beginning, I forgot GitHub is picky when parsing :)

AN3223 commented 1 year ago

Neat, even better quality :) So if some shader variants get customized guided RF (tuned for medium, temporal, lgc, etc.), then it can be tuned for lower usage, right? This way the guided RF could also be put into LQ. Last week I tried a profile with guided RF but with LQ parameters and it was too heavy on 2160p video. Weirdly enough, at 1080p it had, as you described, almost zero extra usage.

The guided filter is pretty lightweight, I think the only place where I might opt for a fast guided filter would be LQ. For everything else I'll just be tuning the guided filter's settings to offer better quality for different use cases like anime/medium.

Yeah, I noticed; I turned reflections on at that time in my config, in my case set to 1 (I think rotations/reflections set to maximum can burst usage in some scenes and cause 1-2 dropped frames, not sure in exactly what context).

I've seen hiccups before too, but I haven't gotten one in months. Maybe GPU driver related, or background programs inappropriately using GPU.

Should PS set to 5 increase quality? How about going even higher, like RS 4 and PS 7? I have that in my LQ config for 2160p video.

The PS/RS shapes aren't in any particular order (actually just the order they were implemented), so higher numbers aren't necessarily better. Asymmetric shapes like 4 and 5 are lower quality compared to symmetric ones. The default PS=3 with P=3 is super efficient; PS=6 is optimized at any size; PS=3 or PS=0 at any size are probably the best if you have a very powerful GPU (or a very low-resolution video). There's a comment near the top of the shaders titled "Regarding speed" that goes into a lot more detail on the optimizations.

So it should only work for chroma-noise video (e.g. too high ISO), and not necessarily for low bitrates?

Yeah it might be useful for that type of chroma noise. I don't think I really have anything that would help with low bitrate video.

Btw, speaking of clashing, for now lgc is basically a second nlmeans stage on chroma, because the normal nlmeans shader in front of it also does chroma. Does this reduce detail? I have denoise+sharpen on in everything (luma and chroma for all variants). Maybe in the higher-power profiles (medium, temporal, hq) lgc could be added straight into the shader; is that a good idea?

Yes, both passes blur detail (this is the case for all denoising). I don't think LGC can be integrated into non-LGC shaders, but it's possible to remove the chroma hooks from the non-LGC shaders so LGC can do all of the chroma processing (you can do this yourself if you want, just delete every //!HOOK CHROMA line).
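
If you'd rather not do it by hand, an untested one-liner along these lines should work (the filenames are just placeholders; keep a copy of the original):

    sed '/^\/\/!HOOK CHROMA$/d' nlmeans_temporal.glsl > nlmeans_temporal_luma.glsl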

I tried injecting the new guided.glsl but I wasn't very sure about the quality. Even with a small downscaling factor it removes a lot of fine details without an obvious improvement in the denoising. I'll have to fix up nlmeans_test so I can benchmark it properly (the code is derelict and practically needs to be rewritten).

Trinity3e commented 1 year ago

The PS/RS shapes aren't in any particular order (actually just the order they were implemented), so higher numbers aren't necessarily better. Asymmetric shapes like 4 and 5 are lower quality compared to symmetric ones. The default PS=3 with P=3 is super efficient; PS=6 is optimized at any size; PS=3 or PS=0 at any size are probably the best if you have a very powerful GPU (or a very low-resolution video). There's a comment near the top of the shaders titled "Regarding speed" that goes into a lot more detail on the optimizations.

Yeah, I noticed while playing with one setting that disabled gather optimizations that going higher would reduce usage, and when re-enabling gather optimizations other values had lower usage, etc. I thought that would help in filtering MPEG blocking artifacts, e.g. one much higher and one much lower, but that all got overlooked when temporal arrived, which deals with compression noise quite well :) So now I just set those values based on the lowest usage (that being 3/5 in medium and 4/7 in low).

Yeah it might be useful for that type of chroma noise. I don't think I really have anything that would help with low bitrate video.

For luma, temporal does, noticeably so in my MPEG-2/4 stuff. Heh, I can only imagine the immense usage that a temporal lgc would have; that would be very impractical :D

Yes, both passes blur detail (this is the case for all denoising). I don't think LGC can be integrated into non-LGC shaders, but it's possible to remove the chroma hooks from the non-LGC shaders so LGC can do all of the chroma processing (you can do this yourself if you want, just delete every //!HOOK CHROMA line).

Okay, I will do that in my configs, at least in the temporal shader, where lgc will always be used after it. In stuff where low is used I guess I will just reduce the denoising factor slightly.

I tried injecting the new guided.glsl but I wasn't very sure about the quality. Even with a small downscaling factor it removes a lot of fine details without an obvious improvement in the denoising. I'll have to fix up nlmeans_test so I can benchmark it properly (the code is derelict and practically needs to be rewritten).

:) Yeah, GLSL can get tangly; take it easy, and I'm looking forward to updating when you think it's OK!

If I may, you could declutter the directory slightly by removing the sharpen-only shaders and moving the make and config stuff to a subdirectory; nothing will break with those, right?

AN3223 commented 1 year ago

Yeah it might be useful for that type of chroma noise. I don't think I really have anything that would help with low bitrate video.

For luma, temporal does, noticeably so in my MPEG-2/4 stuff.

That's good to hear. I know temporal struggles with actually removing noise from low bitrate video since the noise often persists for more than a single frame, but for the same reasons it can also reinforce detail from previous frames, which can look good. I haven't really been able to perceive it in motion though, I have to compare while paused.

How well does temporal work on your system? Do you ever see any visual bugs like black lines or blobs?

Heh, I can only imagine the immense usage that a temporal lgc would have; that would be very impractical :D

I don't imagine the quality would be very good but the performance should actually be about the same as regular temporal, since both are computing weights on the luma plane.

If I may, you could declutter the directory slightly by removing the sharpen-only shaders and moving the make and config stuff to a subdirectory; nothing will break with those, right?

Yeah for sure, that's a good idea. I don't want to remove all of the sharpen-only shaders (I think they look really nice on very high-resolution content), but maybe just the temporal/HQ variants.

Trinity3e commented 1 year ago

Yeah, temporal works perfectly, no visual bugs. I use it in the gpu-next profile for low-res content up to 720p, passed into fsrcnnx_16. I tried a slimmed-down variant of temporal for 1080p but weirdly it was too taxing; if you have an idea and want to do a temporal_light, it would be interesting. The only downside I've seen is, as you mentioned, some motion blur/soap opera effect on fast-moving stuff, but other than that it's okay. I discovered, as you did, that using only 2 frames instead of 3 reduces the motion blur a bit. ME=2 preserves more detail.

AN3223 commented 1 year ago

Yeah, temporal works perfectly, no visual bugs.

Awesome, I had some weird issues with it on my system before depending on the resolution of the video, I haven't tested it in a while though.

if you have an idea and want to do a temporal_light, it would be interesting.

Sadly I don't have any ideas for speeding up temporal. Gather optimizations can only be applied to the current frame, since the previous frames are only accessible via buffers instead of textures.

The only downside I've seen is, as you mentioned, some motion blur/soap opera effect on fast-moving stuff, but other than that it's okay. I discovered, as you did, that using only 2 frames instead of 3 reduces the motion blur a bit.

You could try something like T=3:SS=0.25:SD=vec3(1,1,1.5), so the 3rd frame is still used but with a greatly reduced weight (although if performance is an issue then T=2 is probably better). That 3rd component of SD is good for controlling motion blur in general too.

ME=2 preserves more detail.

In my testing ME=2 is almost the same as ME=0. You might be seeing a sharper image from ME=2 as a result of getting lower weights on previous frames, which you could achieve with ME=1 if you tune SS/SD to your liking.

Trinity3e commented 1 year ago

Awesome, I had some weird issues with it on my system before depending on the resolution of the video, I haven't tested it in a while though.

:) Yeah, the thing is gpu-next itself has a few problems right now; for instance I have it freezing in fullscreen once in a while, and if you're not on amdgpu it's probably even worse, I guess. On this occasion I'm looking into libplacebo a bit more; I didn't know gpu-next is simply that, appreciate it.

You could try something like T=3:SS=0.25:SD=vec3(1,1,1.5), so the 3rd frame is still used but with a greatly reduced weight (although if performance is an issue then T=2 is probably better). That 3rd component of SD is good for controlling motion blur in general too.

Oh, indeed, now the motion blur is less annoying to the eyes. Usage is only a few % more, including at bursts, so I'll keep it at T=3 now. Pretty neat. I'll try to tune ME=1 as well later.

//// Btw, yeah, the folder looks tidy now, I just had a look through it again. :) I'll rebase soon. Some of your other configs I found useful in parts, with alacritty and yt-dlp; I can also help you back with some of them (apart from the Wayland stuff, I use pretty much what you use), just ask if you feel like it.

AN3223 commented 1 year ago

:) Yeah, the thing is gpu-next itself has a few problems right now; for instance I have it freezing in fullscreen once in a while, and if you're not on amdgpu it's probably even worse, I guess. On this occasion I'm looking into libplacebo a bit more; I didn't know gpu-next is simply that, appreciate it.

Huh, I'm on amdgpu with an RX 570.

I'll rebase soon.

If you're comfortable running make you can just edit the makefiles if you aren't already, it's much less work. Just cd into the dev directory, edit the makefile to your liking, and then run ./Makefile.nlm -B.

Some of your other configs I found useful in parts, with alacritty and yt-dlp; I can also help you back with some of them (apart from the Wayland stuff, I use pretty much what you use), just ask if you feel like it.

Consider making your own dotfiles repo, and then I will check it out? :)

Trinity3e commented 1 year ago

Huh, I'm on amdgpu with an RX 570.

Very interesting, I was on an RX 570 too, and now on a power-limited RX 5500 XT (basically an RX 580 at 90 W; power is expensive in Europe). I had to part with my 570 due to its memory starting to error out; it was a mining card before I got it :)

If you're comfortable running make you can just edit the makefiles if you aren't already, it's much less work. Just cd into the dev directory, edit the makefile to your liking, and then run ./Makefile.nlm -B.

It's very neat how you did it: nlmeans_cfg is an awk script and the makefile exports variables to it. Not sure if I'm using it wrong, but it errors out at line 79 in awk. Do I need some extra files apart from those in the new folder?

Consider making your own dotfiles repo, and then I will check it out? :)

Oh yeah :) I have some stuff on GitLab: https://gitlab.com/Hitman_47 I guess I will begin uploading some of my other configs to Codeberg; this will allow me to also pull your commits directly from there (if I don't get along well with the makefile, that is :-) )

AN3223 commented 1 year ago

Huh, I'm on amdgpu with an RX 570.

Very interesting, I was on an RX 570 too, and now on a power-limited RX 5500 XT (basically an RX 580 at 90 W; power is expensive in Europe). I had to part with my 570 due to its memory starting to error out; it was a mining card before I got it :)

Huh, it's probably something else then, I imagine.

If you're comfortable running make you can just edit the makefiles if you aren't already, it's much less work. Just cd into the dev directory, edit the makefile to your liking, and then run ./Makefile.nlm -B.

It's very neat how you did it: nlmeans_cfg is an awk script and the makefile exports variables to it. Not sure if I'm using it wrong, but it errors out at line 79 in awk. Do I need some extra files apart from those in the new folder?

Gah, I just pushed some commits which should fix that now. I use busybox awk, apparently it allows length() on arrays but GNU awk does not? I tested GNU awk and it works fine now, except it really wanted to enable rotation on Anime HQ Medium for some reason. Rotation should be enabled on the anime profiles now anyway, so I just went ahead and enabled it. Let me know if you have any other problems!

Consider making your own dotfiles repo, and then I will check it out? :)

Oh yeah :) I have some stuff on GitLab: https://gitlab.com/Hitman_47 I guess I will begin uploading some of my other configs to Codeberg; this will allow me to also pull your commits directly from there (if I don't get along well with the makefile, that is :-) )

Oh, that's you! Hey! Lol.

Trinity3e commented 1 year ago

Gah, I just pushed some commits which should fix that now. I use busybox awk, apparently it allows length() on arrays but GNU awk does not? I tested GNU awk and it works fine now, except it really wanted to enable rotation on Anime HQ Medium for some reason. Rotation should be enabled on the anime profiles now anyway, so I just went ahead and enabled it. Let me know if you have any other problems!

Oh, it didn't cross my mind this was about gawk stuff; I just hit a similar issue last week with sed/awk/findutils on a busybox distro while trying to adapt my scripts to it :)

It's working fine now. I made this cfg for temporal to match what I previously had (no other differences apart from the recent small changes) and the end result works OK: TEMPORAL_LUMA=T=3:WD=1:S=1.9:AS=1:ASF=0.5:ASP=1.4:RS=4:PS=6:ME=2:PD=1:$(NO_ROTATE):SD=vec3(1,1,1.5)

All good so far. Now a question: how can I adjust the chroma parameters? Because for the medium and low shaders they will have a functioning chroma component.

LE: I forgot, I tried to remove the lines with //!HOOK CHROMA from temporal and the shader crashes, weirdly.

Oh, that's you! Hey! Lol.

:-) I should've put the same name here as well, hehe. I guess I will sync GitLab with Codeberg when I make it and create 'other dotfiles' separately; I'll look tomorrow, too little time today :) Take a look at my mpv config, maybe it'll help you. Tell me if I put something in weirdly :D And in the shell scripts too, if you know ba/z/sh a bit.

AN3223 commented 1 year ago

It's working fine now. I made this cfg for temporal to match what I previously had (no other differences apart from the recent small changes) and the end result works OK: TEMPORAL_LUMA=T=3:WD=1:S=1.9:AS=1:ASF=0.5:ASP=1.4:RS=4:PS=6:ME=2:PD=1:$(NO_ROTATE):SD=vec3(1,1,1.5)

Awesome!

All good so far. Now a question: how can I adjust the chroma parameters? Because for the medium and low shaders they will have a functioning chroma component.

It's the NLM_CHROMA environment variable, it usually corresponds to the non-_LUMA macros.

LE: I forgot, I tried to remove the lines with //!HOOK CHROMA from temporal and the shader crashes, weirdly.

I just tested and it seems to be working fine, try adding HOOKS=LUMA to the TEMPORAL_OPTS macro (it does the same thing but I'm guessing you missed a hook or deleted an extra one, this way is much easier).
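
Roughly, in dev/Makefile.nlm that would look something like this (placeholder shown for whatever options the macro already carries; it can also just be appended wherever the temporal rule passes NLM_OPTS):

    TEMPORAL_OPTS=<existing options>:HOOKS=LUMA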

Oh, that's you! Hey! Lol.

Take a look at my mpv config, maybe it'll help you. Tell me if I put something in weirdly :D And in the shell scripts too, if you know ba/z/sh a bit.

Yeah sure, I'll take a look!

EDIT: Woah, Utilities.sh is huge! Lol.

Trinity3e commented 1 year ago

Oh yes, Utilities.sh is my baby. Have fun, hehe, use and modify it to your liking. I'll push the next minor version tomorrow, where I bring in some more Alpine package compatibility and finally bring the zsh version up to date; being too busy, I didn't get to finish it :)

It's the NLM_CHROMA environment variable, it usually corresponds to the non-_LUMA macros.

You mean the NLM_CHROMA=$(MEDIUM) etc. parameter when calling awk; I modify that directly, right?

I just tested and it seems to be working fine, try adding HOOKS=LUMA to the TEMPORAL_OPTS macro (it does the same thing but I'm guessing you missed a hook or deleted an extra one, this way is much easier).

Oh yeah, that worked; it's now temporal luma only plus lgc chroma. I also see less usage in temporal luma; the params are all OK and the quality is the same, so it must do the same thing. What are the first two options under TEMPORAL_OPTS? Is the frame limit the same as T? And is the resolution specified like a max/limit on what it can process?

LE: guided_s.glsl seems to be needed to compile most of the shaders; wouldn't it make sense to move it to ./dev too, or do you want them to appear as an option for very LQ? Speaking of that, I need to try it on my PMOS phone; the Adreno GPU has less power than the AMD RX, lol. I did, however, get like 10 fps with regular nlmeans last month, phone very hot of course :)

AN3223 commented 1 year ago

Oh yes, Utilities.sh is my baby. Have fun, hehe, use and modify it to your liking. I'll push the next minor version tomorrow, where I bring in some more Alpine package compatibility and finally bring the zsh version up to date; being too busy, I didn't get to finish it :)

It's the NLM_CHROMA environment variable, it usually corresponds to the non-_LUMA macros.

You mean the NLM_CHROMA=$(MEDIUM) etc. parameter when calling awk; I modify that directly, right?

You can either modify the macro referenced (MEDIUM in this case) or you can modify the NLM_CHROMA variable directly, depending on what shaders you want to be affected by your change (e.g., modifying MEDIUM would affect all shaders that utilize MEDIUM, modifying NLM_CHROMA will only affect that one shader).

I just tested and it seems to be working fine, try adding HOOKS=LUMA to the TEMPORAL_OPTS macro (it does the same thing but I'm guessing you missed a hook or deleted an extra one, this way is much easier).

Oh yeah, that worked; it's now temporal luma only plus lgc chroma. I also see less usage in temporal luma; the params are all OK and the quality is the same, so it must do the same thing. What are the first two options under TEMPORAL_OPTS? Is the frame limit the same as T? And is the resolution specified like a max/limit on what it can process?

Awesome, sounds like it's working. The frame limit is how many frames are stored (whereas T is how many frames are used) and the resolution is the resolution of only the stored frames. Even though it's 1920x1080, it can still handle higher resolutions; it's just that the weights for the previous frames will be computed at 1920x1080 (not properly downscaled either, I think it's like nearest neighbor but random, lol), which is not ideal, but I think the difference in quality is actually surprisingly small IIRC.

LE: guided_s.glsl seems to be needed to compile most of the shaders; wouldn't it make sense to move it to ./dev too, or do you want them to appear as an option for very LQ?

Yeah that's the idea. I have more planned for them in the future (tuning them with the shader testing script I'm working on, implementing sharpening, maybe Weighted Guided Filter and other improvements too), so hopefully they will be a lot more useful.

Speaking of that, I need to try it on my PMOS phone; the Adreno GPU has less power than the AMD RX, lol. I did, however, get like 10 fps with regular nlmeans last month, phone very hot of course :)

Wow it would be cool if you could let me know if rotations+reflections in LQ have any impact on a device like that. And I'm curious if any of the guided filters are actually usable on it.

Trinity3e commented 1 year ago

Got it, really appreciate it; I'll report on the rest tomorrow.

Wow it would be cool if you could let me know if rotations+reflections in LQ have any impact on a device like that. And I'm curious if any of the guided filters are actually usable on it.

Hehe, I'm testing right now with a 480p video (720p+ doesn't work with shaders, probably a freedreno bug), with just scale/cscale set to spline64.

LQ works fine, total usage 15% with both research sizes set to 5. The phone is cold to the touch :) Enabling 1 rotation/reflection increases it slightly to 17%. 3 rotations/2 reflections, however, increases it a lot to 32% and it begins to drop frames.

Guided has 20% in total with default params. Hmm, kinda much. Next I'll check medium, which uses guided.

With medium, what a surprise, it drops frames but only every few seconds. Guided filter 8% total, nlmeans 45% total. The phone gets warm, but oh my, the quality of the picture is good.

Even temporal works, no aberrations in the image, but of course this time, as expected, 5-10 fps :) and 80% total usage. Temporal with R=3 PS=7 seems to get the usage down a lot, to 50% total (guided filter only 2%), with a bearable frame rate. Hmmm, that's very interesting; it doesn't affect lq but it affects temporal, so it probably hits a soft spot in the mobile GPU. And wow, it looks even better on a small display.

AN3223 commented 1 year ago

Hehe, I'm testing right now with a 480p video (720p+ doesn't work with shaders, probably a freedreno bug), with just scale/cscale set to spline64.

LQ works fine, total usage 15% with both research sizes set to 5. The phone is cold to the touch :) Enabling 1 rotation/reflection increases it slightly to 17%. 3 rotations/2 reflections, however, increases it a lot to 32% and it begins to drop frames.

Guided has 20% in total with default params. Hmm, kinda much. Next I'll check medium, which uses guided.

With medium, what a surprise, it drops frames but only every few seconds. Guided filter 8% total, nlmeans 45% total. The phone gets warm, but oh my, the quality of the picture is good.

Even temporal works, no aberrations in the image, but of course this time, as expected, 5-10 fps :) and 80% total usage. Temporal with R=3 PS=7 seems to get the usage down a lot, to 50% total (guided filter only 2%), with a bearable frame rate. Hmmm, that's very interesting; it doesn't affect lq but it affects temporal, so it probably hits a soft spot in the mobile GPU. And wow, it looks even better on a small display.

Wow, thank you so much! I was considering re-enabling rotations+reflections on LQ since I can't measure any difference in performance on my system, but now I know that is not a good idea! It's good to hear temporal is behaving correctly too.

Guided has 20% in total with default params.

20% GPU usage I'm guessing? Is this with guided.glsl?

Trinity3e commented 1 year ago

I was kinda surprised that temporal works that well if I slim down those parameters; however, on the big PC with a high-res source it doesn't seem to help, maybe it just reduces some usage "spikes", so it depends here. Enabling just 1 rotation + 1 reflection only increased usage by 2-3%, so you can do that. I think that 1 reflection is basically free (not sure whether that's while already having 1 rotation, or just with textureGather alone).

20% GPU usage I'm guessing? Is this with guided.glsl?

The usage is from the mpv statistics, the frame timings graph (I roughly added the subcomponents together). And yeah, I don't know why guided alone has higher usage than LQ, but when used as a filter for the medium shader it had a normal sub-5% usage.

You can either modify the macro referenced (MEDIUM in this case) or you can modify the NLM_CHROMA variable directly, depending on what shaders you want to be affected by your change (e.g., modifying MEDIUM would affect all shaders that utilize MEDIUM, modifying NLM_CHROMA will only affect that one shader).

Oh, now I got it, so it's the same format as the luma params; now they're modifying fine :) I guess I'll rebase the 4 shaders in my config with the build system; certainly much less effort, and it's well made, thanks a bunch.

Trinity3e commented 1 year ago

Hmm, I've finished putting in the parameters I mostly use and made this diff, but when I use the makefile like this the shaders come out broken. Did I make a stupid syntax mistake somewhere?

(I've put them directly there in the bottom vars to override some defaults and make it portable; that shouldn't hurt, right?)

../nlmeans_lgc.glsl: nlmeans_template
    env NLM_DESC="$(LGC_DESC)" NLM_OPTS=$(LGC_OPTS) NLM_LUMA=$(LGC_LUMA):WD=1:S=1.5:WD=1:RI=1:RFI=1:$(NO_ROTATE) NLM_CHROMA=RF=SHARE_LUMA:D1W=1:WD=1:S=1.5:R=5:AS=1:ASF=0.2:ASP=1.4:RI=3:RFI=1:EP=1:BP=0.75:DP=0.25 NLM_FILE=$@ ./nlmeans_cfg < $? > $@

../nlmeans_temporal.glsl: nlmeans_template
    env NLM_DESC="$(TEMPORAL_DESC)" NLM_FILE=$@ NLM_LUMA=T=3:WD=1:S=1.9:AS=1:ASF=0.5:ASP=1.4:RS=4:PS=6:PD=1:ME=2:$(NO_ROTATE):SD=vec3\(1,1,1.5\) NLM_CHROMA=WD=1:S=1.9:AS=1:ASF=0.5:ASP=1.4:RS=4:PS=6:PD=1 NLM_OPTS=$(TEMPORAL_OPTS):HOOKS=LUMA ./nlmeans_cfg < $? > $@

../nlmeans_lq.glsl: nlmeans_template
    env NLM_DESC="$(LQ_DESC)" NLM_FILE=$@ NLM_LUMA=S=0.9:P=1:R=5:AS=1:ASF=0.2:ASP=1.2:PS=7:RS=4:RI=1:RFI=1:WD=1:RF=1.25 NLM_CHROMA=S=1.1:P=1:R=5:AS=1:ASF=0.2:ASP=1.4:PS=5:RS=3:RI=1:RFI=1:WD=1 ./nlmeans_cfg < $? > $@

../nlmeans_sharpen_denoise.glsl: nlmeans_template
    env NLM_DESC="$(SHARPEN_DENOISE_DESC)" NLM_FILE=$@ NLM_LUMA=AS=1:S=1.8:ASF=0.5:ASP=1.4:WD=1:PS=5:RS=4:RI=1:RFI=1:PD=1 NLM_CHROMA=AS=1:S=2.0:ASF=0.5:ASP=1.6:WD=1:PS=5:RS=3:RI=1:RFI=2:PD=1:EP=1:BP=0.75:DP=0.25 ./nlmeans_cfg < $? > $@

AN3223 commented 1 year ago

Hmm, I've finished putting in the parameters I mostly use and made this diff, but when I use the makefile like this the shaders come out broken. Did I make a stupid syntax mistake somewhere?

(I've put them directly there in the bottom vars to override some defaults and make it portable; that shouldn't hurt, right?)

Try putting double quotes around the opts (i.e., NLM_LUMA="..."). I think the parentheses from SD might be clashing with shell syntax. I'll update the makefiles to make this the default (EDIT: commit is pushed now).
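
For example, with the temporal luma options from earlier, the quoted form would look like this (no backslash-escaping of the parentheses needed inside the quotes):

    NLM_LUMA="T=3:WD=1:S=1.9:AS=1:ASF=0.5:ASP=1.4:RS=4:PS=6:ME=2:PD=1:$(NO_ROTATE):SD=vec3(1,1,1.5)"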

20% GPU usage I'm guessing? Is this with guided.glsl?

The usage is from the mpv statistics, the frame timings graph (I roughly added the subcomponents together). And yeah, I don't know why guided alone has higher usage than LQ, but when used as a filter for the medium shader it had a normal sub-5% usage.

Oh, I think that's the percentage of time spent on each shader while rendering a frame, e.g., NLM took up 30% of the time compared to all of the other shaders. I'm more interested in the average time taken to render a frame (bring up the regular info menu with I, look under Frame Timings > average > Fresh, i.e., the middle column of the first row, and let it settle after loading the shader) as well as the pixel format and resolution of the video.

Trinity3e commented 1 year ago

Try putting double quotes around the opts (i.e., NLM_LUMA="..."). I think the parentheses from SD might be clashing with shell syntax. I'll update the makefiles to make this the default (EDIT: commit is pushed now).

Yeah, I tried, to no avail; the parentheses were escaped with \ but weirdly GitHub parses that away. It's still good to use double quotes in shell stuff; I see most makefiles make heavy use of them :)

I eventually found out what makes some shaders broken: it's some stuff in the NLM_CHROMA variable, not sure what exactly, I'm still looking. As soon as I remove it, or put in just the defaults, the shaders come out fine again. Can't help you more with that unfortunately; awk was always weird to me, hehe.

Oh, I think that's the percentage of time spent on each shader while rendering a frame, e.g., NLM took up 30% of the time compared to all of the other shaders. I'm more interested in the average time taken to render a frame (bring up the regular info menu with I, look under Frame Timings > average > Fresh, i.e., the middle column of the first row, and let it settle after loading the shader) as well as the pixel format and resolution of the video.

Okay, so the second column in timings; I'll check. The video tested was 4:2:2 640x480; any higher and mpv gets weirded out, I think freedreno crashes. Still, it's very impressive that a 4+ year old mobile phone GPU (SDM845) has roughly the performance of a fairly low-end PC video card.

AN3223 commented 1 year ago

Try putting double quotes around the opts (i.e., NLM_LUMA="..."). I think the parentheses from SD might be clashing with shell syntax. I'll update the makefiles to make this the default (EDIT: commit is pushed now).

Yeah, I tried, to no avail; the parentheses were escaped with \ but weirdly GitHub parses that away. It's still good to use double quotes in shell stuff; I see most makefiles make heavy use of them :)

I eventually found out what makes some shaders broken: it's some stuff in the NLM_CHROMA variable, not sure what exactly, I'm still looking. As soon as I remove it, or put in just the defaults, the shaders come out fine again. Can't help you more with that unfortunately; awk was always weird to me, hehe.

If you can, reset the Makefile to the default and then just change one thing at a time and rebuild until it breaks, and then send me either a copy of the Makefile or just an excerpt of what you changed so I can try to reproduce. I would recommend using a code block instead of a spoiler, so the syntax doesn't get messed up.

BTW LGC pretty much ignores NLM_LUMA, so you don't need to modify that.

Trinity3e commented 1 year ago

Okay, I found where it breaks and sorted it accordingly. I marked it with '####'; after that pattern comes the stuff that causes awk to produce a broken result.

I'm no longer passing any luma opts to lgc, as per your suggestion, nor any chroma opts to temporal. So now temporal works again; the problem is the same as with lgc, with those same chroma parameters (RI, EP).

https://gist.githubusercontent.com/Trinity3e/ad44ef330b8fc695cf643d59c1cab7d5/raw/16a064b766200149a3210ded254b15e1f0848b1f/NLM%2520test%25203

AN3223 commented 1 year ago

Sorry, I'm confused, where does it start breaking and what shaders are affected? I notice after your first edit there is no longer any ####. Also what does this broken result look like? Do you get any error in the terminal when you run mpv?

Trinity3e commented 1 year ago

I went to sleep and by mistake I uploaded the working cfg, I'm sorry. Edited again now. mpv gives this error:

[vo/gpu-next/libplacebo] shader compile log (status=0): 0:80(9): preprocessor error: Redefinition of macro EP_raw

The shaders simply don't load when one of them breaks; that confused me yesterday. The diff between a working one and a broken one looks like this:

(screenshot of the diff)

AN3223 commented 1 year ago

Ah, it's binding EP twice. You can just remove EP under NLM_CHROMA as a workaround for now while I work on a proper fix. EDIT: Just pushed a commit that fixes this.

Trinity3e commented 1 year ago

Okay, appreciate it. In the meantime I'm looking at the second column of mpv on the mobile phone :) temporal with R=3 PS=7: ~14000; medium: ~8000 (guided filter stages are all ~100); lq: ~1500.

AN3223 commented 1 year ago

Okay, appreciate it. In the meantime I'm looking at the second column of mpv on the mobile phone :) temporal with R=3 PS=7: ~14000; medium: ~8000 (guided filter stages are all ~100); lq: ~1500.

How about guided.glsl, guided_s.glsl, and guided_s_fast.glsl on their own? I'm interested in how they compare to lq

Trinity3e commented 1 year ago

Guided is strange: it has the final big stage at 800, but if I count all the previous stages of ~100 (and the mean_a/mean_b stages of ~200 each), I arrive at a total pretty much close to lq.

Guided_s has the same usage, maybe very slightly lower.

Guided_s_fast has 25% less usage.

AN3223 commented 1 year ago

You don't have to add them up; we can just go off the totals, since the shader is the only thing changing. The totals are shown in the first menu when you hit I (above the resolution, below the Vsync Ratio and Vsync Jitter).

Trinity3e commented 1 year ago

Hmm, the totals vary a lot with the scaling and color processing (I think because of the lack of bandwidth in the mobile GPU); they can't be accurate unless I turn those off. I'll probably try, but mpv really doesn't like that on anything non-standard such as Adreno/freedreno.

AN3223 commented 1 year ago

The average should be fairly stable though after a while of video playback, right?

Trinity3e commented 1 year ago

I'm afraid it isn't, even after a couple of minutes; it varies by +/-3000. On the big PC it's much more stable, within a few hundred.

I see you pushed a fix, let me try compiling everything again.

LE: Okay, it seems to be working now! LE2: Oh yeah, it's all good, I finally synced to my repo the change with temporal luma only + lgc and the guided optimizations. Thanks man.