Blinue / Magpie

An all-purpose window upscaler for Windows 10/11.
GNU General Public License v3.0
9.18k stars 485 forks source link

[Feature Request] ONNX support #772

Open FNsi opened 10 months ago

FNsi commented 10 months ago

With a compact structure (model size 256k~4M), it could work as a runtime effect based on DirectML.

Am I so greedy?😂

cqaqlxz commented 10 months ago

Real-ESRGAN is too large; it's too difficult to run in real time on current computers.

FNsi commented 10 months ago

Real-ESRGAN is too large; it's too difficult to run in real time on current computers.

The main bottleneck is memory size: a 2K game with 2x enlargement costs about 16 GB of memory. The speed could be real-time on a 3060 (512k model) only if memory were unlimited. 😂

Imo that could work on an iGPU, though the 780M is still not good enough; maybe Qualcomm's Elite X, but that's another story...
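As a rough illustration of where a memory figure like that can come from (assumed 64-channel fp32 feature maps, not measured values from any real model):

```python
# Illustrative arithmetic only: activation memory of one convolutional
# feature map at 2K resolution. Channel count and dtype are assumptions.
width, height, channels = 2560, 1440, 64   # hypothetical 64-channel layer
bytes_per_value = 4                        # fp32
per_map = width * height * channels * bytes_per_value
print(f"{per_map / 2**30:.2f} GiB per feature map")  # 0.88 GiB
# A network keeping a dozen or more such maps alive (plus framework
# workspace) can plausibly reach the ~16 GB figure quoted above.
```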

Blinue commented 10 months ago

Some models can indeed run inference in real time, such as mpv-upscale-2x_animejanai. I plan to add support for ONNX in the future, but there is still a lot of uncertainty.

FNsi commented 9 months ago

2x-DigitalFlim

The best ESR model I've ever tried, not only for its size but also for its output (real + anime).

kato-megumi commented 9 months ago

The SuperUltraCompact model isn't much larger than the Anime4K UL model (around 2x, I guess), so it's kind of possible to port it to HLSL format.

Blinue commented 9 months ago

While porting to HLSL does indeed offer higher efficiency, the cost is also substantial unless there's an automated approach. I'm inclined to adopt ONNX Runtime, enabling us to seamlessly integrate any ONNX model with ease.

YingDoge commented 9 months ago

I personally think this is a great idea, as AnimeJaNai does offer much better graphics sometimes. I would personally donate 20 USD if this happens. Magpie is getting better every day. Love this thing so much.


kato-megumi commented 7 months ago

I ported AnimeJaNai V3 SuperUltraCompact and 2x-DigitalFlim to Magpie's effect format if anyone wants to try. https://gist.github.com/kato-megumi/d10c12463b97184c559734f2cba553be

Blinue commented 7 months ago

Great job! It appears that AnimeJaNai is well-suited for scenes from old anime, as it doesn't produce sharp lines like Anime4K does. However, a significant issue is that it sacrifices many details. DigitalFlim is sharper than AnimeJaNai, but it also suffers from severe detail loss. In terms of performance, they are roughly 20-25 times slower than Lanczos.

FNsi commented 7 months ago

Nothing happened after I put both files in the effects folder (I even rebooted the system).

As an experiment I also put in fakehdr.hlsl, and that works...

I don't know if I made any mistakes (version 10.05).

kato-megumi commented 7 months ago

You have to use a newer version. https://github.com/Blinue/Magpie/actions/runs/7911000525

FNsi commented 7 months ago

You have to use a newer version. https://github.com/Blinue/Magpie/actions/runs/7911000525

Thank you for your great work and help! Anyway, I still don't know how to download the build from GitHub Actions, so let me keep that surprise until the next upcoming release.😁

However, a significant issue is that it sacrifices many details.

For that, I think it's a common problem with ESR models, due to both the structure (even large models can't keep much detail) and the training datasets (animations?).

Blinue commented 7 months ago

Download from here: https://github.com/Blinue/Magpie/actions/runs/7911000525/artifacts/1246839355

FNsi commented 7 months ago

Download from here: https://github.com/Blinue/Magpie/actions/runs/7911000525/artifacts/1246839355

Thank you. After signing in again I could download it. It's weird that that kind of Actions page needs a sign-in (otherwise it shows a 404) even though I was already signed in on the iOS client...

spiwar commented 7 months ago

You have to use a newer version. https://github.com/Blinue/Magpie/actions/runs/7911000525


Can you port the SD model of AnimeJaNai, which is more aggressive in its detail reconstruction? A UC model for those of us with more computing power would also be great.

kato-megumi commented 7 months ago

Can you port the SD model of animejanai

@spiwar Do you have a link for it? I didn't find it on their GitHub.

FNsi commented 7 months ago

For detail restoration... 2x-Futsuu-Anime, but it's 4M... I think it's a game for a 4090.

kato-megumi commented 7 months ago

animejanai.zip Here are AnimeJaNai's Compact and UltraCompact models for anyone with enough power. UltraCompact runs at about 3 fps for 720p on my machine. Haven't tested Compact yet.

carycary246 commented 7 months ago

animejanai.zip Here are AnimeJaNai's Compact and UltraCompact models for anyone with enough power. UltraCompact runs at about 3 fps for 720p on my machine. Haven't tested Compact yet.

Same issue, 3 fps trying to run UltraCompact, even though it's fine when I use it in mpv. Can you port the v3 Sharp model? They're in the AnimeJaNai Discord beta releases.

kato-megumi commented 7 months ago

Same issue, 3 fps trying to run UltraCompact, even though it's fine when I use it in mpv

Perhaps it's a limitation of magpie/hlsl. I'm hopeful that integrating ONNX will enhance its performance. What GPU are you using?

Can you port the v3 sharp model?

Ok. https://gist.github.com/kato-megumi/d10c12463b97184c559734f2cba553be#file-animejanai_sharp_suc-hlsl

spiwar commented 7 months ago

Can you port the SD model of animejanai

@spiwar Do you have link for it? Didn't find it on their github

You can find it in the full 1.1 GB release, but I've included it here for convenience. 2x_AnimeJaNai_SD_V1beta34_Compact.zip

spiwar commented 7 months ago


RTX 3080 Ti, upscaling from a 1080p source to 4K: the C model runs at seconds per frame, the UC model at 2-3 fps, the SUC model at ~40 fps.

If we can optimize this to run at decent speeds it would be very nice; the UC and C models look quite natural with no oversharpening.

Blinue commented 7 months ago

There is very limited room for performance optimization, because the bottleneck is floating-point operations.

@kato-megumi I found that 16-bit floating-point numbers (min16float) are more efficient, with about a 10% performance improvement on my side. But this is still not enough to make UC usable. Further performance improvement can only be achieved by using platform-specific APIs, such as TensorRT.
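To make the floating-point bottleneck concrete, here is a back-of-the-envelope estimate. The layer count, channel width, and GPU throughput are assumed, illustrative values for an UltraCompact-class network, not the real architecture or a measurement:

```python
# Illustrative FLOPs estimate for a small convolutional upscaler; all
# architecture numbers below are assumptions, not the actual model.
def conv_flops(h, w, c_in, c_out, k=3):
    """Multiply-add count (x2) of one kxk convolution over an h x w map."""
    return 2 * h * w * c_in * c_out * k * k

h, w, layers, ch = 720, 1280, 8, 64   # hypothetical 8-layer, 64-channel net
total = (conv_flops(h, w, 3, ch)               # RGB input layer
         + (layers - 2) * conv_flops(h, w, ch, ch)
         + conv_flops(h, w, ch, 12))            # pixel-shuffle output layer
gpu_flops = 10e12                               # ~10 TFLOPS sustained fp32
print(f"{total / 1e9:.0f} GFLOPs per frame -> ~{gpu_flops / total:.0f} fps ceiling")
```

Even with these generous assumptions the arithmetic alone caps the frame rate in the low tens of fps, which matches the numbers reported later in this thread.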


FNsi commented 7 months ago

Finding data to enhance the SUC model might be the better way forward... Compared with TensorRT, DirectML is a universal solution, imo... (but obviously it can't benefit from NVIDIA hardware acceleration.) Or PyTorch compiled with 8-bit?

kato-megumi commented 7 months ago

@Blinue Sorry, can you elaborate? I thought using FORMAT R16G16B16A16_FLOAT already meant 16-bit floating-point numbers?

Blinue commented 7 months ago

I thought using FORMAT R16G16B16A16_FLOAT already meant 16-bit floating-point numbers?

In HLSL, float is 32-bit. An R16G16B16A16_FLOAT texture stores half-precision floating-point data, but it is converted to float when sampled. You have to explicitly cast to min16float to perform half-precision operations. See https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/using-hlsl-minimum-precision

Using HLSL minimum precision - Win32 apps
Starting with Windows 8, graphics drivers can implement minimum precision HLSL scalar data types by using any precision greater than or equal to their specified bit precision.
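The storage-versus-compute distinction can be sketched numerically: round-tripping a value through IEEE 754 half precision, as storing it to an R16G16B16A16_FLOAT texel does, quantizes it, while HLSL arithmetic on the sampled value stays 32-bit unless explicitly cast to min16float. An illustrative Python sketch of the storage round-trip:

```python
import struct

def to_half_and_back(x: float) -> float:
    """Round-trip a value through IEEE 754 half precision ('e' format),
    mimicking what storing to an R16G16B16A16_FLOAT texture does."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_half_and_back(0.1))  # 0.0999755859375: only ~11 significand bits survive
print(to_half_and_back(0.5))  # 0.5: exactly representable, unchanged
```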
Blinue commented 7 months ago

Comparing with tensor rt, directml is a universal solution, imo... (But obviously cannot gain from nv hardware acceleration)

One advantage of ONNX Runtime is that it supports multiple backends, including DML and TensorRT. TensorRT is generally the fastest backend, it should be the preferred choice if available.
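The backend priority described here can be sketched as follows. The provider strings match ONNX Runtime's documented names, but the selection helper itself is illustrative, not part of the ORT API:

```python
# Hedged sketch: pick execution providers in ONNX Runtime-style priority
# order. The strings are real ORT provider names; the helper is not ORT code.
PREFERRED = ["TensorrtExecutionProvider", "CUDAExecutionProvider",
             "DmlExecutionProvider", "CPUExecutionProvider"]

def pick_providers(available):
    """Keep only available providers, fastest-first; CPU is the fallback."""
    return [p for p in PREFERRED if p in available]

# An NVIDIA machine with TensorRT installed would prefer it automatically:
print(pick_providers(["CPUExecutionProvider", "DmlExecutionProvider",
                      "TensorrtExecutionProvider"]))
```

The resulting list is exactly what one would pass as the `providers` argument when creating an inference session.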

carycary246 commented 7 months ago

Same issue 3fps trying to run ultracompact, even though its fine when I use it in mpv

Perhaps it's a limitation of magpie/hlsl. I'm hopeful that integrating ONNX will enhance its performance. What GPU are you using?

Tested on a 3080 and a 4090; anything more than SUC is not usable for now with HLSL. We will definitely need ONNX support so we can run with TensorRT.

kato-megumi commented 7 months ago

Maybe if ↓ gets implemented, UC will be usable? https://github.com/Blinue/Magpie/discussions/610

GitHub
Applying Shaders Only to the Modified Parts of the Screen · Blinue Magpie · Discussion #610
When playing a visual novel, a significant portion of the screen remains static most of the time. Applying heavy effects to the entire screen feels inefficient and wasteful. Is it possible to apply...
Blinue commented 7 months ago

Maybe if ↓ gets implemented, UC will be usable? https://github.com/Blinue/Magpie/discussions/610

bloc97 makes a good point; I think #610 is hard to achieve. On the one hand, especially for complex scaling algorithms like convolutional networks, it is difficult to determine what effect a pixel change has on the output. On the other hand, duplicate frame detection is already implemented, which can effectively reduce power consumption in many situations. Going further and only updating the changed areas is not very useful, because it is hard to do and only works in certain scenarios.
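The duplicate frame detection mentioned here can be illustrated with a minimal sketch. Magpie's actual implementation differs; this only shows the idea of hashing the captured frame and skipping the shader pass when nothing changed:

```python
import hashlib

class DuplicateFrameDetector:
    """Skip re-rendering when the captured frame is byte-identical to the last."""
    def __init__(self):
        self._last = None

    def should_rerender(self, pixels: bytes) -> bool:
        digest = hashlib.sha256(pixels).digest()
        if digest == self._last:
            return False          # duplicate: reuse the previous upscaled output
        self._last = digest
        return True

det = DuplicateFrameDetector()
print(det.should_rerender(b"\x00" * 16))  # True  (first frame)
print(det.should_rerender(b"\x00" * 16))  # False (unchanged, skip the shader pass)
print(det.should_rerender(b"\xff" * 16))  # True  (content changed)
```

Detecting *which region* changed, and bounding its influence through several convolution layers, is the much harder problem the comment above refers to.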

FNsi commented 7 months ago

Meanwhile, SUC will be the only option for real-time AAA game enhancement for years......

Blinue commented 7 months ago

I have done some initial tests, and DML can run the UC model at around 20 FPS, which is a tenfold improvement over hlsl. On faster devices, such as 4080/4090, using TensorRT could reach 60 FPS. I will start working on this feature after the 1.0 release.

Ptilopsis01 commented 7 months ago

Has the performance monitor not been adapted in the new test build yet? Neither the in-game overlay nor the frame-rate display can monitor the frame rate properly. I'm also a little unsure whether it's actually in use: the image does look about the same, but given my GPU's performance it seems a bit too smooth, with no perceptible mouse latency at all () I'm using TensorRT.

By the way, a bug report: sometimes when scaling is activated, the window isn't brought to the front automatically and instead appears beneath the focused window. @Blinue

Blinue commented 7 months ago

The new version uses a new rendering system (#643), which keeps the mouse smooth even at low frame rates. If the effect of the default AnimeJaNai model isn't obvious, you can try different models: https://github.com/hooke007/dotfiles/releases

By the way, a bug report: sometimes when scaling is activated, the window isn't brought to the front automatically and instead appears beneath the focused window.

There are two possible causes: the source window itself is set to always-on-top, or you have debug mode enabled.

Ptilopsis01 commented 7 months ago

The new version uses a new rendering system (#643), which keeps the mouse smooth even at low frame rates.

Understood, the experience is indeed much better this way.

There are two possible causes: the source window itself is set to always-on-top, or you have debug mode enabled.

That's it, I had debug mode enabled.

There's another small bug: the new test build doesn't recognize the sharpen folder or any of the effects inside it.

Blinue commented 7 months ago

There's another small bug: the new test build doesn't recognize the sharpen folder or any of the effects inside it.

Those haven't been adapted to the new rendering system yet; please wait patiently for #643 to be finished.

Ptilopsis01 commented 7 months ago

Those haven't been adapted to the new rendering system yet; please wait patiently for #643 to be finished.

OK, understood.

kato-megumi commented 7 months ago

Upscaling 1440p (or anything bigger than 1080p) with TensorRT results in a black screen. magpie.2.log magpie.1.log magpie.log

Blinue commented 7 months ago

The reason is that the TensorRT engine in Magpie is built to handle up to 1080p input at most. It can technically support bigger inputs, but consumer-grade graphics cards may have difficulty with real-time inference.

spiwar commented 7 months ago

Just tested; very good performance upscaling from 1080p to 4K with the included model (AnimeJaNai v3 UltraCompact) on an RTX 3080 Ti. Previous build with HLSL: ~2 fps. DML: 22 fps. TensorRT: 34 fps. CUDA: doesn't work.

Using the SuperUltraCompact model gets me to 60 fps in the same scenario; huge improvements all around.

This might already be in the works, but I think it's a good idea to have a pop-up saying the engine is being built when using TensorRT. Since it happens in the background, users might think nothing is happening when in fact the engine is being built.

Ptilopsis01 commented 6 months ago

DML: 22fps TensorRT: 34fps

Could you tell me how you monitor your fps? The built-in monitor doesn't work at present, and this version isn't compatible with RivaTuner, which I guess also can't get the right fps data because of the new rendering system.

kato-megumi commented 6 months ago

Turn on developer mode by editing config.json, then go to Settings > Developer options > set Duplicate frame detection to Never. Now you can get fps with RivaTuner. Remember, don't move your mouse when measuring fps.

spiwar commented 6 months ago

DML: 22fps TensorRT: 34fps

Could you tell me how you monitor your fps? The built-in monitor doesn't work at present, and this version isn't compatible with RivaTuner, which I guess also can't get the right fps data because of the new rendering system.

RTSS works for me. You might have to add Magpie as a separate application in the RTSS whitelist.

To monitor your fps with RTSS, have an animation playing or any moving scene, and don't move your mouse.

Ptilopsis01 commented 6 months ago

Duplicate frame detection to never

Thanks!

Kamikadashi commented 6 months ago

The reason is that the TensorRT engine in Magpie is built to handle up to 1080p input at most. It can technically support bigger inputs, but consumer-grade graphics cards may have difficulty with real-time inference.

At the very least, this should be available as an option.

Blinue commented 6 months ago

At the very least, this should be available as an option.

I plan to enable the TensorRT backend to support inputs of any size in the future. This means that users will have to rebuild the engine multiple times to scale larger windows.

Kamikadashi commented 6 months ago

I plan to enable the TensorRT backend to support inputs of any size in the future. This means that users will have to rebuild the engine multiple times to scale larger windows.

Does this mean the engine would need to be rebuilt every time the window size changes, or would it need to be built just once for each different window size?

Blinue commented 6 months ago

Since building the engine is quite time-consuming, it's crucial to minimize the frequency of rebuilds. The implementation details haven't been decided yet; please be patient.
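One plausible way to minimize rebuilds is to cache built engines per input resolution, so each distinct window size pays the build cost only once. A hypothetical sketch (not Magpie's actual design; `build_engine` is a stand-in for the slow TensorRT build):

```python
from functools import lru_cache

def build_engine(width: int, height: int) -> str:
    # Stand-in for the expensive engine build; returns a dummy handle.
    return f"engine_{width}x{height}"

@lru_cache(maxsize=None)
def get_engine(width: int, height: int) -> str:
    """Build at most once per (width, height); later requests hit the cache."""
    return build_engine(width, height)

get_engine(1920, 1080)               # first 1080p window: slow build
get_engine(1920, 1080)               # same size again: instant cache hit
print(get_engine.cache_info().hits)  # 1
```

Persisting serialized engines to disk, keyed the same way, would extend this across restarts at the cost of cache-invalidation bookkeeping.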

HIllya51 commented 6 months ago

Does the ONNX version not support integrated graphics? The screen goes black on an AMD R6-6600H CPU...

Blinue commented 6 months ago

Kindly note that only the DirectML backend is supported on non-NVIDIA graphics cards. Can you provide the logs to help diagnose the problem?