Open FNsi opened 10 months ago
REAL-ESRGAN is too large. It's too difficult to run it in real time on current computers.
The main bottleneck is memory size. A 2K game + 2x enlargement costs about 16 GB of memory. The speed could be real-time on a 3060 (512k model) only if memory were unlimited. 😂
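For what it's worth, a 16 GB figure is plausible as a back-of-envelope estimate. A quick sketch, where the 64 feature channels, fp32 activations, and "four live feature maps" are all illustrative assumptions rather than Real-ESRGAN's actual configuration:

```python
# Back-of-envelope VRAM estimate for convolutional feature maps when
# upscaling a 2560x1440 frame by 2x. Channel count (64), fp32 storage,
# and the number of simultaneously live maps are assumptions for
# illustration only, not Real-ESRGAN's exact architecture.
def feature_map_bytes(width: int, height: int, channels: int, bytes_per_value: int = 4) -> int:
    return width * height * channels * bytes_per_value

out_w, out_h = 2560 * 2, 1440 * 2  # 2x-enlarged output resolution
per_map = feature_map_bytes(out_w, out_h, channels=64)
print(f"one feature map: {per_map / 2**30:.2f} GiB")      # ~3.52 GiB
print(f"four live maps:  {4 * per_map / 2**30:.2f} GiB")  # ~14 GiB, near the reported 16 GB
```

Even a handful of activations kept alive at once lands in the same ballpark as the 16 GB the original post mentions, which is why memory, not raw FLOPs, can be the first wall you hit.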
Imo that could work on an iGPU, though the 780M is still not good enough; maybe the Qualcomm X Elite, but that's another story...
Some models can indeed run inference in real time, such as mpv-upscale-2x_animejanai. I plan to add support for ONNX in the future, but there is still a lot of uncertainty.
The best ESR model I've ever tried, not only for its size but also for its output (real + anime).
The SuperUltraCompact model isn't much larger than the Anime4K UL model (around 2x, I guess), so it could plausibly be ported to HLSL format.
While porting to HLSL does indeed offer higher efficiency, the cost is also substantial unless there's an automated approach. I'm inclined to adopt ONNX Runtime, enabling us to seamlessly integrate any ONNX model with ease.
I personally think this is a great idea, as animejanai does offer much better graphics sometimes. I would personally donate 20 USD if this happens. Magpie is getting better every day. Love this thing so much.
While porting to HLSL does indeed offer higher efficiency, the cost is also substantial unless there's an automated approach. I'm inclined to adopt ONNX Runtime, enabling us to seamlessly integrate any ONNX model with ease.
I ported Animejanai V3 SuperUltraCompact and 2x-DigitalFlim to Magpie's effect format if anyone wants to try. https://gist.github.com/kato-megumi/d10c12463b97184c559734f2cba553be
Great job! It appears that Animejanai is well-suited to scenes from old anime, as it doesn't produce sharp lines like Anime4K does. However, a significant issue is that it sacrifices many details. Although DigitalFlim is sharper than Animejanai, it also suffers from severe detail loss. In terms of performance, they are roughly 20-25 times slower than Lanczos.
Nothing happened after I put both files in the effects folder (I even rebooted the system).
As an experiment I also put fakehdr.hlsl there, and it works...
I don't know if I made any mistakes (version 10.05).
You have to use a newer version. https://github.com/Blinue/Magpie/actions/runs/7911000525
You have to use a newer version. https://github.com/Blinue/Magpie/actions/runs/7911000525
Thank you for your great work and help! Anyway, I still don't know how to download the build from GitHub Actions, so let me keep that surprise until the next upcoming release. 😁
However, a significant issue is that it sacrifices many details.
For that, I think it's a common problem with ESR models, owing both to the structure (even large models can't keep much detail) and to the training datasets (animations?).
Download from here: https://github.com/Blinue/Magpie/actions/runs/7911000525/artifacts/1246839355
Thank you. After signing in again I can download it. It's weird that that kind of Actions page needs sign-in (otherwise it shows a 404) even though I'm already signed in on the iOS client...
Can you port the SD model of animejanai, which is more aggressive in its detail reconstruction? A UC model for those of us with more computing power would also be great.
Can you port the SD model of animejanai
@spiwar Do you have a link for it? I didn't find it on their GitHub.
For detail restoration... 2x-Futsuu-Anime, but it's 4M... I think it's a game for a 4090.
animejanai.zip Here are animejanai's Compact and UltraCompact models for anyone with enough power. UltraCompact runs at about 3 fps for 720p on my machine. Haven't tested Compact yet.
Same issue, 3 fps trying to run UltraCompact, even though it's fine when I use it in mpv. Can you port the v3 sharp model? They are in the AnimeJaNai Discord beta releases.
Same issue, 3 fps trying to run UltraCompact, even though it's fine when I use it in mpv
Perhaps it's a limitation of magpie/hlsl. I'm hopeful that integrating ONNX will enhance its performance. What GPU are you using?
Can you port the v3 sharp model?
Ok. https://gist.github.com/kato-megumi/d10c12463b97184c559734f2cba553be#file-animejanai_sharp_suc-hlsl
Can you port the SD model of animejanai
@spiwar Do you have a link for it? I didn't find it on their GitHub.
You can find it in the full 1.1 GB release, but I've included it here for convenience. 2x_AnimeJaNai_SD_V1beta34_Compact.zip
RTX 3080 Ti, upscaling from a 1080p source to 4K:
C model runs at seconds per frame
UC model runs at 2-3 fps
SUC model runs at ~40 fps
If we can optimize this to run at decent speeds it would be very nice; the UC and C models look quite natural with no oversharpening.
The room for performance optimization is very limited, because the bottleneck is floating-point operations.
@kato-megumi I found that 16-bit floating-point numbers (min16float) are more efficient, with about a 10% performance improvement on my side. But this is still not enough to make UC usable. Further performance improvement can only be achieved by using platform-specific APIs, such as TensorRT.
Finding more data to enhance the SUC model might be the better way forward... Compared with TensorRT, DirectML is a universal solution, imo... (but obviously it cannot benefit from NVIDIA hardware acceleration.) Or PyTorch compiled with 8-bit?
@Blinue Sorry, can you elaborate? I thought using FORMAT R16G16B16A16_FLOAT already meant 16-bit floating-point numbers?
I thought using FORMAT R16G16B16A16_FLOAT already meant 16-bit floating-point numbers?
In HLSL, float is 32-bit. An R16G16B16A16_FLOAT texture stores half-precision floating-point data, but it is converted to float when sampled. You have to explicitly cast to min16float to perform half-precision operations. See https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/using-hlsl-minimum-precision
Starting with Windows 8, graphics drivers can implement minimum precision HLSL scalar data types by using any precision greater than or equal to their specified bit precision.
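HLSL specifics aside, the precision trade-off those 16-bit texels carry can be illustrated with Python's half-float packing (the `'e'` format of the stdlib `struct` module): storing a value as binary16 quantizes it to 10 mantissa bits, which is exactly what the texture holds before sampling promotes it back to 32-bit.

```python
import struct

def to_half_and_back(x: float) -> float:
    """Round-trip a value through IEEE 754 half precision (binary16)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_half_and_back(0.1))  # 0.0999755859375 -- only ~3 decimal digits survive
print(to_half_and_back(1.0))  # 1.0 -- powers of two round-trip exactly
```

This is why half precision is usually fine for 0..1 pixel data but can visibly degrade accumulated sums inside a large convolution.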
Compared with TensorRT, DirectML is a universal solution, imo... (but obviously it cannot benefit from NVIDIA hardware acceleration.)
One advantage of ONNX Runtime is that it supports multiple backends, including DML and TensorRT. TensorRT is generally the fastest backend, so it should be the preferred choice when available.
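The preference order described here is just selection logic; a minimal sketch, where the provider strings follow ONNX Runtime's naming conventions but the helper itself is hypothetical (in a real program the available list would come from `onnxruntime.get_available_providers()`):

```python
# Pick the best available execution provider: TensorRT if present, then
# DirectML, then CPU as the universal fallback. The provider names follow
# ONNX Runtime's conventions; this helper is an illustration, not Magpie code.
PREFERENCE = ["TensorrtExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]

def pick_provider(available: list[str]) -> str:
    for p in PREFERENCE:
        if p in available:
            return p
    raise RuntimeError("no supported execution provider")

# e.g. on a non-NVIDIA card TensorRT is absent, so DirectML wins:
print(pick_provider(["DmlExecutionProvider", "CPUExecutionProvider"]))  # DmlExecutionProvider
```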
Same issue, 3 fps trying to run UltraCompact, even though it's fine when I use it in mpv
Perhaps it's a limitation of magpie/hlsl. I'm hopeful that integrating ONNX will enhance its performance. What GPU are you using?
Tested on a 3080 and a 4090; anything more than SUC is not usable for now with HLSL. We will definitely need ONNX support so we can run with TensorRT.
Maybe if ↓ gets implemented, UC will be usable? https://github.com/Blinue/Magpie/discussions/610
(#610: When playing a visual novel, a significant portion of the screen remains static most of the time. Applying heavy effects to the entire screen feels inefficient and wasteful. Is it possible to apply...)
Maybe if ↓ gets implemented, UC will be usable? https://github.com/Blinue/Magpie/discussions/610
bloc97 makes a good point, I think #610 is hard to achieve. On one hand, especially for complex scaling algorithms like convolutional networks, it is difficult to determine what effect a pixel change has on the output. On the other hand, duplicate frame detection is already implemented, which can effectively reduce power consumption in many situations. Going further and only updating the changed areas is not very useful, because it is hard to do and only works for certain scenarios.
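The duplicate-frame detection mentioned above amounts to a cheap equality check gating the expensive upscale. A hypothetical sketch of the idea (Magpie's real implementation works on GPU textures, not Python bytes):

```python
import hashlib

_last_digest = None  # digest of the previously rendered frame

def should_rerender(frame: bytes) -> bool:
    """Return True only when the frame differs from the previous one."""
    global _last_digest
    digest = hashlib.blake2b(frame, digest_size=16).digest()
    if digest == _last_digest:
        return False  # identical frame: skip the expensive upscale pass
    _last_digest = digest
    return True

print(should_rerender(b"frame-A"))  # True  (first frame always renders)
print(should_rerender(b"frame-A"))  # False (duplicate detected, skip)
print(should_rerender(b"frame-B"))  # True  (content changed)
```

Skipping whole identical frames is all-or-nothing, which is why it is much easier than the partial-update idea in #610: there is no need to reason about how a small pixel change propagates through a convolutional network.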
Meanwhile, SUC will be the only option for real-time AAA game enhancement for years......
I have done some initial tests, and DML can run the UC model at around 20 FPS, which is a tenfold improvement over HLSL. On faster devices, such as a 4080/4090, using TensorRT could reach 60 FPS. I will start working on this feature after the 1.0 release.
Has performance monitoring not been adapted in the new test build yet? Neither the in-game overlay nor the frame rate display can monitor the frame rate properly. I also somewhat doubt whether it is really being used: the image does look about the same, but given my GPU's performance it seems a bit too smooth, with no sense of latency on the mouse at all. (I'm using TensorRT.)
By the way, a bug report: sometimes when scaling is activated, the window is not automatically brought to the top, but is shown below the focused window. @Blinue
The new version uses a new rendering system (#643), which keeps the mouse smooth even at low frame rates. If the default AnimeJaNai effect isn't noticeable, you can try different models: https://github.com/hooke007/dotfiles/releases
By the way, a bug report: sometimes when scaling is activated, the window is not automatically brought to the top, but is shown below the focused window.
There are two possible causes: the source window itself is topmost, or you have debug mode turned on.
The new version uses a new rendering system (#643), which keeps the mouse smooth even at low frame rates.
Got it, that does make the experience much better.
There are two possible causes: the source window itself is topmost, or you have debug mode turned on.
That's indeed it, I had debug mode turned on.
There's another small bug: the new test build can't recognize the sharpen folder or any of the effects under it.
There's another small bug: the new test build can't recognize the sharpen folder or any of the effects under it.
Those haven't been adapted to the new rendering system yet; please wait patiently for #643 to be finished.
Those haven't been adapted to the new rendering system yet; please wait patiently for #643 to be finished.
OK, understood.
Upscaling 1440p (or anything bigger than 1080p) with TensorRT results in a black screen. magpie.2.log magpie.1.log magpie.log
The reason is that the TensorRT engine in Magpie is built to handle up to 1080p input at most. It can technically support bigger inputs, but consumer-grade graphics cards may have difficulty with real-time inference.
Just tested, very good performance upscaling from 1080p -> 4K with the included model (animejanai v3 ultracompact), RTX 3080 Ti:
Previous build with HLSL: ~2 fps
DML: 22 fps
TensorRT: 34 fps
CUDA: doesn't work
Using the superultracompact model gets me to 60 fps in the same scenario; huge improvements all around.
This might already be in the works, but I think it's a good idea to have a pop-up saying that the engine is being built when using TensorRT. Since it happens in the background, users might think nothing is happening when in fact the engine is being built.
DML: 22fps TensorRT: 34fps
Could you tell me how you monitor your fps? The built-in monitor doesn't work at present, and this version is not compatible with RivaTuner, which I guess also cannot get the right fps data because of the new rendering system.
Turn on developer mode by editing config.json. Then in Settings > developer options, set Duplicate frame detection to Never. Now you can get fps with RivaTuner. Remember, don't move your mouse when trying to get fps.
DML: 22fps TensorRT: 34fps
Could you tell me how you monitor your fps? The built-in monitor doesn't work at present, and this version is not compatible with RivaTuner, which I guess also cannot get the right fps data because of the new rendering system.
RTSS works for me. You might have to add Magpie as a separate application in the RTSS whitelist.
To monitor your fps with RTSS, have an animation playing or any moving scene, and don't move your mouse.
Duplicate frame detection to never
Thanks!
The reason is that the TensorRT engine in Magpie is built to handle up to 1080p input at most. It can technically support bigger inputs, but consumer-grade graphics cards may have difficulty with real-time inference.
At the very least, this should be available as an option.
At the very least, this should be available as an option.
I plan to enable the TensorRT backend to support inputs of any size in the future. This means that users will have to rebuild the engine multiple times to scale larger windows.
I plan to enable the TensorRT backend to support inputs of any size in the future. This means that users will have to rebuild the engine multiple times to scale larger windows.
Does this mean the engine would need to be rebuilt every time the window size changes, or would it need to be built just once for each different window size?
Since building the engine is quite time-consuming, it's crucial to minimize the frequency of rebuilds. The implementation details have not been decided yet; please be patient.
Does the ONNX version not support integrated graphics? The screen goes black on the AMD R6-6600H CPU...
Kindly note that only the DirectML backend is supported on non-NVIDIA graphics cards. Can you provide the logs to help diagnose the problem?
In a compact structure (model size 256k~4M), that would be a runtime effect based on DirectML.
Am I so greedy? 😂