Closed cedws closed 6 years ago
I'm testing on X11 and managed to get out:
Scale above 4 not support currently, defaults to 4
Hi,
If you set scaling to 4 do you still see no effect?
@emoon There's a difference between 2 and 4. Those work fine. As the info message suggests, scale factors above 4 don't work/are not supported. Can we do anything about this?
Yeah so the problem is that X11 doesn't have any support for scaling by itself which macOS and Windows has so I implemented them manually like this
https://github.com/emoon/rust_minifb/blob/master/src/native/x11/X11MiniFB.c#L490
It would easy to implement x8
or more here. I have manually unrolled it to keep it as fast as possible but the best option would likely to use SIMD here to have good performance.
An other alternative would be to have optional support for OpenGL and have the GPU do that work but I haven't looked into that.
I have fixed this now. 0.10.5 has been released with this added.
@emoon Just gave this a try, X8 seems to be working, but X16 and X32 are still unsupported. Any plans for this in the future?
I actually didn't realize I support up to X32. I can try to add support for it. You must have a pretty small window and a fairly large screen res if you need that though :)
I would quite like to use X16, as I'm building a CHIP8 emulator which has a 64x32 resolution. Thus, my window is currently only 512x256. I could probably alter my emulator to duplicate a pixel value to do some scaling myself but it would be nice to have in minifb
. Please do if you have the time, thank you :+1:
Alright. I will try to get it sorted during the weekend :)
@emoon I've just noticed some performance differences between scale factors. You said that you were having to implement this scaling manually as X11 doesn't support it, so I assume this what's causing it. Is it possible to optimise this function in any way? Maybe a memcpy
? Thanks.
The problem is that you can't do a memcpy
because you need to "expand" a single pixel to multiple pixels so it means that you need to write it to the same buffer several times. Like this https://github.com/emoon/rust_minifb/blob/master/src/native/x11/X11MiniFB.c#L490
When scaling up there will of course be more pixels to write making it taking CPU time. In order to reduce the time I can look at the generated assembly and make sure it's generated correct, it's also possible to use SSE on x86 machines but I don't think it will help much as you are almost only bound writing to memory here.
In terms of perf it should be faster to use the release version. Another thing you can do is to run your update in a thread so it runs at the same time as the update_with_buffer
call is running.
There is a another solution to this and it's to use OpenGL to do the update instead. This has several issues such as requiring OpenGL which is a big downside as it isn't needed today. I'm not sure if it's possible to only do it optional but it's quite a bit of work to get there.
I have investigated this now. For x8
the compiler generates this code
.L131:
add rsi, 4
movd xmm1, DWORD PTR -4[rsi]
mov r15d, ecx
add ecx, r8d
pshufd xmm0, xmm1, 0
movups XMMWORD PTR [rdi], xmm0
movups XMMWORD PTR [rax], xmm0
movups XMMWORD PTR [rdi+r14], xmm0
movups XMMWORD PTR [rax+r14], xmm0
movups XMMWORD PTR [rdi+r13], xmm0
movups XMMWORD PTR [rax+r13], xmm0
movups XMMWORD PTR [rdi+r12], xmm0
movups XMMWORD PTR [rax+r12], xmm0
movups XMMWORD PTR [rdi+rbp], xmm0
movups XMMWORD PTR [rax+rbp], xmm0
movups XMMWORD PTR [rdi+r10], xmm0
movups XMMWORD PTR [rax+r10], xmm0
movups XMMWORD PTR [rdi+r9], xmm0
movups XMMWORD PTR [rax+r9], xmm0
movups XMMWORD PTR [rdi+r11], xmm0
add rdi, rbx
movups XMMWORD PTR [rax+r11], xmm0
add rax, rbx
cmp edx, r15d
jg .L131
add DWORD PTR -24[rsp], r8d
add rdi, QWORD PTR -16[rsp]
mov eax, DWORD PTR -24[rsp]
cmp DWORD PTR -20[rsp], eax
jg .L130
Which is pretty good as I write u32
values but it figures out to use SSE instead.
On my machine when running a 64x32
buffer and x8
scaling the update_with_buffer
code takes 0.18-0.2 ms and I think it's not that easy to improve much more as the compile does a fairly good job here.
I have added x16 and x32 now in 0.10.6
(x32 doesn't seem to behave correct for me but you can try it out, x16 works fine)
They work brilliantly, thank you very much for your hard work :+1:. As for the performance issue, I'm not really sure what the problem is. I am confident that it's due to scaling, as the difference in render speed between X2 and X16 is significant. As you say, I don't think there's much that can be done there though. It's just a little hobby project anyway, performance isn't a big issue. Thanks again!
Cool np :) Yeah. As I do the scaling on the CPU there is not much to do to get it faster in higher res this is really a task for the GPU but (like I wrote above) I would need to depend on HW OpenGL acceleration which may or may not be present in the system.
Hi. Currently on an X11 system but I'll try to get more details for you later.
The
minifb::Scale
option appears to have no effect on Wayland. I also seem to remember seeing an info message saying that onlyminifb::Scale::X8
and lower is supported. The window starts extremely small.I believe the system I had the issue on was running Fedora 27, which uses XWayland.