Closed koachan closed 5 years ago
Can you rebase this PR on branch code_cleanup_reformat?
Also, you could possibly detect it like jpeg-turbo does: https://github.com/libjpeg-turbo/libjpeg-turbo/blob/master/simd/CMakeLists.txt#L331
Okay, done. Am I doing this right? Also thanks for the libjpeg-turbo link!
Maybe only call include("CheckCSourceRuns") inside if(PPCOPT)?
I never tried including later on .. but nothing in the docs speaks against it so..
EDIT: Rest looks fine.
It'll take a while for me to check this though; I still have other PRs to go over and I'm busy.
On the other hand, I can only verify it doesn't interfere on non-PowerPC systems.. I have no PowerPC system myself to test on.
There's this guy from this issue: https://github.com/DeadSix27/waifu2x-converter-cpp/pull/120
He could possibly test it as 2nd-tester but I won't ping him until everything's ready to merge.
> Maybe only call include("CheckCSourceRuns") inside if(PPCOPT)? I never tried including later on .. but nothing in the docs speaks against it so..
It works just fine here, so I'm moving it.
Also, I further optimized the AltiVec code.
Looks like I can still speed things up by using an array of v256_t's instead of a plain vector type as vreg_t.
I'm very sorry for pushing yet another commit at this stage, but I just noticed that the very wide vectors I'm using cause filter_simd_impl0 to never get called, which hurts performance by a lot. This commit fixes the issue.
As for testing, ideally I'd like for someone else to test it before merging. You said before that the guy at #120 also uses a Power machine. Maybe we can try to contact him?
We can try:
@eclipseo If possible, would you be able to test this PR? It requires a Power machine, and it looked like you may have access to one.
Confirmed that this works on POWER9 running a little endian distro. Since @koachan has a big endian G5, it should probably be pretty safe.
That said, runtime detection would be nice. One could use the auxv.h header and the getauxval function to do that. This is not entirely portable (limited to Linux glibc and musl) but it could at least be made an option.
One more thing; only the single implementation file for the altivec filter should be compiled with forced -maltivec; if you force it for the rest, it may result in the compiler emitting altivec instructions for non-altivec code, which will defeat the point of doing any kind of runtime detection in the first place - so, the build system should not add -maltivec for anything except modelHandler_altivec.cpp, but modelHandler_altivec.cpp definitely needs to have it in order for altivec to work at all.
edit: ah, i see that's already being done. Good then
This looks reasonable to me. I'd rebase the branch against master and squash the commits before attempting to merge this
Okay, added a runtime check with auxv.h. It seems to work fine on my machine, could you test it again?
Works for me. You can also use PPC_FEATURE_HAS_ALTIVEC instead of the 1U << 28 constant. They have the same value.
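For readers following along, the runtime check discussed here can be sketched roughly as below. This is only an illustration, not the code from the PR: the function names hwcap_has_altivec and cpu_has_altivec and the HAVE_PPC_AUXV macro are made up for the example, and it assumes a Linux libc (glibc or musl) that provides getauxval in sys/auxv.h. On other platforms the sketch simply reports no AltiVec.

```cpp
#if defined(__linux__) && (defined(__powerpc__) || defined(__powerpc64__))
#include <sys/auxv.h>  // getauxval and AT_HWCAP (Linux glibc and musl only)
#define HAVE_PPC_AUXV 1  // hypothetical macro, used only in this sketch
#endif

// Fallback for builds where the libc headers did not define the named bit.
// The value matches the 1U << 28 constant mentioned in the thread.
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000UL
#endif

// Hypothetical helper: tests the AltiVec bit in a hardware-capability word.
inline bool hwcap_has_altivec(unsigned long hwcap) {
    return (hwcap & PPC_FEATURE_HAS_ALTIVEC) != 0;
}

// Hypothetical helper: asks the kernel for AT_HWCAP where that is possible,
// and conservatively answers "no AltiVec" everywhere else.
inline bool cpu_has_altivec() {
#ifdef HAVE_PPC_AUXV
    return hwcap_has_altivec(getauxval(AT_HWCAP));
#else
    return false;
#endif
}
```

A caller would presumably check cpu_has_altivec() once at startup and only then dispatch to the code compiled with -maltivec, falling back to the generic filter otherwise.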
Also, while cleaning up and squashing, I'd probably get rid of the unrelated whitespace changes, which make up the majority of the changes in the PR... leaving that for a separate commit would be best.
Ahh, my bad. Seems like the editor stripped all the whitespace. Lemme fix that.
Thanks for testing @q66, anything else you needed/wanted to change @koachan? Otherwise I'll merge it.
I can't think of anything else to change, so it should be okay to merge @DeadSix27.
Alright.
Add AltiVec support for faster processing in PowerPC and POWER processors. I don't know yet how to do runtime detection of it, so if you're using a CPU without AltiVec (such as G3, POWER5, and others), build it with -DPPCOPT=off to disable AltiVec.