DeadSix27 / waifu2x-converter-cpp

Improved fork of Waifu2X C++ using OpenCL and OpenCV
MIT License
792 stars 86 forks source link

Add AltiVec Support #177

Closed koachan closed 5 years ago

koachan commented 5 years ago

Add AltiVec support for faster processing in PowerPC and POWER processors. I don't know yet how to do runtime detection of it, so if you're using a CPU without AltiVec (such as G3, POWER5, and others), build it with -DPPCOPT=off to disable AltiVec.

DeadSix27 commented 5 years ago

Can you rebase this PR on branch code_cleanup_reformat ?

Also, you could possibly detect it like jpeg-turbo does: https://github.com/libjpeg-turbo/libjpeg-turbo/blob/master/simd/CMakeLists.txt#L331

koachan commented 5 years ago

Okay, done. Am I doing this right? Also thanks for the libjpeg-turbo link!

DeadSix27 commented 5 years ago

Maybe only call include("CheckCSourceRuns") inside if(PPCOPT)?

I never tried including later on .. but nothing in the docs speaks against it so..

EDIT: Rest looks fine.

DeadSix27 commented 5 years ago

It'll take a while for me to check this though, still have other PR's to go over and busy.

On the other hand, I can only verify it doesn't interfere on non PowerPC systems.. I have no PowerPC system myself to test on.

There's this guy from this issue: https://github.com/DeadSix27/waifu2x-converter-cpp/pull/120

He could possibly test it as 2nd-tester but I won't ping him until everything's ready to merge.

koachan commented 5 years ago

Maybe only call include("CheckCSourceRuns") inside if(PPCOPT)?

I never tried including later on .. but nothing in the docs speaks against it so..

It works just fine here, so I'm moving it.


Also, I further optimized the AltiVec code. Looks like I can still speed things up by using an array of v256_t's instead of a plain vector type as vreg_t.

koachan commented 5 years ago

I'm very sorry for pushing yet another commit at this stage, but I just noticed that the very wide vectors I'm using causes filter_simd_impl0 to never get called, which hurts performance by a lot. This commit fixes the issue.

As for testing, ideally I'd like for someone else to test it before merging. You said before that the guy at #120 also uses a Power machine. Maybe we can try to contact him?

DeadSix27 commented 5 years ago

As for testing, ideally I'd like for someone else to test it before merging. You said before that the guy at #120 also uses a Power machine. Maybe we can try to contact him?

We can try:

@eclipseo If possible, are you be able to test this PR, it requires a Power machine and it looked like you may have access to one.

q66 commented 5 years ago

Confirmed that this works on POWER9 running a little endian distro. Since @koachan has a big endian G5, it should probably be pretty safe.

That said, runtime detection would be nice. One could use the auxv.h header and the getauxval function to do that. This is not entirely portable (limited to Linux glibc and musl) but it could at least be made an option.

q66 commented 5 years ago

One more thing; only the single implementation file for the altivec filter should be compiled with forced -maltivec; if you force it for the rest, it may result in the compiler emitting altivec instructions for non-altivec code, which will defeat the point of doing any kind of runtime detection in the first place - so, the build system should not add -maltivec for anything except modelHandler_altivec.cpp, but modelHandler_altivec.cpp definitely needs to have it in order for altivec to work at all.

edit: ah, i see that's already being done. Good then

q66 commented 5 years ago

This looks reasonable to me. I'd rebase the branch against master and squash the commits before attempting to merge this

koachan commented 5 years ago

Okay, added a runtime check with auxv.h. It seems to work fine on my machine, could you test it again?

q66 commented 5 years ago

Works for me. You can also use PPC_FEATURE_HAS_ALTIVEC instead of the 1U << 28 constant. They have the same value

q66 commented 5 years ago

Also, while cleaning up and squashing, I'd probably get rid of the unrelated whitespace changes which makes the majority of changes in the PR... leaving that for a separate commit would be best

koachan commented 5 years ago

Ahh, my bad. Seems like the editor stripped all the whitespaces. Lemme fix that.

DeadSix27 commented 5 years ago

Thanks for testing @q66, anything else you needed/wanted to change @koachan ? Otherwise I'ill merge it.

koachan commented 5 years ago

I can't think of anything else to change, so it should be okay to merge @DeadSix27.

DeadSix27 commented 5 years ago

Alright.