Closed jbeich closed 4 years ago
@pkubaj, can you test on FreeBSD powerpc64? This PR has been cherry-picked downstream, so you can do the following:
```
$ cd /usr/ports/graphics/waifu2x-converter-cpp
$ make install
$ fetch https://www.freebsd.org/layout/images/beastie.png
$ waifu2x-converter-cpp -i beastie.png -o beastie_2x.png
```
I don't think it worked, unless `CPU: Generic` is ok:
```
CPU: Generic
Processing file [1/1] "/usr/home/pkubaj/beastie.png":
Scaling image from 178x196 to 356x392
Step 01/02: Denoising
Proccessing [1/1] slices
Processing block, column (01/01), row (01/01) ...
total : 1.712[sec], 0013.66[GFLOPS]
Step 02/02: 2x Scaling
Proccessing [1/1] slices
2x Scaling:
Processing block, column (01/01), row (01/01) ...
total : 5.970[sec], 0014.59[GFLOPS]
Writing image to file...
Done, took: 0s total, file: 8.170s avg: 12.066s
Finished processing 1 files
Took: 12.066secs total, filter: 7.682secs; 0 files skipped, 0 files errored. [GFLOPS: 9.16, GFLOPS-Filter: 14.39]
```
@pkubaj, can you try again?
Feature detection seems okay on Debian ppc64.
```
CPU: PowerPC AltiVec
Processing file [1/1] "/tmp/rin.jpg":
Scaling image from 404x600 to 808x1200
Step 01/02: Denoising
Proccessing [1/1] slices
Processing block, column (01/01), row (01/02) ...
total : 14.802[sec], 0008.39[GFLOPS]
Processing block, column (01/01), row (02/02) ...
total : 2.819[sec], 0009.98[GFLOPS]
Step 02/02: 2x Scaling
Proccessing [1/1] slices
2x Scaling:
Processing block, column (01/02), row (01/03) ...
total : 18.968[sec], 0008.02[GFLOPS]
Processing block, column (02/02), row (01/03) ...
total : 9.453[sec], 0010.18[GFLOPS]
Processing block, column (01/02), row (02/03) ...
total : 14.908[sec], 0010.20[GFLOPS]
Processing block, column (02/02), row (02/03) ...
total : 9.429[sec], 0010.21[GFLOPS]
Processing block, column (01/02), row (03/03) ...
total : 7.372[sec], 0008.78[GFLOPS]
Processing block, column (02/02), row (03/03) ...
total : 6.187[sec], 0006.62[GFLOPS]
Writing image to file...
Done, took: 0s total, file: 84.505s avg: 88.205s
Finished processing 1 files
Took: 88.205secs total, filter: 83.938secs; 0 files skipped, 0 files errored. [GFLOPS: 8.55, GFLOPS-Filter: 8.99]
```
Yes, it now works (and is much faster than before):
```
CPU: PowerPC AltiVec
Processing file [1/1] "/usr/home/pkubaj/beastie.png":
Scaling image from 178x196 to 356x392
Step 01/02: Denoising
Proccessing [1/1] slices
Processing block, column (01/01), row (01/01) ...
total : 0.289[sec], 0080.90[GFLOPS]
Step 02/02: 2x Scaling
Proccessing [1/1] slices
2x Scaling:
Processing block, column (01/01), row (01/01) ...
total : 0.785[sec], 0111.01[GFLOPS]
Writing image to file...
Done, took: 0s total, file: 1.771s avg: 6.106s
Finished processing 1 files
Took: 6.106secs total, filter: 1.074secs; 0 files skipped, 0 files errored. [GFLOPS: 18.10, GFLOPS-Filter: 102.91]
```
Tested on FreeBSD aarch64, armv6.
- `getauxval` copycat since https://github.com/freebsd/freebsd/commit/990c7fb0441b
- `__has_include`, so simplify `<sys/auxv.h>` detection
- `check_c_source_runs` to simplify and support cross-compilation
- `getauxval` always returns `unsigned long`, even on ARM
- `HWCAP_ALTIVEC` to a proper macro for consistency with ARM code
- `HWCAP_NEON` but not `HWCAP_ARM_NEON`
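
For readers unfamiliar with the technique, the items above can be illustrated with a rough sketch. This is not the code from this PR: `SKETCH_HAVE_AUXV`, `read_hwcap` and `has_flag` are hypothetical names, and the fallback bit values are assumptions copied from the Linux headers (`PPC_FEATURE_HAS_ALTIVEC` and the 32-bit ARM `HWCAP_NEON`). It assumes `getauxval` on Linux and `elf_aux_info` on FreeBSD, both declared in `<sys/auxv.h>`.

```cpp
// Sketch only: header detection happens in the preprocessor, so no
// build-time test program has to run (which helps cross-compilation).
#if defined(__has_include)
#  if __has_include(<sys/auxv.h>)
#    include <sys/auxv.h>
#    define SKETCH_HAVE_AUXV 1  // hypothetical macro name
#  endif
#endif

// Fallback definitions so the same macro names work on every platform.
// Values are assumptions taken from the Linux headers.
#ifndef HWCAP_ALTIVEC
#  define HWCAP_ALTIVEC 0x10000000UL  // PPC_FEATURE_HAS_ALTIVEC
#endif
#ifndef HWCAP_NEON
#  define HWCAP_NEON (1UL << 12)      // 32-bit ARM NEON bit
#endif

// Returns the AT_HWCAP bitmask, or 0 when it cannot be queried.
// The result is kept as unsigned long on every architecture.
unsigned long read_hwcap() {
#if defined(SKETCH_HAVE_AUXV) && defined(__linux__)
    return getauxval(AT_HWCAP);
#elif defined(SKETCH_HAVE_AUXV) && defined(__FreeBSD__)
    unsigned long hwcap = 0;
    // elf_aux_info returns 0 on success and an error number otherwise.
    if (elf_aux_info(AT_HWCAP, &hwcap, static_cast<int>(sizeof(hwcap))) != 0)
        return 0;
    return hwcap;
#else
    return 0;
#endif
}

// True when every bit of `flag` is set in `hwcap`.
bool has_flag(unsigned long hwcap, unsigned long flag) {
    return (hwcap & flag) == flag;
}
```

On a PowerPC build, something like `has_flag(read_hwcap(), HWCAP_ALTIVEC)` would then decide between the `PowerPC AltiVec` and `Generic` paths seen in the logs above; the ARM case would test `HWCAP_NEON` the same way.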