Some of the top 50 supercomputers (Summit and others) run OpenPOWER ISA-compatible CPUs such as POWER9. Given that, and my own desire to run inference and training on my OpenPOWER-based systems, it would be extremely useful for NNPACK to support these massively multi-threaded CPUs (POWER9, for example, has up to 24 cores with 4 threads per core), which also offer very high memory bandwidth (200+ GB/s per socket).

Supporting them would require adding AltiVec-compatible implementations of the NNPACK algorithms. A first step might be to implement Intel-compatible shims for the SSE intrinsic primitives. I would be interested in doing this and then proceeding to a full implementation. Would you be willing to consider accepting such additions into the project, assuming ppc support is also added to the cpuinfo library per https://github.com/pytorch/cpuinfo/issues/2 ?
Also, I can provide ongoing test/development/continuous integration resources for NNPACK on several Raptor Computing Systems Talos II (IBM POWER9-based) systems I own and operate.
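To make the shim idea concrete, here is a rough sketch of what I have in mind; it is not existing NNPACK code. The `shim_m128` type and the `shim_mm_*` names are hypothetical and chosen only for illustration. I am assuming a compiler that provides `<altivec.h>` with VSX enabled on POWER, and I have included a plain scalar fallback so the same file also builds on other hosts.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ALTIVEC__) && defined(__VSX__)
/* POWER path: map SSE-style operations onto AltiVec/VSX intrinsics. */
#include <altivec.h>
typedef __vector float shim_m128;  /* hypothetical stand-in for __m128 */

static inline shim_m128 shim_mm_loadu_ps(const float *p) { return vec_xl(0, p); }
static inline void shim_mm_storeu_ps(float *p, shim_m128 v) { vec_xst(v, 0, p); }
static inline shim_m128 shim_mm_add_ps(shim_m128 a, shim_m128 b) { return vec_add(a, b); }
static inline shim_m128 shim_mm_mul_ps(shim_m128 a, shim_m128 b) { return vec_mul(a, b); }
#else
/* Fallback path: four scalar lanes, so the sketch compiles anywhere. */
typedef struct { float lane[4]; } shim_m128;

static inline shim_m128 shim_mm_loadu_ps(const float *p) {
    shim_m128 v;
    for (int i = 0; i < 4; i++) v.lane[i] = p[i];
    return v;
}
static inline void shim_mm_storeu_ps(float *p, shim_m128 v) {
    for (int i = 0; i < 4; i++) p[i] = v.lane[i];
}
static inline shim_m128 shim_mm_add_ps(shim_m128 a, shim_m128 b) {
    shim_m128 r;
    for (int i = 0; i < 4; i++) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}
static inline shim_m128 shim_mm_mul_ps(shim_m128 a, shim_m128 b) {
    shim_m128 r;
    for (int i = 0; i < 4; i++) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}
#endif

/* Example kernel written once against the shim: out[i] = x[i] + y[i]. */
static void shim_add_arrays(const float *x, const float *y, float *out, size_t n) {
    assert(x != NULL && y != NULL && out != NULL);
    size_t i = 0;
    /* Process four floats per iteration through the shim. */
    for (; i + 4 <= n; i += 4) {
        shim_m128 vx = shim_mm_loadu_ps(x + i);
        shim_m128 vy = shim_mm_loadu_ps(y + i);
        shim_mm_storeu_ps(out + i, shim_mm_add_ps(vx, vy));
    }
    /* Scalar tail for lengths that are not a multiple of four. */
    for (; i < n; i++) out[i] = x[i] + y[i];
}
```

The point of this layout is that kernels written in terms of SSE-style primitives would not need to change; only the thin mapping layer differs per architecture. A real port would likely need many more primitives (shuffles, fused multiply-add, and so on) and would probably call AltiVec/VSX directly in the hot paths once the shims prove the approach.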