Read Boian's recent publication

Complete. paper

Notes:

Extends non-negative matrix factorization (NMF) with support for dense and sparse matrix operations on multi-node, multi-GPU systems.
The algorithm is optimized for out-of-memory problems, where the memory required to factorize a given matrix exceeds the available GPU memory.
Memory complexity is reduced by batching/tiling, and both sparse and dense matrix operations are significantly accelerated on GPU cores.
Demonstrated good weak scaling on up to 4096 multi-GPU cluster nodes with approximately 25,000 GPUs when factorizing a dense 340 Terabyte-size matrix and an 11 Exabyte-size sparse matrix of density
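
To make the batching/tiling note concrete, here is a minimal sketch of row-tiled NMF. It is not the paper's implementation: it assumes plain multiplicative updates for the Frobenius objective, runs in NumPy on the CPU as a stand-in for GPU kernels, and the name `batched_nmf` and its parameters are hypothetical. The point is only that each step touches one row tile of A at a time, so the full matrix never has to sit in device memory.

```python
import numpy as np

def batched_nmf(A, k, n_batches=4, n_iter=50, eps=1e-9, seed=0):
    """Factorize A ~= W @ H (all non-negative), reading A one row tile at a time."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))

    # Split the rows of A into contiguous tiles (assumption: row-wise tiling).
    bounds = np.linspace(0, m, n_batches + 1, dtype=int)
    tiles = [slice(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])]

    for _ in range(n_iter):
        # H update: W^T A and W^T W are sums over row tiles, so they can be
        # accumulated without holding all of A at once.
        WtA = np.zeros((k, n))
        WtW = np.zeros((k, k))
        for t in tiles:
            WtA += W[t].T @ A[t]
            WtW += W[t].T @ W[t]
        H *= WtA / (WtW @ H + eps)

        # W update: each row tile of W depends only on the matching tile of A
        # and the small k x k matrix H H^T.
        HHt = H @ H.T
        for t in tiles:
            W[t] *= (A[t] @ H.T) / (W[t] @ HHt + eps)

    return W, H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.random((60, 3)) @ rng.random((3, 40))  # synthetic rank-3 matrix
    W, H = batched_nmf(A, k=3)
    print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))
```

In this sketch the only per-tile working set is one tile of A plus the small k-sized factors, which is the memory/throughput trade-off the notes describe; in a multi-GPU setting the tile accumulations would be reductions across devices.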

bos-lab / ANTIVIRAL-TARGETS