Let GPU sum forward to the CPU implementation for now.
It's clearly suboptimal for nllLoss and sum to transfer between
CPU/GPU, but defining the ops this way is good for getting things
working.
I would argue that it's important to encapsulate transfer logic
within ops, to prevent transfer-related bugs. Future GPU
implementations of nllLoss and sum can then be drop-in
replacements.
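A minimal sketch of the idea, assuming a hypothetical Tensor type with a device field (not the project's real API): the op itself performs the CPU round trip, so callers never see a transfer and a native GPU kernel can later replace the op body unchanged.

```python
# Hypothetical Tensor and device strings for illustration only;
# a real framework would copy device memory on transfer.
class Tensor:
    def __init__(self, data, device="cpu"):
        self.data = list(data)
        self.device = device

    def to(self, device):
        # Simulated CPU<->GPU transfer.
        return Tensor(self.data, device)

def cpu_sum(t):
    assert t.device == "cpu"
    return Tensor([sum(t.data)], "cpu")

def gpu_sum(t):
    # Transfer logic is encapsulated inside the op: move to CPU,
    # reuse the CPU kernel, move the result back. A future native
    # GPU implementation can be a drop-in replacement for this body.
    result = cpu_sum(t.to("cpu"))
    return result.to("gpu")

x = Tensor([1.0, 2.0, 3.0], device="gpu")
y = gpu_sum(x)
print(y.device, y.data)  # gpu [6.0]
```

Because the round trip is invisible at the call site, swapping in a real GPU kernel later changes no caller code.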
The next blocker for MNIST CNN model is GPU elementwise op backpropagation
logic, which isn't implemented.