Offload matrix multiplication to the GPU

OpenNMT / CTranslate

Lightweight C++ translator for OpenNMT Torch models (deprecated)

https://opennmt.net/

MIT License

Offload matrix multiplication to the GPU #15

Closed guillaumekln closed 7 years ago

guillaumekln commented 7 years ago

This is an experimental change to offload matrix multiplication in Linear layers to the GPU.

This could be inefficient for large batch sizes, as the input and output have to be transferred between the host and the device, but it yields a decent speed-up for non-batched requests (more experiments are required).
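To make the transfer cost concrete, here is a rough sketch of how much data crosses the host/device boundary per Linear call. It is not code from this change: `OffloadCost` and `linear_offload_cost` are hypothetical names, and it assumes float32 activations and weights that already reside on the GPU, so only the input and output are copied.

```cpp
#include <cstddef>

// Hypothetical helper, not part of CTranslate: estimates the per-call cost of
// a Linear layer when the matrix multiplication runs on the GPU.
// Assumptions: float32 values, and the weight matrix is already on the device.
struct OffloadCost {
  std::size_t bytes_to_device;  // input activations copied host -> device
  std::size_t bytes_to_host;    // output activations copied device -> host
  std::size_t flops;            // each multiply-add counted as 2 operations
};

OffloadCost linear_offload_cost(std::size_t batch,
                                std::size_t in_dim,
                                std::size_t out_dim) {
  OffloadCost cost;
  cost.bytes_to_device = batch * in_dim * sizeof(float);
  cost.bytes_to_host = batch * out_dim * sizeof(float);
  cost.flops = 2 * batch * in_dim * out_dim;
  return cost;
}
```

For example, `linear_offload_cost(32, 500, 500)` reports 32 times the transfer volume of `linear_offload_cost(1, 500, 500)`: the copied bytes grow linearly with the batch size. Whether the copies outweigh the GPU speed-up depends on the hardware and still has to be measured.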