OpenNMT / CTranslate

Lightweight C++ translator for OpenNMT Torch models (deprecated)
https://opennmt.net/
MIT License

improve translator clone, eliminate redundant memory usage #60

Closed jhnwnd closed 4 years ago

jhnwnd commented 4 years ago

When using CUDA or QLINEAR, weights and biases are duplicated when the translator object is cloned. This results in very high memory usage. This PR makes the weights and biases, which remain constant, be shared, and in so doing, keeps the memory usage for multiple translator objects down. The change here does this by using shared pointers on weight and bias for the Linear modules, and by using IDs for factory modules, making it easy to copy construct a new module factory.