When using CUDA or QLINEAR, the weights and biases are duplicated whenever the translator object is cloned, which results in very high memory usage. Since these tensors remain constant, this PR shares them between clones, keeping memory usage down when multiple translator objects exist. Concretely, the change uses shared pointers for the weight and bias of the Linear modules, and uses IDs for factory modules so that a new module factory can easily be copy-constructed (see the sketch below).
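A minimal sketch of the idea, with hypothetical names (this is not the actual PR code): the Linear module holds its constant tensors behind `std::shared_ptr`, so copying the module (e.g. when a translator is cloned) only bumps reference counts instead of duplicating the underlying buffers.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for a device (CUDA) or quantized weight buffer.
struct Tensor {
  std::vector<float> data;
};

class Linear {
public:
  // The constant weight and bias are passed in as shared pointers,
  // so several Linear instances can refer to the same storage.
  Linear(std::shared_ptr<const Tensor> weight,
         std::shared_ptr<const Tensor> bias)
    : _weight(std::move(weight)), _bias(std::move(bias)) {}

  // Copying a Linear (e.g. when cloning a translator) copies only the
  // pointers; the weight and bias data are shared, not duplicated.
  Linear(const Linear&) = default;

private:
  std::shared_ptr<const Tensor> _weight;
  std::shared_ptr<const Tensor> _bias;
};
```

With this layout, a copy-constructed module factory can hand out modules that all reference the same constant tensors, identified by ID, rather than reloading or duplicating them per clone.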