We want to always allocate tokenization input using the binary backend, because it's zero copy and there is no reason to involve XLA this early.
Adds a new `:preallocate_params` option, which moves params to the device defined by `:defn_options`. This is useful with multiple GPUs: we can load params into CPU memory and then use `:preallocate_params` so that each serving partition allocates params on its corresponding device.
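A minimal sketch of the multi-GPU workflow described above. It assumes recent Bumblebee, Nx and EXLA releases in which `Bumblebee.load_model/2` accepts `:backend`, text servings accept `:preallocate_params`, and `Nx.Serving.start_link/1` accepts `partitions: true`. The checkpoint, serving name `MyServing`, and batch settings are arbitrary illustrative choices, not part of this PR. Check option names against the docs for the version you use.

```elixir
# Sketch only: assumes recent bumblebee/exla releases; verify option names in their docs.
Mix.install([:bumblebee, :exla])

# Arbitrary example checkpoint (hypothetical choice for illustration).
repo = {:hf, "distilbert-base-uncased-finetuned-sst-2-english"}

# Load params into host (CPU) memory first, so no GPU memory is used at load time.
{:ok, model_info} = Bumblebee.load_model(repo, backend: {EXLA.Backend, client: :host})
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

serving =
  Bumblebee.Text.text_classification(model_info, tokenizer,
    compile: [batch_size: 4, sequence_length: 128],
    defn_options: [compiler: EXLA],
    # Move params to the device given by :defn_options when each partition starts.
    preallocate_params: true
  )

# With partitions: true, Nx.Serving runs one partition per available device,
# so each partition ends up holding its own copy of the params on its device.
{:ok, _pid} =
  Nx.Serving.start_link(serving: serving, name: MyServing, batch_timeout: 100, partitions: true)

result = Nx.Serving.batched_run(MyServing, "Zero copy is nice.")
IO.inspect(result)
```

Without `:preallocate_params`, the params would stay wherever they were loaded (here, the host), so loading them on the host once and letting each partition copy them to its own device avoids every GPU sharing a single device's memory.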
Closes #217.