Closed: SamuelLarkin closed this issue 1 year ago
Hi Samuel,
Sockeye doesn't currently support FP16 inference on CPUs since PyTorch doesn't have CPU FP16 implementations of all the operators we use. For 16-bit CPU inference, you could try BF16: sockeye-translate --dtype bfloat16 ...
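For example, a full CPU run could look like the following minimal sketch, assuming the standard --models / --input / --output arguments; the model directory and file names are placeholders, not taken from this issue:

```
sockeye-translate --models model \
                  --use-cpu \
                  --dtype bfloat16 \
                  --input input.txt \
                  --output output.txt
```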
Best, Michael
Thanks for the pointer. It turns out I was using sockeye-3.1.27, which didn't have that option. It initially failed with pytorch-1.11.0, but sockeye-translate --use-cpu --dtype bfloat16 worked once I switched to pytorch-1.13.1.
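In case it helps anyone else hitting this: a quick way to check whether the installed PyTorch can run a bfloat16 op on CPU at all (this is just an illustrative op; the operator that actually failed under pytorch-1.11.0 may be a different one):

```
# Prints the torch version and a bfloat16 CPU result if the op is supported.
python -c "import torch; print(torch.__version__, torch.ones(2, 2, dtype=torch.bfloat16).sum())"
```

If this fails with a RuntimeError, the installed torch build is the limiting factor rather than Sockeye.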
I opened a separate issue #1084.
Hi, I'm trying to quantize a float32 model to float16 at inference time. It looks like PyTorch doesn't support this on CPU, or am I missing some environment variable that I need to set to enable it? I'm using sockeye==3.1.27. I also tried a pre-quantized float16 model (i.e. sockeye-quantize --model model/params.best --config model/args.yaml --dtype float16) followed by sockeye-translate ... --use-cpu --dtype int8, and got the same error message. If I translate using a float32 model and --dtype int8, I do get translations. My goal here is to save the model to a smaller file and use it to translate on CPUs.

Command
Conda Environment
conda env export
Error Message