Measure inference performance in units of "instances / sec" in four settings:
[ ] Python native GPU
[ ] Python native CPU
[ ] TorchScript C++ CPU
[ ] TorchScript C++ GPU
Measurement should exclude initial memory allocation time. Batch size will matter a lot for GPU, so maybe try two settings: 1 data point per batch, and as many as possible to fill the 48 GB of GPU memory.
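A minimal sketch of what the Python-side benchmark might look like (the `model`, input shape, and batch sizes below are hypothetical placeholders): a warm-up loop keeps initial CUDA memory allocation and kernel setup out of the timed region, and `torch.cuda.synchronize()` ensures all GPU work has finished before the clock is read.

```python
import time
import torch

def benchmark(model, batch, device, n_iters=100, warmup=10):
    """Return throughput in instances/sec for one model, batch, and device."""
    model = model.to(device).eval()
    batch = batch.to(device)
    with torch.no_grad():
        # Warm-up: excludes initial memory allocation / kernel setup from timing.
        for _ in range(warmup):
            model(batch)
        if device.type == "cuda":
            torch.cuda.synchronize()  # make sure warm-up work is done
        start = time.perf_counter()
        for _ in range(n_iters):
            model(batch)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU kernels before stopping
        elapsed = time.perf_counter() - start
    return n_iters * batch.shape[0] / elapsed

# Hypothetical model and input; compare batch size 1 vs. a large batch.
model = torch.nn.Linear(512, 10)
for bs in (1, 4096):
    x = torch.randn(bs, 512)
    for dev in ("cpu", "cuda"):
        if dev == "cuda" and not torch.cuda.is_available():
            continue
        print(f"{dev} bs={bs}: {benchmark(model, x, torch.device(dev)):.1f} instances/sec")
```

The TorchScript C++ settings would follow the same warm-up/synchronize/timed-loop structure on a `torch::jit::script::Module` loaded from a serialized model.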
@blackcathj can we group data together in a production system?