BNL-DAQ-LDRD / NeuralCompression


Test inference performance optimization on CPU #11

Open blackcathj opened 2 years ago

YHRen commented 2 years ago

Measure inference performance in units of "instances / sec" in four settings.

The measurement should exclude initial memory allocation time. Batch size will matter a lot for GPU, so maybe try two settings: one data point per batch, and as many as possible to fill the 48 GB of GPU memory.
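The points above (warmup iterations to exclude one-time allocation costs, then throughput in instances/sec at different batch sizes) could be sketched roughly like this; `dummy_model`, the batch sizes, and the iteration counts are placeholders, not anything from this repo:

```python
import time

def dummy_model(batch):
    # Stand-in for the real network: any callable that consumes a batch.
    return [x * 2 for x in batch]

def throughput(model, batch_size, n_warmup=3, n_iters=20):
    """Return inference throughput in instances/sec.

    Warmup iterations run first, so one-time costs (initial memory
    allocation, lazy initialization, cache fill) are excluded from
    the timed region.
    """
    batch = list(range(batch_size))
    for _ in range(n_warmup):
        model(batch)
    start = time.perf_counter()
    for _ in range(n_iters):
        model(batch)
    elapsed = time.perf_counter() - start
    return n_iters * batch_size / elapsed

if __name__ == "__main__":
    # Compare a batch size of 1 against a large batch; for a real GPU
    # run, the large batch would be sized to fill the available memory.
    for bs in (1, 256):
        print(f"batch={bs}: {throughput(dummy_model, bs):.0f} instances/sec")
```

For GPU timing the same structure applies, but a synchronization call (e.g. `torch.cuda.synchronize()` in PyTorch) would be needed before reading the clock, since kernel launches are asynchronous.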

@blackcathj can we group data together in a production system?

YHRen commented 2 years ago

CPU-optimized inference for sparse networks; consider DeepSparse in the future: