nemeer closed this issue 2 years ago
@nemeer this is expected behaviour in PyTorch, and probably in neural network frameworks in general. The first call sets up a lot of things within the network, such as caches, memory allocation on the GPU, and graph optimization, so it takes longer than the calls that follow.
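If it helps, here is a minimal sketch of how the effect could be measured and hidden with a throwaway warm-up call. `LazyPredictor` is a hypothetical stand-in that only simulates one-time setup with `time.sleep`; it is not the library's predictor, and `time_calls` is a helper name made up for this example. With a real predictor you would pass its `predict` method instead.

```python
import time


def time_calls(predict, text, n_calls=3):
    """Return the wall-clock duration (in seconds) of n_calls consecutive calls."""
    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()
        predict(text)
        durations.append(time.perf_counter() - start)
    return durations


class LazyPredictor:
    """Hypothetical stand-in: pays a one-time setup cost on the first call."""

    def __init__(self):
        self._ready = False

    def predict(self, text):
        if not self._ready:
            time.sleep(0.05)  # simulated one-time setup (allocation, caches, ...)
            self._ready = True
        time.sleep(0.01)  # simulated steady-state inference cost
        return text.upper()


predictor = LazyPredictor()
predictor.predict("warm-up")  # throwaway call absorbs the one-time cost
durations = time_calls(predictor.predict, "some text")
print([f"{d:.3f}s" for d in durations])
```

Calling `time_calls` on a fresh `LazyPredictor()` without the warm-up line should show the first duration clearly above the others, which mirrors the pattern described in the question. When timing a real model on a GPU, the numbers are only meaningful if the device is synchronized before reading the clock.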
Thanks for clarifying @davidberenstein1957.
I am using the minilm model with language 'en_core_web_sm'. When comparing prediction times, i.e., predictor.predict(text), the first call is always a bit slower than the following ones. Suppose that after creating a predictor object, I call predict as follows:
predictor.predict(text) ---> first call
predictor.predict(text) ---> second call
predictor.predict(text) ---> third call
The time taken for the first call is noticeably higher (0.2 sec) than for the subsequent calls (0.05 sec). Could you please help me understand why the first call takes longer?