Comparatively high initial prediction time for first predict() hit

nemeer commented 2 years ago

I am using the minilm model with language 'en_core_web_sm'. When timing predictor.predict(text), the first call is always slower than the following calls. Suppose that, after creating a predictor object, I call predict as follows:

predictor.predict(text) ---> first call
predictor.predict(text) ---> second call
predictor.predict(text) ---> third call

The first call takes noticeably longer (0.2 sec) than the subsequent calls (0.05 sec). Could you please help me understand why the first call takes longer?

davidberenstein1957 commented 2 years ago

@nemeer this is a PyTorch, and probably a general NN, design choice: the first call sets up a lot of things within the network, such as caches, memory on the GPU, and graph optimization.

https://datascience.stackexchange.com/questions/63476/why-the-first-prediction-of-neural-network-in-pytorch-is-slower-than-following-p
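
If the slower first call matters in practice, a common workaround is to make one throwaway "warm-up" call right after creating the predictor, so the one-time setup happens before any timed or user-facing request. Below is a minimal sketch of how to measure the effect and apply that workaround. Note that `LazyPredictor` and `time_calls` are hypothetical stand-ins written for this illustration, not part of crosslingual-coreference or PyTorch, and the `time.sleep` only simulates a one-time setup cost.

```python
import time


class LazyPredictor:
    """Hypothetical stand-in for a model wrapper (NOT the real library API).

    It pays a one-time setup cost on the first predict() call, which is
    roughly how a real network behaves when it lazily allocates memory
    and builds caches.
    """

    def __init__(self):
        self._ready = False

    def predict(self, text):
        if not self._ready:
            time.sleep(0.05)  # simulated one-time setup cost (assumption)
            self._ready = True
        return text.upper()  # placeholder for the real prediction


def time_calls(fn, arg, n=3):
    """Call fn(arg) n times and return the duration of each call in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn(arg)
        durations.append(time.perf_counter() - start)
    return durations


# Cold start: the first measured call includes the setup cost.
cold_predictor = LazyPredictor()
cold = time_calls(cold_predictor.predict, "some text")

# Warm start: one throwaway call absorbs the setup cost before measuring.
warm_predictor = LazyPredictor()
warm_predictor.predict("warm-up")
warm = time_calls(warm_predictor.predict, "some text")

print("cold:", [f"{d:.4f}" for d in cold])
print("warm:", [f"{d:.4f}" for d in warm])
```

With a real predictor, the same pattern would apply: replace `LazyPredictor()` with the actual predictor object and keep the warm-up call outside the timed section.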

nemeer commented 2 years ago

Thanks for clarifying @davidberenstein1957.

davidberenstein1957 / crosslingual-coreference #15