Open Matthieu-Tinycoaching opened 1 year ago
cc @larme
@Matthieu-Tinycoaching This seems like a memory allocation error. Do you serve the model on CPU or GPU?
Hi @larme, I serve it on GPU. This seems odd, since this model is lighter and faster than other models that run fine under the same conditions.
Describe the bug
Hi,
While running Locust tests (100 users with a spawn rate of 100) against the ONNX model of
cross-encoder/ms-marco-minilm-l-2-v2
, the test failed near the beginning with the following message:

To reproduce
No response
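A minimal Locust sketch of the load profile described above; the endpoint path, payload shape, and host below are assumptions for illustration, not the reporter's actual service definition:

```python
# locustfile.py (hypothetical): 100 concurrent users posting query/passage
# pairs to a BentoML HTTP endpoint serving the ONNX cross-encoder.
from locust import HttpUser, task, between


class CrossEncoderUser(HttpUser):
    # Short think time between requests to keep the server under steady load.
    wait_time = between(0.1, 0.5)

    @task
    def score_pair(self):
        # "/rerank" and the JSON payload are assumed names; adjust them to
        # match the actual bentoml.Service API.
        self.client.post(
            "/rerank",
            json={
                "query": "what is bentoml",
                "passage": "BentoML is a framework for serving ML models.",
            },
        )
```

Run it against the BentoML server (HTTP port 3000 by default in BentoML 1.0) with something like `locust -f locustfile.py --headless -u 100 -r 100 --host http://localhost:3000`, which matches the 100 users / spawn rate 100 setup described above.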
Expected behavior
No response
Environment
bentoml: 1.0.7
python: 3.8.13
platform: Linux-5.4.0-65-generic-x86_64-with-glibc2.17
uid:gid: 1000:1000
conda: 22.9.0
in_conda_env: True