Open AnGrypng opened 3 months ago
Hello!
Apologies for the delay, I've been recovering from surgery this past month.
I'm not very familiar with gunicorn & FastAPI, so I'm not sure how best to approach this. That said, I'm aware of the https://github.com/michaelfeil/infinity project, which also uses gunicorn and FastAPI. It might act as inspiration, or perhaps you can use it directly. I believe it's currently three projects in one, but the infinity_emb component uses FastAPI and gunicorn.
Hey everybody,
I want to deploy a sentence encoding model using sentence-transformers. My code looks something like this:
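(The original code block didn't survive in this post; the sketch below is a hypothetical reconstruction, not the poster's actual code. The file name `main.py`, the route `/encode`, and the model id are all assumptions.)

```python
# main.py -- hypothetical sketch of a sentence-transformers FastAPI service.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()

# Loaded once per worker process at import time.
model = SentenceTransformer("all-MiniLM-L6-v2")  # model id is an assumption

class EncodeRequest(BaseModel):
    sentences: list[str]

@app.post("/encode")
def encode(req: EncodeRequest):
    # model.encode is CPU-bound; declaring the endpoint with plain `def`
    # lets FastAPI run it in a threadpool instead of blocking the event loop.
    embeddings = model.encode(req.sentences)
    return {"embeddings": embeddings.tolist()}
```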
I run gunicorn like this:
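(The exact command also didn't make it into the post; a typical invocation, assuming the app object lives in `main.py`, would be something like the following. FastAPI is an ASGI app, so gunicorn needs an ASGI-capable worker class.)

```shell
# Hypothetical invocation; module path `main:app`, worker count, and bind
# address are assumptions.
gunicorn main:app \
    --workers 2 \
    --worker-class uvicorn.workers.UvicornWorker \
    --bind 0.0.0.0:8000
```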
The problem I face is that even though I set gunicorn to two workers, it does not run the model in parallel or concurrently. I thought that with two workers, gunicorn would copy the application into memory twice and perform two encodings at the same time, but the timings don't bear that out.
I don't think the problem is RAM or CPU, as I have plenty of both.
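That mental model is right in principle: gunicorn workers are separate processes, and separate processes can run CPU-bound work in parallel. A generic sanity check for the machine itself (an illustrative sketch, no model involved) is to time a fake CPU-bound "encode" across one versus two processes:

```python
# Sanity check: CPU-bound work in separate processes (which is what
# gunicorn workers are) should roughly halve wall time with two processes,
# assuming free cores are available.
import time
from concurrent.futures import ProcessPoolExecutor

def fake_encode(n: int) -> int:
    # Stand-in for a CPU-bound encode() call.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(workers: int) -> float:
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_encode, [3_000_000, 3_000_000]))
    return time.perf_counter() - start

if __name__ == "__main__":
    # With enough free cores, two processes should be noticeably faster.
    print(f"1 process:   {timed(1):.2f}s")
    print(f"2 processes: {timed(2):.2f}s")
```

If two processes are *not* faster here, the bottleneck is the machine, not gunicorn.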
Perhaps you have some experience deploying these to production, or perhaps I'm missing some parameters.
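One family of parameters worth checking (a guess, not a confirmed diagnosis): PyTorch lets each worker spawn as many BLAS/OpenMP threads as there are cores, so two workers can oversubscribe the CPU, and each encode call slows down enough that the workers look serial. Capping the native thread pools per worker is a common mitigation:

```shell
# Assumed mitigation: cap the native thread pools each worker may spawn.
# The values are illustrative; a common starting point is cores / workers.
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
gunicorn main:app --workers 2 --worker-class uvicorn.workers.UvicornWorker
```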
I would be very grateful for any advice.