google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Service request to BERT using flask #986

Open jaytimbadia opened 4 years ago

jaytimbadia commented 4 years ago

I am using Flask to call the BERT model to extract features for my sentences, and then run a simple classifier on top of those embeddings. I am using `threaded=True` in Flask, but for multi-user requests (say 10 concurrent users) it still takes almost 10 minutes for 600 sentences on an 8 GB Windows CPU machine. It could be a Flask server issue, since Flask may or may not truly support multithreading. Is there anything possible from the BERT end to allow parallelism and make responses faster on a Windows CPU?
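One mitigation that is independent of Flask's threading model is micro-batching: instead of one model call per request, a background worker collects sentences from concurrent handlers and runs a single forward pass per batch, which is much cheaper on CPU. A minimal sketch (the `encode_batch` function below is a stand-in placeholder, not this repo's API; you would replace it with your actual BERT feature extraction call):

```python
import threading
import queue

def encode_batch(sentences):
    # Stand-in for the real BERT encoder (assumption): returns one
    # "embedding" per sentence. Here we just return the sentence length.
    return [len(s) for s in sentences]

class BatchingEncoder:
    """Collects requests from many threads and encodes them in one batch."""

    def __init__(self, max_batch=32, wait_s=0.05):
        self.requests = queue.Queue()
        self.max_batch = max_batch
        self.wait_s = wait_s
        threading.Thread(target=self._worker, daemon=True).start()

    def encode(self, sentence):
        # Called from each Flask handler; blocks until the batch finishes.
        done = threading.Event()
        slot = {"sentence": sentence, "done": done, "result": None}
        self.requests.put(slot)
        done.wait()
        return slot["result"]

    def _worker(self):
        while True:
            batch = [self.requests.get()]  # block for the first request
            try:
                # Gather more requests for up to wait_s, up to max_batch.
                while len(batch) < self.max_batch:
                    batch.append(self.requests.get(timeout=self.wait_s))
            except queue.Empty:
                pass
            # ONE model call for the whole batch.
            results = encode_batch([s["sentence"] for s in batch])
            for slot, result in zip(batch, results):
                slot["result"] = result
                slot["done"].set()

if __name__ == "__main__":
    enc = BatchingEncoder()
    print(enc.encode("hello"))  # -> 5 with the stand-in encoder
```

With `threaded=True`, each Flask handler would call `enc.encode(sentence)`; ten concurrent requests then share a handful of batched forward passes instead of issuing ten separate ones.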

ivanrylov commented 4 years ago

Take a look at bert-as-service.
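bert-as-service runs the model in a separate server process (which handles batching and concurrency for you) and exposes a lightweight client, so each Flask handler only sends sentences over a socket. A minimal client-side sketch, assuming a `bert-serving-start` server is already running locally (the import is deferred so the module loads even without the `bert-serving-client` package installed):

```python
def encode_sentences(sentences, ip="localhost"):
    # Deferred import: requires the bert-serving-client package and a
    # running bert-serving-start server (assumptions, not part of this repo).
    from bert_serving.client import BertClient
    bc = BertClient(ip=ip)
    # Returns a (num_sentences, embedding_dim) array of fixed-size vectors.
    return bc.encode(sentences)

if __name__ == "__main__":
    vecs = encode_sentences(["First sentence.", "Second sentence."])
    print(vecs.shape)
```

A Flask handler would then call `encode_sentences(batch_of_sentences)` and run the classifier on the returned vectors, with all heavy lifting done in the server process.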