I am using Flask to call a BERT model that extracts features for my sentences, with a simple classifier for prediction on top of those embeddings. I am running Flask with threaded=True, but for multi-user requests (say 10 users) it still takes almost 10 minutes to process 600 sentences on an 8 GB Windows CPU machine. It could also be a Flask server issue, since Flask may or may not truly support multithreading.
Is there anything possible from the BERT end to allow parallelism and make responses faster on a Windows CPU?