Problem Description
When using the seamless_communication library, and specifically its translator.predict() method, I've encapsulated the core inference logic into reusable interfaces. However, I've noticed that even after wrapping it up, multiple requests are not processed concurrently; instead, they execute sequentially, significantly impacting system throughput and response time.
Relevant Code Snippet
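Here's a minimal, simplified sketch of my setup. The class name TranslationService, the model and vocoder cards, and the language codes are placeholders for illustration, and the exact import path and predict() return shape vary between library versions:

```python
import asyncio
import torch
from seamless_communication.inference import Translator

class TranslationService:
    """Reusable wrapper around a single shared Translator instance."""

    def __init__(self) -> None:
        # The model is loaded once at startup and shared by all requests.
        self.translator = Translator(
            "seamlessM4T_v2_large",
            "vocoder_v2",
            device=torch.device("cuda:0"),
            dtype=torch.float16,
        )

    def translate(self, text: str, src_lang: str, tgt_lang: str) -> str:
        # predict() blocks until inference finishes; the return shape
        # may differ across library versions.
        text_output, _ = self.translator.predict(
            input=text,
            task_str="t2tt",
            src_lang=src_lang,
            tgt_lang=tgt_lang,
        )
        return str(text_output[0])

async def main() -> None:
    service = TranslationService()
    loop = asyncio.get_running_loop()
    # Submit several requests at once; each blocking predict() call is
    # pushed onto the default thread pool so they should be able to overlap.
    tasks = [
        loop.run_in_executor(None, service.translate, text, "eng", "fra")
        for text in ["Hello.", "How are you?", "See you soon."]
    ]
    print(await asyncio.gather(*tasks))

if __name__ == "__main__":
    asyncio.run(main())
```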
Result: All tasks complete sequentially rather than concurrently.
Expected Behavior
I would expect to be able to utilize the GPU's parallel processing capabilities by running multiple translations concurrently.
Actual Results
Only one request is processed at a time, leading to increased overall execution time.
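To confirm the serialization, I timed a batch of requests through the wrapper with a plain thread pool (again a sketch; the texts and worker count are arbitrary, and it assumes the TranslationService class from the snippet above):

```python
import time
from concurrent.futures import ThreadPoolExecutor

service = TranslationService()
texts = ["Hello.", "How are you?", "See you soon.", "Thanks a lot."] * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda t: service.translate(t, "eng", "fra"), texts))
elapsed = time.perf_counter() - start

# Total time grows linearly with the number of requests instead of
# the calls overlapping on the GPU.
print(f"{len(texts)} requests took {elapsed:.2f}s")
```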
Can you please suggest the modifications or configuration needed in my code so that translator.predict() can handle concurrent translations effectively?
If you need any further information or clarification, feel free to reply. Looking forward to your assistance in optimizing our application performance!
Thanks!