Closed wunianqing closed 1 year ago
I'm not sure about the thread safety of an ORT session without looking through the docs or asking others, but I doubt it is safe. You may, however, be able to mitigate much of that per-session cost by creating a distinct session per thread while sharing the larger tensors between them all. I forget the API function (I'm on my phone right now), but there is one in the onnxruntime C API to do so.
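To make the suggestion concrete, here is a minimal sketch of the one-session-per-thread pattern. `FakeSession` is a placeholder invented for illustration, not the real onnxruntime API; the point is that each thread owns its own session while the large input buffer is shared read-only:

```python
import threading

class FakeSession:
    """Placeholder for an ORT session (invented for illustration)."""
    def __init__(self, model_path):
        self.model_path = model_path  # each thread loads the same model

    def run(self, inputs):
        # stand-in for the real Run() call; here it just sums the input
        return sum(inputs)

def worker(model_path, shared_input, results, idx):
    session = FakeSession(model_path)   # distinct session per thread
    results[idx] = session.run(shared_input)

shared_input = [1, 2, 3, 4]             # the large tensor, shared read-only
results = [None] * 4
threads = [
    threading.Thread(target=worker, args=("model.onnx", shared_input, results, i))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [10, 10, 10, 10]
```

Because the shared buffer is only read, no lock is needed around it; each thread writes only to its own slot in `results`.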
Thank you very much for your response! I will look for the C API you mentioned and try it. It would be appreciated if you could post the API name here later.
I think the API is EnableMemPattern(). But according to this, the DML EP does not support it.
This is what I saw before: https://onnxruntime.ai/docs/api/c/struct_ort_api.html#a0dcdc66ac26c5d9aae1ccadf09f059fc
The sample project is very useful. With just small modifications, we can make the upload, run, and download steps execute in parallel.
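A rough sketch of that pipelining idea, assuming three stages (upload, run, download) connected by queues so each stage can work on a different item at the same time. The stage functions here are trivial stand-ins, not onnxruntime calls:

```python
import queue
import threading

def stage(fn, inbox, outbox):
    # Pull items until the None sentinel, apply fn, pass results downstream.
    while True:
        item = inbox.get()
        if item is None:
            if outbox is not None:
                outbox.put(None)    # forward the shutdown signal
            break
        result = fn(item)
        if outbox is not None:
            outbox.put(result)

# Trivial stand-ins for the real stages.
upload = lambda x: x                # host -> device copy
run = lambda x: x * 2               # inference
results = []
def download(x):                    # device -> host copy
    results.append(x)

q_in, q_run, q_out = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(upload, q_in, q_run)),
    threading.Thread(target=stage, args=(run, q_run, q_out)),
    threading.Thread(target=stage, args=(download, q_out, None)),
]
for t in threads:
    t.start()
for x in range(4):
    q_in.put(x)
q_in.put(None)                      # shut the pipeline down
for t in threads:
    t.join()
print(results)  # [0, 2, 4, 6]
```

With one thread per stage and FIFO queues between them, item order is preserved while the stages overlap in time.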
My question is: can we run inference on multiple inputs with one session across multiple threads? This would be very useful when the model is small. I already know that we can create one session per thread, but each session consumes additional memory.
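For comparison, the pattern being asked about — several threads calling run on the same session object — would look like the sketch below. `SharedSession` is again an illustrative stand-in; whether this is actually safe depends on the runtime and execution provider (the onnxruntime docs do describe concurrent Run() calls on one session as supported, but that should be verified for the EP in use):

```python
import threading

class SharedSession:
    """Stand-in for a single ORT session shared by all threads (illustrative)."""
    def run(self, x):
        return x * x   # pretend inference

session = SharedSession()       # one session, created once
outputs = [None] * 4

def worker(i):
    # every thread calls run() on the same session object
    outputs[i] = session.run(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(outputs)  # [0, 1, 4, 9]
```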