Closed by dbuscombe-usgs 9 months ago
Has anybody profiled the code to see which step takes the longest in the inference path (e.g., via cProfile or even the TensorFlow profiler)?
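A minimal sketch of what that could look like with cProfile; `do_seg`, `model`, and `image_path` are placeholders, not the repo's actual function names:

```python
# Profile one inference pass and print the top calls by cumulative time.
# `do_seg`, `image_path`, and `model` are hypothetical names for the
# inference entry point and its arguments.
import cProfile
import pstats

cProfile.run("do_seg(image_path, model)", "inference.prof")
pstats.Stats("inference.prof").sort_stats("cumulative").print_stats(20)
```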
I have not done it, but I presume it's just the actual `model.predict` call?
And do we have an understanding of `model.predict` speed on GPU vs CPU?
In many of my applications, a single `model.predict` is a small fraction of the total time. I use several other function calls to prepare inputs and outputs, and I also tend to call the model several times for the same image. Writing outputs to file often takes longer than creating them. So, parallelizing the model inference makes sense anyway.
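One rough way to get a CPU-vs-GPU number is to hide the GPU from TensorFlow, reload the model, and time the same call; `load_model`, `weights`, and `x` below are placeholders for the actual loading code and a preprocessed input batch:

```python
# Time the mean of several predict calls; run once with the GPU visible and
# once with it hidden. `load_model`, `weights`, and `x` are assumptions.
import time
import tensorflow as tf

# Uncomment to force CPU-only execution for the comparison run.
# tf.config.set_visible_devices([], "GPU")

model = load_model(weights)
model.predict(x, verbose=0)                      # warm-up (graph tracing)
t0 = time.perf_counter()
for _ in range(10):
    model.predict(x, verbose=0)
print("mean predict time:", (time.perf_counter() - t0) / 10, "s")
```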
According to the docs, `model(x)` should be quicker than `model.predict(x)` on a single input.
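A minimal sketch comparing the two call styles on one batch; `model` and `x` are placeholders for the loaded model and a single preprocessed input:

```python
# Compare the direct call against predict on a single batch.
# `model` and `x` are assumed to already exist.
import time

# Warm up both paths so graph tracing is not counted.
model.predict(x, verbose=0)
model(x, training=False)

t0 = time.perf_counter()
y_predict = model.predict(x, verbose=0)    # spins up a prediction loop, returns NumPy
t_predict = time.perf_counter() - t0

t0 = time.perf_counter()
y_direct = model(x, training=False)        # direct call, returns a tf.Tensor
t_direct = time.perf_counter() - t0

print(f"model.predict(x): {t_predict:.4f} s   model(x): {t_direct:.4f} s")
```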
It would be really nice not to have to rely on a GPU, which will be out of reach for a lot of users and would hinder cloud deployments.
I added a small change to enable the option of running the model in parallel with joblib in gym, zoo, etc.
https://github.com/Doodleverse/doodleverse_utils/commit/0999c404f43f461001f700390d2e3ce5d691a6ca
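For anyone reading along, the general pattern is roughly the sketch below; `load_model`, `do_seg`, `weights`, and `image_paths` are placeholders rather than the exact doodleverse_utils API:

```python
# Fan per-image work out with joblib so model inference and file I/O
# overlap across workers. Function and variable names here are
# hypothetical stand-ins for the real gym/zoo code.
from joblib import Parallel, delayed

def process_one(image_path):
    model = load_model(weights)          # each worker holds its own model copy
    return do_seg(image_path, model)

results = Parallel(n_jobs=-2, verbose=1)(
    delayed(process_one)(p) for p in image_paths
)
```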