Closed by aarnphm 3 years ago
Additional context on PyTorch API for loading GPUs
@aarnphm just curious: will enabling this GPU option only serve on GPU machines, or will it prioritize GPU but still be able to run on CPU? Suppose we pack the project into a Bento on a CPU-only machine and use bentoml serve to test the API locally. If the GPU option is enabled, can we still run this test on such a non-GPU machine? Thanks
@aarnphm just curious: will enabling this GPU option only serve on GPU machines, or will it prioritize GPU but still be able to run on CPU?
What I actually want to achieve is to simplify the packaging process with BentoML. You can definitely still run a GPU model on CPU; however, the model would yield wrong predictions due to GPU tensors running on CPU. Having this option allows users to correctly set up their model for their corresponding use case. (You don't have to use this to run on CPU.)
Suppose we pack the project into a Bento on a CPU-only machine and use bentoml serve to test the API locally. If the GPU option is enabled, can we still run this test on such a non-GPU machine?
The industry standard is to train on GPUs and then deploy to CPU cloud services. Thus, what I'm proposing is to have each Artifact convert everything to CPU-based by default. This option enables use cases where teams can take advantage of GPU inference, and it allows users to correctly set up their artifacts. You can still run your model on a non-GPU machine, but expect unusual behaviour.
This actually ties in with our current redesign of the packaging API, which enables easier development with BentoML. This is just one part of our current discussion.
Closed since this is not related to the new API.
Is your feature request related to a problem? Please describe.
Current practice in an ML workflow is to train models on GPUs and then convert them to CPUs at inference time. Most DL frameworks, including PyTorch, TensorFlow, and MXNet, have GPU support built in and make it very easy for users to implement such features, as well as to make use of different distribution strategies. However, this requires users to convert their model to the correct tensor type (GPU tensors vs. CPU tensors, read more here) in order to run on GPUs or CPUs respectively.
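To illustrate the tensor-type mismatch above, here is a toy sketch (plain Python, not real PyTorch code): operations require both operands to live on the same device, so a GPU-trained model must be converted before CPU inference.

```python
# Toy illustration of device-typed tensors -- NOT real PyTorch code.

class Tensor:
    """A fake tensor that remembers which device it lives on."""

    def __init__(self, data, device="cpu"):
        self.data = data
        self.device = device

    def to(self, device):
        # Mirrors the spirit of torch.Tensor.to(): copy onto the target device.
        return Tensor(list(self.data), device)


def add(a, b):
    # Like real framework ops, refuse to mix devices.
    if a.device != b.device:
        raise RuntimeError(
            f"expected tensors on the same device, got {a.device} and {b.device}"
        )
    return Tensor([x + y for x, y in zip(a.data, b.data)], a.device)


weights = Tensor([1.0, 2.0], device="cuda:0")  # "trained on GPU"
inputs = Tensor([3.0, 4.0], device="cpu")      # CPU-only serving host

try:
    add(weights, inputs)  # device mismatch: fails, as in real frameworks
except RuntimeError:
    out = add(weights.to("cpu"), inputs)  # convert first, then it works
```

The point of the proposal is that this `to("cpu")` step should happen inside the artifact, not in user code.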
It would be in our best interest to provide GPU support for each of our artifacts, where the artifact handles the conversion automatically, so that when serving with our InferenceAPI we can utilize GPU resources.
Describe the solution you'd like
Introduce a gpu=True option under each BentoArtifact where the framework supports GPU (PyTorch, Transformers, Keras, TF, etc.). We will leave it to our Artifacts to convert the model correctly.
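A rough sketch of how such an option could behave (hypothetical names and a simulated model, not the actual BentoML API): by default the artifact converts everything to CPU at pack time, while gpu=True preserves GPU placement for GPU inference.

```python
# Hypothetical sketch of the proposed gpu= option -- not the real BentoML API.
# A "model" is simulated here as a dict of named tensors, each tagged with
# the device it currently lives on.

def to_device(model, device):
    """Return a copy of the model with every tensor retagged to `device`."""
    return {name: {"data": t["data"], "device": device} for name, t in model.items()}


def pack_artifact(model, gpu=False):
    """Pack a model for serving.

    By default everything is converted to CPU so the Bento runs anywhere;
    with gpu=True the GPU placement is preserved for GPU inference.
    """
    return model if gpu else to_device(model, "cpu")


# A model "trained on GPU": all tensors live on cuda:0.
trained = {"weight": {"data": [1.0, 2.0], "device": "cuda:0"}}

cpu_artifact = pack_artifact(trained)            # default: CPU-based artifact
gpu_artifact = pack_artifact(trained, gpu=True)  # opt-in: keep GPU tensors
```

The CPU-by-default choice matches the train-on-GPU, serve-on-CPU workflow described above, while the flag keeps GPU serving available as an opt-in.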
Additional context
Feedback would be greatly appreciated.