bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and much more!
https://bentoml.com

feat: PyTorchArtifacts GPU Integration #1727

Closed · aarnphm closed this issue 3 years ago

aarnphm commented 3 years ago

Is your feature request related to a problem? Please describe.

Current practice in an ML workflow is to train models on GPUs and then convert them to run on CPUs at inference time. Most DL frameworks, including PyTorch, TensorFlow, and MXNet, have GPU support built in, make it very easy for users to implement such workflows, and also offer different distribution strategies. However, users are required to convert their models to the correct tensor type (GPU tensors vs. CPU tensors, read more here) in order to run on GPUs or CPUs respectively.
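
For reference, a minimal sketch (standard PyTorch, not BentoML-specific) of the conversion users currently have to do by hand:

import torch

# CUDA tensors require a GPU; CPU tensors run anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2)      # stand-in for a trained model
model = model.to(device)           # move parameters onto the chosen device

x = torch.randn(1, 4).to(device)   # inputs must live on the same device
with torch.no_grad():
    y = model(x)

model = model.to("cpu")            # convert back before CPU-only deployment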

It would be in our best interest to provide GPU support for each of our artifacts, handling the conversion automatically so that we can utilize GPU resources when serving with our InferenceAPI.

Describe the solution you'd like

Introduce a gpu=True option under each BentoArtifact where the framework supports GPUs (PyTorch, Transformers, Keras, TF, etc.).

We will leave it to our Artifacts to convert the model correctly.

import bentoml
import pandas as pd
from bentoml.adapters import DataframeInput
from bentoml.frameworks.pytorch import PyTorchArtifacts

@bentoml.artifacts([PyTorchArtifacts('model_a', gpu=True)])
class Service(bentoml.BentoService):

    @bentoml.api(input=DataframeInput(), batch=True)
    def predict(self, df: pd.DataFrame):
        return self.artifacts.model_a.predict(df)
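
A hypothetical sketch of how the flag could drive device placement inside the artifact (class and method names are illustrative, not the actual BentoML implementation):

import torch

class PyTorchArtifacts:
    # Illustrative only: shows where a gpu flag could apply the conversion.
    def __init__(self, name, gpu=False):
        self.name = name
        self._gpu = gpu
        self._model = None

    def pack(self, model):
        if self._gpu and torch.cuda.is_available():
            self._model = model.to("cuda")
        else:
            # Default: CPU tensors, so the packed Bento runs anywhere.
            self._model = model.to("cpu")
        return self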

Additional context

Feedback would be greatly appreciated.

aarnphm commented 3 years ago

Additional context on the PyTorch API for loading models onto GPUs
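
For reference, the relevant PyTorch idiom here is the map_location argument of torch.load, which remaps GPU-saved tensors at load time ("model.pt" is a placeholder path):

import torch

# Load a checkpoint saved on a GPU onto a CPU-only machine.
state_dict = torch.load("model.pt", map_location=torch.device("cpu"))

# Or remap onto whatever device is available at serving time.
device = "cuda" if torch.cuda.is_available() else "cpu"
state_dict = torch.load("model.pt", map_location=device)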

illy commented 3 years ago

@aarnphm Just curious: will enabling this GPU option mean the service only runs on a GPU machine, or will it prioritize the GPU but still be able to run on a CPU? Suppose we pack the project into a Bento on a CPU-only machine and use bentoml serve to test the API locally. With the GPU option enabled, can we still run this test on such a non-GPU machine? Thx

aarnphm commented 3 years ago

@aarnphm Just curious: will enabling this GPU option mean the service only runs on a GPU machine, or will it prioritize the GPU but still be able to run on a CPU?

What I actually want to achieve is to simplify the packaging process with BentoML. You can definitely still run a GPU model on a CPU; however, the model would yield wrong predictions due to GPU tensors running on a CPU. Having this option allows users to correctly set up their model for their corresponding use case. (You don't have to use this option to run on CPU.)

Suppose we pack the project into a Bento on a CPU-only machine and use bentoml serve to test the API locally. With the GPU option enabled, can we still run this test on such a non-GPU machine?

The industry standard is to train on GPUs and then deploy to CPU-based cloud services. Thus, what I'm proposing is to have each Artifact convert the model to CPU tensors by default. This option enables the use case where teams can take advantage of GPU inference, and it allows users to correctly set up their artifacts. You can still run your model on a non-GPU machine, but expect unusual behaviour.

This ties in with our current redesign of the packaging API, which enables easier development with BentoML; it is just one part of that ongoing discussion.

aarnphm commented 3 years ago

Closed since this is not related to the new API.