aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost-effective, natively integrated into PyTorch and TensorFlow, and integrated with your favorite AWS services.
https://aws.amazon.com/machine-learning/neuron/

Multiple models on torchserve #898

Closed: brunonishimoto closed this issue 3 months ago

brunonishimoto commented 3 months ago

Is it possible to deploy multiple models (from multiple .mar files) using TorchServe on an inf2 instance?

chafik-c commented 3 months ago

I routed this to the right team and am working on getting you an answer.

brunonishimoto commented 3 months ago

> I routed this to the right team and am working on getting you an answer.

Sure, thanks very much @chafik-c

jeffhataws commented 3 months ago

We should be able to deploy multiple single-core models. However, note that each model's worker process is allocated exactly one NeuronCore, and a NeuronCore cannot be shared between multiple processes. The maximum number of models you can run simultaneously is therefore limited by the number of NeuronCores on the instance. If you configure multiple worker processes per model, note that each worker process consumes its own NeuronCore.
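
As a minimal sketch of what that looks like, the snippet below traces two independent single-core models with `torch_neuronx` so that each can be packaged into its own .mar file. The model definitions and file names here are hypothetical placeholders, not part of the original discussion:

```python
import torch
import torch_neuronx  # Neuron SDK PyTorch integration for inf2/trn1

# Hypothetical stand-ins for your real models.
model_a = torch.nn.Linear(128, 64).eval()
model_b = torch.nn.Linear(256, 10).eval()

# Trace each model individually. Each traced artifact targets a single
# NeuronCore, so each TorchServe worker that loads one occupies one core.
traced_a = torch_neuronx.trace(model_a, torch.rand(1, 128))
traced_b = torch_neuronx.trace(model_b, torch.rand(1, 256))

# Save the traced models so torch-model-archiver can package each one
# into its own .mar file.
torch.jit.save(traced_a, "model_a.pt")
torch.jit.save(traced_b, "model_b.pt")
```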

This document describes how to use TorchServe with torch-neuronx: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/inference/tutorial-torchserve-neuronx.html
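
Once both archives are built, they can be registered on the same TorchServe instance through its management API (port 8081 by default). A hedged sketch, assuming the .mar file names from above sit in the configured model store:

```python
import requests

# TorchServe management API endpoint; 8081 is the default management port.
MGMT = "http://localhost:8081"

# Register each archive and start one worker per model. With the
# single-core models above, each worker occupies one NeuronCore.
for mar in ("model_a.mar", "model_b.mar"):
    resp = requests.post(
        f"{MGMT}/models",
        params={"url": mar, "initial_workers": 1, "synchronous": "true"},
    )
    print(resp.status_code, resp.text)
```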

If you want fine-grained control over which NeuronCore a model is loaded onto, this documentation should help: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/core-placement.html?highlight=placement
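
One pattern from that guide is to restrict a process to specific NeuronCores via the `NEURON_RT_VISIBLE_CORES` environment variable before the Neuron runtime initializes. A minimal sketch (the model file name is a placeholder carried over from the example above):

```python
import os

# Make only NeuronCore 1 visible to this process. This must be set
# before the Neuron runtime initializes, i.e. before the first Neuron
# model is loaded in this process.
os.environ["NEURON_RT_VISIBLE_CORES"] = "1"

import torch
import torch_neuronx  # noqa: F401  (registers the Neuron backend)

# The traced model loads onto the single visible NeuronCore.
model = torch.jit.load("model_b.pt")
```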

brunonishimoto commented 3 months ago

@jeffhataws I see, thanks for your response!

jeffhataws commented 3 months ago

@brunonishimoto thanks for reaching out. Let us know what else we can help you with.