aws / sagemaker-inference-toolkit

Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Apache License 2.0

providing gpu_id to the custom inference script's model_fn #45

Closed muhyun closed 4 years ago

muhyun commented 4 years ago

Issue #, if available:

None

Description of changes:

If a model is loaded via the default model loader of MMS, each worker loads the model onto the GPU device that MMS assigns to it. However, if a custom inference script that defines model_fn is used, only model_dir is passed to model_fn. This makes it hard to determine which GPU ID the model-loading function should use, so the instance's GPUs cannot all be utilized.

If the custom inference script's model_fn takes two arguments, model_dir and gpu_id, then the inference toolkit passes both. If it takes only one argument (as it does now), the toolkit passes only model_dir, which preserves backward compatibility.

So the only change needed on the user side is to adopt the following function signature to receive gpu_id:

    def model_fn(model_dir, gpu_id=None):
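
For illustration, a user-side model_fn under this proposed contract might look like the sketch below; the PyTorch usage and the model.pt filename are illustrative assumptions, not part of this PR:

    import os

    import torch

    def model_fn(model_dir, gpu_id=None):
        # Pin the model to the GPU that MMS assigned to this worker,
        # falling back to CPU when no GPU ID is provided.
        device = torch.device(f"cuda:{gpu_id}" if gpu_id is not None else "cpu")
        model = torch.load(os.path.join(model_dir, "model.pt"), map_location=device)
        return model.eval()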

Testing done:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

Tests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

nadiaya commented 4 years ago

    ERROR:   flake8: parallel child exit code 1
    ERROR:   black-check: parallel child exit code 1
    ERROR:   pylint: parallel child exit code 1
muhyun commented 4 years ago

    ERROR:   flake8: parallel child exit code 1
    ERROR:   black-check: parallel child exit code 1
    ERROR:   pylint: parallel child exit code 1

Can you help me reproduce these errors so that I can take a look?

ajaykarpur commented 4 years ago

Hi @muhyun, please take a look at https://github.com/aws/sagemaker-inference-toolkit/blob/master/CONTRIBUTING.md. This document describes how to run the tests with tox, which will provide guidance on how to resolve the code formatting (black) and linting (flake8, pylint) violations.
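
For reference, the failing checks in the CI output above are tox environments of the same names, so assuming tox is installed they can presumably be reproduced locally with:

    pip install tox
    tox -e flake8,black-check,pylint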

muhyun commented 4 years ago

This PR is to provide gpu_id to model_fn if a GPU is available and assigned to the worker. So I suggest allowing two signatures, model_fn(model_dir) and model_fn(model_dir, gpu_id), instead of changing model_fn(model_dir) to model_fn(model_dir, gpu_id), so as not to break existing user code.

However, the unit test forces model_fn() to take exactly one argument:

        transformer = Transformer()
        model_fn = Mock()
        transformer._model_fn = model_fn
        assert transformer._initialized is False
        transformer.validate_and_initialize()
        assert transformer._initialized is True
        transformer.validate_and_initialize()
        model_fn.assert_called_once_with(environment.model_dir)

I'd like to hear your opinion on this issue and on the proposed approach for providing gpu_id.

laurenyu commented 4 years ago

hi @muhyun, sorry for the lack of activity here. The model_fn, etc. functions were meant to provide backward compatibility with the old SageMaker serving interface, but unfortunately your change here would be a breaking change for anyone who has implemented their own model_fn.

You can write your own MMS handler and still use the SageMaker Inference Toolkit:
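
A minimal sketch of such a handler follows, assuming MMS's handler contract (an object exposing initialize(context) and handle(data, context)) and the gpu_id / model_dir keys MMS places in context.system_properties; the PyTorch usage, the HandlerService name, and the model.pt filename are illustrative assumptions:

    import os

    import torch

    class HandlerService:
        def __init__(self):
            self.model = None
            self.device = None

        def initialize(self, context):
            # MMS exposes the worker's assigned GPU and the model directory
            # through context.system_properties.
            properties = context.system_properties
            gpu_id = properties.get("gpu_id")
            self.device = torch.device(
                f"cuda:{gpu_id}" if gpu_id is not None and torch.cuda.is_available() else "cpu"
            )
            model_dir = properties.get("model_dir")
            self.model = torch.load(
                os.path.join(model_dir, "model.pt"), map_location=self.device
            ).eval()

        def handle(self, data, context):
            # Deserialize the request, run inference on self.device,
            # and serialize the response (elided in this sketch).
            ...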

muhyun commented 4 years ago

Then we need to explicitly document this behavior so that customers can use multi-GPU instances for inference without wasting money.

muhyun commented 4 years ago

How about this code change:

from

            if num_args_model_fn == 2:
                self._model = self._model_fn(model_dir, gpu_id=gpu_id)
            else:
                self._model = self._model_fn(model_dir)

to

            try:
                self._model = self._model_fn(model_dir, gpu_id=gpu_id)
            except TypeError:
                # Fall back to the legacy single-argument signature.
                self._model = self._model_fn(model_dir)

This allows a custom inference script to access additional parameters. For example, if model_fn has an additional parameter in its definition, like def model_fn(model_dir, gpu_id), then it can access the GPU ID. If the inference script defines def model_fn(model_dir), it will still work.
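
As a point of comparison, the earlier argument-count check could also be written with introspection, which avoids masking TypeErrors raised inside model_fn itself; this load_model helper is hypothetical, not part of the toolkit:

    import inspect

    def load_model(model_fn, model_dir, gpu_id):
        # Two declared parameters means the new-style
        # model_fn(model_dir, gpu_id) signature is in use.
        if len(inspect.signature(model_fn).parameters) == 2:
            return model_fn(model_dir, gpu_id=gpu_id)
        return model_fn(model_dir)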