aws / sagemaker-pytorch-inference-toolkit

Toolkit for allowing inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker PyTorch containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0
134 stars · 70 forks

pass model directory as input to torchserve #118

Closed mseth10 closed 2 years ago

mseth10 commented 2 years ago

Issue #, if available: https://github.com/aws/sagemaker-pytorch-inference-toolkit/issues/117

Description of changes: Remove the _adapt_to_ts_format function and its test. It is no longer needed, since TorchServe now accepts a model directory as input.
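For context, passing the directory straight through means the launch command can point TorchServe at /opt/ml/model without first repackaging it into a .mar archive. A minimal sketch of such a command builder (the flag usage shown is illustrative; the toolkit's actual launch code may differ):

```python
# Hypothetical sketch: build a TorchServe launch command that points
# directly at SageMaker's model directory, skipping the repackaging
# step that _adapt_to_ts_format used to perform.

MODEL_DIR = "/opt/ml/model"  # SageMaker's standard model location


def build_torchserve_command(model_dir=MODEL_DIR):
    return [
        "torchserve",
        "--start",
        "--model-store", model_dir,
        # Newer TorchServe versions accept a plain directory here,
        # so no .mar archive has to be created first.
        "--models", f"model={model_dir}",
    ]
```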

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sagemaker-bot commented 2 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository


mseth10 commented 2 years ago

The following tests are now passing:

The following tests are failing:

Here's the error log from the CPU test:

2022-04-06T09:00:59,852 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
--
2022-04-06T09:00:59,876 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: /opt/ml/model
2022-04-06T09:00:59,880 [WARN ] main org.pytorch.serve.archive.model.ModelArchive - Model archive version is not defined. Please upgrade to torch-model-archiver 0.2.0 or higher
2022-04-06T09:00:59,880 [WARN ] main org.pytorch.serve.archive.model.ModelArchive - Model archive createdOn is not defined. Please upgrade to torch-model-archiver 0.2.0 or higher
2022-04-06T09:00:59,882 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model model loaded.
2022-04-06T09:00:59,894 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2022-04-06T09:00:59,975 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2022-04-06T09:00:59,976 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2022-04-06T09:00:59,977 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2022-04-06T09:01:00,306 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2022-04-06T09:01:00,374 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2022-04-06T09:01:00,375 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - [PID]30
2022-04-06T09:01:00,376 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Torch worker started.
2022-04-06T09:01:00,378 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Python runtime: 3.7.11
2022-04-06T09:01:00,383 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2022-04-06T09:01:00,391 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,393 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:34.40862274169922|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,394 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:16.33287811279297|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,395 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:32.2|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,395 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:6194.03125|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,397 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1015.3203125|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,398 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:17.1|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,399 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2022-04-06T09:01:00,402 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1649235660402
2022-04-06T09:01:00,449 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - model_name: model, batchSize: 1
2022-04-06T09:01:00,504 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Backend worker process died.
2022-04-06T09:01:00,504 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2022-04-06T09:01:00,505 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/ts/model_service_worker.py", line 189, in <module>
2022-04-06T09:01:00,505 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     worker.run_server()
2022-04-06T09:01:00,506 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/ts/model_service_worker.py", line 161, in run_server
2022-04-06T09:01:00,506 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2022-04-06T09:01:00,507 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/ts/model_service_worker.py", line 123, in handle_connection
2022-04-06T09:01:00,507 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2022-04-06T09:01:00,508 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/ts/model_service_worker.py", line 96, in load_model
2022-04-06T09:01:00,508 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     batch_size, envelope, limit_max_image_pixels)
2022-04-06T09:01:00,508 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/ts/model_loader.py", line 112, in load
2022-04-06T09:01:00,507 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2022-04-06T09:01:00,509 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     initialize_fn(service.context)
2022-04-06T09:01:00,509 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/sagemaker_pytorch_serving_container/handler_service.py", line 51, in initialize
2022-04-06T09:01:00,509 [WARN ] W-9000-model_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: model, error: Worker died.
2022-04-06T09:01:00,510 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     super().initialize(context)
2022-04-06T09:01:00,510 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/sagemaker_inference/default_handler_service.py", line 66, in initialize
2022-04-06T09:01:00,510 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     self._service.validate_and_initialize(model_dir=model_dir)
2022-04-06T09:01:00,511 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/sagemaker_inference/transformer.py", line 157, in validate_and_initialize
2022-04-06T09:01:00,511 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     self._validate_user_module_and_set_functions()
2022-04-06T09:01:00,511 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/sagemaker_inference/transformer.py", line 170, in _validate_user_module_and_set_functions
2022-04-06T09:01:00,512 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     user_module = importlib.import_module(user_module_name)
2022-04-06T09:01:00,512 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/importlib/__init__.py", line 127, in import_module
2022-04-06T09:01:00,512 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2022-04-06T09:01:00,512 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
2022-04-06T09:01:00,513 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 983, in _find_and_load
2022-04-06T09:01:00,513 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
2022-04-06T09:01:00,513 [WARN ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1.0-stderr
2022-04-06T09:01:00,513 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
2022-04-06T09:01:00,514 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 728, in exec_module
2022-04-06T09:01:00,514 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
2022-04-06T09:01:00,514 [WARN ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1.0-stdout
2022-04-06T09:01:00,515 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/ml/model/code/resnet18.py", line 7, in <module>
2022-04-06T09:01:00,542 [INFO ] W-9000-model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1.0-stdout
2022-04-06T09:01:00,542 [INFO ] W-9000-model_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1.0-stderr
2022-04-06T09:01:00,543 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
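The traceback bottoms out in importlib.import_module, where the toolkit imports the user's inference script (here resnet18.py) from the model directory's code/ folder. A rough sketch of that import path (the helper name and defaults are hypothetical, not the toolkit's API):

```python
import importlib
import os
import sys


def import_user_module(model_dir, module_name="inference"):
    """Illustrative helper mimicking the import step in the traceback:
    put the model's code/ directory on sys.path, then import the
    user script by module name. A failure inside the script's own
    imports (e.g. resnet18.py line 7) surfaces here as a worker crash.
    """
    code_dir = os.path.join(model_dir, "code")
    if code_dir not in sys.path:
        sys.path.insert(0, code_dir)
    return importlib.import_module(module_name)
```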
mseth10 commented 2 years ago

The SM integration test failures in test_default_inference.py using generic images turned out to be flaky; the tests are passing now.
