aws / sagemaker-pytorch-inference-toolkit

Toolkit for allowing inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker PyTorch containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0
134 stars · 70 forks

pass model directory as input to torchserve #118

Closed mseth10 closed 2 years ago

mseth10 commented 2 years ago

Issue #, if available: https://github.com/aws/sagemaker-pytorch-inference-toolkit/issues/117

Description of changes: Remove the _adapt_to_ts_format function and its test. It is no longer needed, since TorchServe now accepts a model directory as input.
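For context, passing the directory straight through means the launch command can point TorchServe at /opt/ml/model without first repackaging it into a .mar archive. A minimal sketch of such a command builder (the flag usage shown is illustrative; the toolkit's actual launch code may differ):

```python
# Hypothetical sketch: build a TorchServe launch command that points
# directly at SageMaker's model directory, skipping the repackaging
# step that _adapt_to_ts_format used to perform.

MODEL_DIR = "/opt/ml/model"  # SageMaker's standard model location


def build_torchserve_command(model_dir=MODEL_DIR):
    return [
        "torchserve",
        "--start",
        "--model-store", model_dir,
        # Newer TorchServe versions accept a plain directory here,
        # so no .mar archive has to be created first.
        "--models", f"model={model_dir}",
    ]
```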

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sagemaker-bot commented 2 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository


mseth10 commented 2 years ago

The following tests are now passing:

The following tests are failing:

Here's the error log from the CPU test:

2022-04-06T09:00:59,852 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
--
2022-04-06T09:00:59,876 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: /opt/ml/model
2022-04-06T09:00:59,880 [WARN ] main org.pytorch.serve.archive.model.ModelArchive - Model archive version is not defined. Please upgrade to torch-model-archiver 0.2.0 or higher
2022-04-06T09:00:59,880 [WARN ] main org.pytorch.serve.archive.model.ModelArchive - Model archive createdOn is not defined. Please upgrade to torch-model-archiver 0.2.0 or higher
2022-04-06T09:00:59,882 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model model loaded.
2022-04-06T09:00:59,894 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2022-04-06T09:00:59,975 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2022-04-06T09:00:59,976 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2022-04-06T09:00:59,977 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2022-04-06T09:01:00,306 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2022-04-06T09:01:00,374 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2022-04-06T09:01:00,375 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - [PID]30
2022-04-06T09:01:00,376 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Torch worker started.
2022-04-06T09:01:00,378 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Python runtime: 3.7.11
2022-04-06T09:01:00,383 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2022-04-06T09:01:00,391 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,393 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:34.40862274169922|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,394 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:16.33287811279297|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,395 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:32.2|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,395 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:6194.03125|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,397 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1015.3203125|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,398 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:17.1|#Level:Host|#hostname:container-0.local,timestamp:1649235660
2022-04-06T09:01:00,399 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2022-04-06T09:01:00,402 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1649235660402
2022-04-06T09:01:00,449 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - model_name: model, batchSize: 1
2022-04-06T09:01:00,504 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Backend worker process died.
2022-04-06T09:01:00,504 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2022-04-06T09:01:00,505 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/ts/model_service_worker.py", line 189, in <module>
2022-04-06T09:01:00,505 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     worker.run_server()
2022-04-06T09:01:00,506 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/ts/model_service_worker.py", line 161, in run_server
2022-04-06T09:01:00,506 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2022-04-06T09:01:00,507 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/ts/model_service_worker.py", line 123, in handle_connection
2022-04-06T09:01:00,507 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2022-04-06T09:01:00,508 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/ts/model_service_worker.py", line 96, in load_model
2022-04-06T09:01:00,508 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     batch_size, envelope, limit_max_image_pixels)
2022-04-06T09:01:00,508 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/ts/model_loader.py", line 112, in load
2022-04-06T09:01:00,507 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2022-04-06T09:01:00,509 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     initialize_fn(service.context)
2022-04-06T09:01:00,509 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/sagemaker_pytorch_serving_container/handler_service.py", line 51, in initialize
2022-04-06T09:01:00,509 [WARN ] W-9000-model_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: model, error: Worker died.
2022-04-06T09:01:00,510 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     super().initialize(context)
2022-04-06T09:01:00,510 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/sagemaker_inference/default_handler_service.py", line 66, in initialize
2022-04-06T09:01:00,510 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     self._service.validate_and_initialize(model_dir=model_dir)
2022-04-06T09:01:00,511 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/sagemaker_inference/transformer.py", line 157, in validate_and_initialize
2022-04-06T09:01:00,511 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     self._validate_user_module_and_set_functions()
2022-04-06T09:01:00,511 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/site-packages/sagemaker_inference/transformer.py", line 170, in _validate_user_module_and_set_functions
2022-04-06T09:01:00,512 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     user_module = importlib.import_module(user_module_name)
2022-04-06T09:01:00,512 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.7/importlib/__init__.py", line 127, in import_module
2022-04-06T09:01:00,512 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2022-04-06T09:01:00,512 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
2022-04-06T09:01:00,513 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 983, in _find_and_load
2022-04-06T09:01:00,513 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
2022-04-06T09:01:00,513 [WARN ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1.0-stderr
2022-04-06T09:01:00,513 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
2022-04-06T09:01:00,514 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 728, in exec_module
2022-04-06T09:01:00,514 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
2022-04-06T09:01:00,514 [WARN ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1.0-stdout
2022-04-06T09:01:00,515 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   File "/opt/ml/model/code/resnet18.py", line 7, in <module>
2022-04-06T09:01:00,542 [INFO ] W-9000-model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1.0-stdout
2022-04-06T09:01:00,542 [INFO ] W-9000-model_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1.0-stderr
2022-04-06T09:01:00,543 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
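The traceback bottoms out in importlib.import_module, where the toolkit imports the user's inference script (here resnet18.py) from the model directory's code/ folder. A rough sketch of that import path (the helper name and defaults are hypothetical, not the toolkit's API):

```python
import importlib
import os
import sys


def import_user_module(model_dir, module_name="inference"):
    """Illustrative helper mimicking the import step in the traceback:
    put the model's code/ directory on sys.path, then import the
    user script by module name. A failure inside the script's own
    imports (e.g. resnet18.py line 7) surfaces here as a worker crash.
    """
    code_dir = os.path.join(model_dir, "code")
    if code_dir not in sys.path:
        sys.path.insert(0, code_dir)
    return importlib.import_module(module_name)
```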
mseth10 commented 2 years ago

The SM integration test failures in test_default_inference.py using generic images turned out to be flaky; the tests are passing now.
