aws / sagemaker-inference-toolkit

Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.

Be able to change SageMaker endpoint log level #70

Open oonisim opened 4 years ago

oonisim commented 4 years ago

Describe the feature you'd like

Be able to change the SageMaker endpoint CloudWatch log level.

As described in AWS support case 7309023801, the pre-built AWS DL container with a SageMaker endpoint currently has no option to change the CloudWatch log level, so it writes an INFO log entry for every health check request. This makes it difficult to find the relevant error logs.

How would this feature be used? Please describe.

To see only the error logs.

Describe alternatives you've considered

As discussed in case 7309023801, building a bring-your-own (BYO) container, but that is overkill just to change the log level.

Additional context

The CloudWatch log is cluttered with INFO entries for /ping health checks:


```
2020-09-21 11:24:10,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:15,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:20,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:25,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:30,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:35,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:40,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:45,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:50,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:55,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:00,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:05,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:10,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:15,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:20,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:25,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:30,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:35,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:40,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:45,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:50,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:55,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:00,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:05,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:10,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:15,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:20,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:25,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
```

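
Until the log level can be changed on the endpoint, one possible stopgap is to drop the health check entries after exporting the log. The following is a minimal Python sketch, assuming the access-log format shown above. The function name `drop_health_checks` and the second sample line are hypothetical and only serve as an illustration; this filters on the reader's side and does not change what the container writes.

```python
import re

# Matches successful /ping access-log entries in the format shown above.
PING_LINE = re.compile(r'ACCESS_LOG - \S+ "GET /ping HTTP/1\.1" 200\b')


def drop_health_checks(lines):
    """Return only the lines that are not successful /ping access-log entries."""
    return [line for line in lines if not PING_LINE.search(line)]


sample = [
    '2020-09-21 11:24:10,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0',
    # Hypothetical non-health-check line, made up for this example.
    "2020-09-21 11:24:12,001 [ERROR] W-model-1-stdout example.Logger - hypothetical error line",
]
filtered = drop_health_checks(sample)
print(filtered)
```

Only the second sample line survives the filter, since the first one is a `/ping` access-log entry.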
chuyang-deng commented 4 years ago

Hello @oonisim, I do not have access to your support case link. Are you referring to the log config with torchserve in pytorch-inference-toolkit, or to the MMS log level configuration?

icywang86rui commented 4 years ago

I think the easiest way of implementing this would be to allow the customer to provide their own log4j config file through the dependencies arg here. The file should follow a naming convention, something like ./override/etc/log4j.properties. On the container side, we would just use the custom override config file if it exists here.
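
The container-side check proposed above could look roughly like the following Python sketch. This is an illustration of the idea, not code from the toolkit: the function name `select_log_config` and the default path are hypothetical, and the override path is simply the "something like" example from the comment.

```python
from pathlib import Path

# Both paths are assumptions for illustration; the proposal only names the
# override path as an example of a possible naming convention.
DEFAULT_CONFIG = Path("/etc/log4j.properties")
OVERRIDE_CONFIG = Path("./override/etc/log4j.properties")


def select_log_config(override=OVERRIDE_CONFIG, default=DEFAULT_CONFIG):
    """Return the override config if that file exists, otherwise the default."""
    return override if Path(override).is_file() else default


# The chosen path would then be handed to the model server at startup.
print(select_log_config())
```

Falling back to the default keeps existing deployments unchanged, since nothing happens unless the customer actually ships an override file.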

ldong87 commented 3 years ago

I have the same problem. The CloudWatch logs generated by the SageMaker endpoint contain too much redundant info. For example, the timestamp is repeated, and the com.amazonaws.ml.mms.wlm.WorkerLifeCycle logger name doesn't mean anything to me. How can I change the logging format to suppress the redundant info?

```
2021-01-11T14:33:57.539-06:00 | 2021-01-11 20:33:56,799 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - pytorch version: 1.5.1
```
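
As a client-side workaround for the format question, an exported line can be reduced to its level and message. The sketch below is a rough Python example that assumes every line follows the layout of the sample above; the name `compact` is hypothetical, and lines that do not match are returned unchanged. It does not alter the logging format inside the container.

```python
import re

LINE = re.compile(
    r"^(?:\S+ \| )?"  # optional leading CloudWatch timestamp column
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>\w+)\s*\] "
    r"(?P<thread>\S+) (?P<logger>\S+) - (?P<message>.*)$"
)


def compact(line):
    """Keep only the level and message; return the line as-is if it does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return line
    return f"{m['level']} {m['message']}"


raw = (
    "2021-01-11T14:33:57.539-06:00 | 2021-01-11 20:33:56,799 [INFO ] "
    "W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - pytorch version: 1.5.1"
)
print(compact(raw))  # INFO pytorch version: 1.5.1
```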

beatrizdemiguelperez commented 1 year ago

Same here, any help?

j-adamczyk commented 1 year ago

Any news on this? This is very problematic for anything using PySpark (both training and inference), which outputs a lot of logs, 99% of which are totally useless.

Ce11an commented 10 months ago

Bump 🙏🏻

is-abhi commented 6 months ago

Any update on this?

rromanchuk commented 6 months ago

imo, i shouldn't need to apply any custom modifications or env flags to at least get the error message and ideally stacktrace to cloudwatch when the container returns http status 500.

coder-pikachu commented 5 months ago

same error. Any help?

harikagaggara commented 4 months ago

same issue, can someone please provide any update on the log level configuration?

gmaiwald commented 4 months ago

same issue here. would be great to have a solution on it. thanks

gmaiwald commented 4 months ago

Have a look at: https://github.com/awslabs/multi-model-server/blob/master/docs/configuration.md

This helped us to customize logging.