aws / aws-lambda-python-runtime-interface-client

Apache License 2.0

'utf-8' codec can't decode byte error when running lambda #27

Closed matthewchung74 closed 3 years ago

matthewchung74 commented 3 years ago

I'm getting the error above and have no idea how to troubleshoot since the error messaging isn't giving any clues.

In summary, I have a Dockerfile.debug which builds on the image produced by Dockerfile.base. In Dockerfile.debug, I make a call to

RUN python3.8 inference.py

inference.py downloads a pytorch model and can also create a requirements.txt file. I'm not sure why this would cause the cryptic error below, and any help troubleshooting would be appreciated.

Steps to reproduce:

I pushed the code here: https://github.com/matthewchung74/lambda_test

  1. Build the docker base image using docker build -f Dockerfile.base . -t lambda_test (it is already built and available at the public repo public.ecr.aws/c6h1o1s4/lambda_test:base)

  2. Then build the debug image using docker build -f Dockerfile.debug . -t lambda_test:debug

After that is built, I run it with

docker run -p 8080:8080 -p 5890:5890 lambda_test:debug

and to test, I do

curl -XPOST "http://localhost:8080/2015-03-31/functions/function/invocations" -d '{}'

and here is the error.

time="2021-03-31T22:13:18.402" level=info msg="exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)"
time="2021-03-31T22:13:50.17" level=info msg="extensionsDisabledByLayer(/opt/disable-extensions-jwigqn8j) -> stat /opt/disable-extensions-jwigqn8j: no such file or directory"
time="2021-03-31T22:13:50.17" level=warning msg="Cannot list external agents" error="open /opt/extensions: no such file or directory"
START RequestId: ba17141a-ad30-44cd-b48c-0d3d64edb02c Version: $LATEST
[ERROR] UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: inva
    (result, consumed) = self._buffer_decode(data, self.errors, final)
time="2021-03-31T22:13:50.278" level=panic msg="ReplyStream not available"
2021/03/31 22:13:50 http: panic serving 127.0.0.1:58744: &{0xc0000b2000 map[] 2021-03-31 22:13:50.278614327 +0000 UTC m=+31.883751159 panic ReplyStream not available }
goroutine 20 [running]:
net/http.(*conn).serve.func1(0xc00010c0a0)
    /usr/local/go/src/net/http/server.go:1800 +0x139
panic(0x866640, 0xc0000b2310)
    /usr/local/go/src/runtime/panic.go:975 +0x3e3
github.com/sirupsen/logrus.Entry.log(0xc0000b2000, 0xc000065da0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /go/pkg/mod/github.com/sirupsen/logrus@v1.6.0/entry.go:259 +0x335
github.com/sirupsen/logrus.(*Entry).Log(0xc0000b20e0, 0xc000000000, 0xc000135588, 0x1, 0x1)
    /go/pkg/mod/github.com/sirupsen/logrus@v1.6.0/entry.go:287 +0xeb
github.com/sirupsen/logrus.(*Logger).Log(0xc0000b2000, 0xc000000000, 0xc000135588, 0x1, 0x1)
    /go/pkg/mod/github.com/sirupsen/logrus@v1.6.0/logger.go:193 +0x7d
github.com/sirupsen/logrus.(*Logger).Panic(...)
    /go/pkg/mod/github.com/sirupsen/logrus@v1.6.0/logger.go:234
github.com/sirupsen/logrus.Panic(...)
    /go/pkg/mod/github.com/sirupsen/logrus@v1.6.0/exported.go:129
go.amzn.com/lambda/rapi/rendering.RenderInteropError(0x9097c0, 0xc0000d62a0, 0xc0000fa500, 0x902b60, 0xc00004d450)
    /LambdaRuntimeLocal/lambda/rapi/rendering/rendering.go:292 +0x9a
go.amzn.com/lambda/rapi/handler.(*initErrorHandler).ServeHTTP(0xc00004d060, 0x9097c0, 0xc0000d62a0, 0xc0000fa500)
    /LambdaRuntimeLocal/lambda/rapi/handler/initerror.go:52 +0x519
net/http.HandlerFunc.ServeHTTP(0xc00000c720, 0x9097c0, 0xc0000d62a0, 0xc0000fa500)
    /usr/local/go/src/net/http/server.go:2041 +0x44
github.com/go-chi/chi.(*Mux).routeHTTP(0xc00004e660, 0x9097c0, 0xc0000d62a0, 0xc0000fa500)
    /go/pkg/mod/github.com/go-chi/chi@v4.1.2+incompatible/mux.go:431 +0x278
net/http.HandlerFunc.ServeHTTP(0xc00004cff0, 0x9097c0, 0xc0000d62a0, 0xc0000fa500)
    /usr/local/go/src/net/http/server.go:2041 +0x44
go.amzn.com/lambda/rapi/middleware.RuntimeReleaseMiddleware.func1.1(0x9097c0, 0xc0000d62a0, 0xc0000fa500)
    /LambdaRuntimeLocal/lambda/rapi/middleware/middleware.go:100 +0xea
net/http.HandlerFunc.ServeHTTP(0xc00000c520, 0x9097c0, 0xc0000d62a0, 0xc0000fa500)
    /usr/local/go/src/net/http/server.go:2041 +0x44
go.amzn.com/lambda/rapi/middleware.AccessLogMiddleware.func1.1(0x9097c0, 0xc0000d62a0, 0xc0000fa500)
    /LambdaRuntimeLocal/lambda/rapi/middleware/middleware.go:77 +0x170
net/http.HandlerFunc.ServeHTTP(0xc00000c540, 0x9097c0, 0xc0000d62a0, 0xc0000fa500)
    /usr/local/go/src/net/http/server.go:2041 +0x44
go.amzn.com/lambda/rapi/middleware.AppCtxMiddleware.func1.1(0x9097c0, 0xc0000d62a0, 0xc0000fa400)
    /LambdaRuntimeLocal/lambda/rapi/middleware/middleware.go:66 +0x77
net/http.HandlerFunc.ServeHTTP(0xc0000654a0, 0x9097c0, 0xc0000d62a0, 0xc0000fa400)
    /usr/local/go/src/net/http/server.go:2041 +0x44
github.com/go-chi/chi.(*Mux).ServeHTTP(0xc00004e660, 0x9097c0, 0xc0000d62a0, 0xc0000fa400)
    /go/pkg/mod/github.com/go-chi/chi@v4.1.2+incompatible/mux.go:70 +0x513
github.com/go-chi/chi.(*Mux).Mount.func1(0x9097c0, 0xc0000d62a0, 0xc0000fa400)
    /go/pkg/mod/github.com/go-chi/chi@v4.1.2+incompatible/mux.go:298 +0x118
net/http.HandlerFunc.ServeHTTP(0xc00000c780, 0x9097c0, 0xc0000d62a0, 0xc0000fa400)
    /usr/local/go/src/net/http/server.go:2041 +0x44
github.com/go-chi/chi.(*Mux).routeHTTP(0xc00004e600, 0x9097c0, 0xc0000d62a0, 0xc0000fa400)
    /go/pkg/mod/github.com/go-chi/chi@v4.1.2+incompatible/mux.go:431 +0x278
net/http.HandlerFunc.ServeHTTP(0xc00004d080, 0x9097c0, 0xc0000d62a0, 0xc0000fa400)
    /usr/local/go/src/net/http/server.go:2041 +0x44
github.com/go-chi/chi.(*Mux).ServeHTTP(0xc00004e600, 0x9097c0, 0xc0000d62a0, 0xc0000fa300)
    /go/pkg/mod/github.com/go-chi/chi@v4.1.2+incompatible/mux.go:86 +0x2b2
net/http.serverHandler.ServeHTTP(0xc0000d60e0, 0x9097c0, 0xc0000d62a0, 0xc0000fa300)
    /usr/local/go/src/net/http/server.go:2836 +0xa3
net/http.(*conn).serve(0xc00010c0a0, 0x90a800, 0xc0000e8480)
    /usr/local/go/src/net/http/server.go:1924 +0x86c
created by net/http.(*Server).Serve
    /usr/local/go/src/net/http/server.go:2962 +0x35c
Traceback (most recent call last):
  File "/var/runtime/bootstrap.py", line 449, in main
    add_default_site_directories()
  File "/var/runtime/bootstrap.py", line 394, in add_default_site_directories
    site.addsitedir(os.environ["LAMBDA_TASK_ROOT"])
  File "/var/lang/lib/python3.8/site.py", line 208, in addsitedir
    addpackage(sitedir, name, known_paths)
  File "/var/lang/lib/python3.8/site.py", line 164, in addpackage
    for n, line in enumerate(f):
  File "/var/lang/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/runtime/bootstrap.py", line 481, in <module>
    main()
  File "/var/runtime/bootstrap.py", line 458, in main
    lambda_runtime_client.post_init_error(to_json(error_result))
  File "/var/runtime/lambda_runtime_client.py", line 42, in post_init_error
    response = runtime_connection.getresponse()
  File "/var/lang/lib/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/var/lang/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/var/lang/lib/python3.8/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
time="2021-03-31T22:13:50.311" level=warning msg="First fatal error stored in appctx: Runtime.ExitError"
time="2021-03-31T22:13:50.311" level=warning msg="Process 10(bootstrap) exited: Runtime exited with error: exit status 1"
time="2021-03-31T22:13:50.311" level=error msg="Init failed" InvokeID= error="Runtime exited with error: exit status 1"
time="2021-03-31T22:13:50.312" level=warning msg="Failed to send default error response: ErrInvalidInvokeID"
time="2021-03-31T22:13:50.312" level=error msg="INIT DONE failed: Runtime.ExitError"
time="2021-03-31T22:13:50.312" level=warning msg="Reset initiated: ReserveFail"
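The first traceback above identifies the failing step: bootstrap.py calls site.addsitedir(os.environ["LAMBDA_TASK_ROOT"]), which reads every top-level *.pth file in /var/task as UTF-8 text. The decode failure can be reproduced in isolation (the payload bytes below are made up to match the logged offset):

```python
# Byte 0x80 can never start a UTF-8 sequence; this is exactly what
# site.addpackage hits when it iterates a binary .pth file line by line.
data = b"A" * 64 + b"\x80" + b"more bytes"

try:
    data.decode("utf-8")
except UnicodeDecodeError as e:
    # Mirrors the logged error: invalid start byte at position 64
    print(f"{e.reason} at position {e.start}")
```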

robmarkcole commented 3 years ago

I too am getting this

matthewchung74 commented 3 years ago

I wish I had a good suggestion but do not. I never isolated why it broke.


robmarkcole commented 3 years ago

After methodically rebuilding my container, I discovered the error is introduced when I include this in my Dockerfile:

COPY mobilenet_v2-b0353104.pth ./

Does this shed any light on the underlying issue?

UPDATE: I now work around this issue by downloading the .pth file from S3 within the lambda. Note this is not simply a Dockerfile error, as the error only occurs when the lambda is invoked. I suspect the issue is related to this being a binary file. I also tested copying in a resnet .pth file and got the same error, so it appears to affect all .pth files.

@matthewchung74 could you confirm this finding and update the issue accordingly?
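The .pth observation fits the traceback: site.addsitedir treats any top-level *.pth file as a text path configuration file, so a binary checkpoint in the task root gets decoded as UTF-8 at runtime startup. A minimal sketch with throwaway temp directories (Python 3.8, the runtime in the logs, raises here; newer versions may report the error differently):

```python
import os
import site
import sys
import tempfile

# A text .pth file is a "path configuration file": each existing
# directory it names is appended to sys.path.
site_dir = tempfile.mkdtemp()
extra_dir = os.path.join(site_dir, "extra")
os.mkdir(extra_dir)
with open(os.path.join(site_dir, "good.pth"), "w") as f:
    f.write(extra_dir + "\n")
site.addsitedir(site_dir)
print(extra_dir in sys.path)  # True

# A binary checkpoint with the same extension is read by the same
# machinery -- this is what happens to a model .pth in /var/task.
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "model.pth"), "wb") as f:
    f.write(b"\x80\x81\x82")  # not valid UTF-8
try:
    site.addsitedir(model_dir)
    print("no decode error on this Python version")
except UnicodeDecodeError as e:
    print("UnicodeDecodeError:", e.reason)
```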

matthewchung74 commented 3 years ago

I am starting to remember some details now.

I threw away my old code and started with this as a template

https://aws.amazon.com/blogs/machine-learning/using-container-images-to-run-pytorch-models-in-aws-lambda/

The author uses a ./model directory for his models, but you can also keep them wherever you have them. You might want to do something like this

RUN chmod 644 $(find . -type f)
RUN chmod 755 $(find . -type d)

to make sure the lambda has permission to read the files. See https://docs.aws.amazon.com/lambda/latest/dg/troubleshooting-deployment.html
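One caveat with those chmod lines: command substitution with $(find …) breaks on paths containing spaces and can exceed argument-length limits on large model trees. An equivalent form using find -exec (same permissions, just a more robust invocation; suggested alternative, not from the thread) would be:

```shell
# Make every file world-readable and every directory traversable,
# so the Lambda runtime user can read the model weights.
find . -type f -exec chmod 644 {} +
find . -type d -exec chmod 755 {} +
```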

If you are still stuck after looking at the example above, and if you want to push your code up to github, I can take a quick look.

I am working on a general-purpose solution for dockerizing lambda models for ML but it won't be ready for a couple of months.

robmarkcole commented 3 years ago

Hi @matthewchung74, I followed your suggestion to copy the model into a model directory, and that has resolved the issue. I now have:

FROM public.ecr.aws/lambda/python:3.8

COPY requirements.txt ./
RUN python3.8 -m pip install -r requirements.txt

COPY model model
COPY app.py ./
COPY utils.py ./

CMD ["app.lambda_handler"]

and app:

import sys
import json
import torch
import numpy as np
import torchvision.models as models

from utils import (
    add_handler,
    download_image,
    init_logger,
    preprocess_image,
    model_prediction,
    number_output,
)

# Open labels
with open("model/imagenet_classes.txt") as f:
    labels = [line.strip() for line in f.readlines()]

# Load pretrained model
PATH = "model/mobilenetv2.pth"

mobilenet_v2 = models.mobilenet_v2()
mobilenet_v2.load_state_dict(torch.load(PATH))
mobilenet_v2.eval()

def lambda_handler(event, context):
    # Retrieve inputs
    input_url, n_predictions = event["input_url"], event["n_predictions"]

    # Download image
    input_image = download_image(input_url)

    # Process input image
    batch = preprocess_image(input_image)

    # Generate prediction
    pred = model_prediction(input_batch=batch, mdl=mobilenet_v2)

    # Top n results
    n_results = number_output(mdl_output=pred, mdl_labels=labels, top_n=n_predictions)

    response = {"statusCode": 200, "body": json.dumps(n_results)}

    return response

My original source is https://github.com/gokavak/lambda-docker-image-pytorch-xgboost
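This works because site.addsitedir only lists the top level of the directory it is given when it looks for *.pth files, so a checkpoint one level down in model/ is never opened as a path configuration file. A quick check with throwaway paths (the temp directory stands in for /var/task):

```python
import os
import site
import tempfile

task_root = tempfile.mkdtemp()  # stands in for /var/task
model_dir = os.path.join(task_root, "model")
os.mkdir(model_dir)

# Binary checkpoint placed in a subdirectory, as in the Dockerfile above.
with open(os.path.join(model_dir, "mobilenetv2.pth"), "wb") as f:
    f.write(b"\x80\x81\x82")  # not valid UTF-8

# addsitedir only scans os.listdir(task_root) for *.pth names, so the
# nested checkpoint is never read and no UnicodeDecodeError is raised.
site.addsitedir(task_root)
print("startup scan OK")
```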

I look forward to your general-purpose solution, and thanks for the guidance!

matthewchung74 commented 3 years ago

It is unfortunate that the error messaging is so bad, but cool, you're welcome. If you'd like, you can add your email to the waiting list (only if you like, no hard sell) at inference.codes

rpvelloso commented 1 year ago

> I am working on a general-purpose solution for dockerizing lambda models for ML but it won't be ready for a couple of months.

is this ready?

matthewchung74 commented 1 year ago

Hi Panerai,

I decided not to pursue that idea, since several easy-to-use solutions for ML deployment have come out, such as Hugging Face / Gradio, … which are easier than dealing with Lambda.
