fnproject / fn

The container native, cloud agnostic serverless platform.
http://fnproject.io
Apache License 2.0
5.76k stars 405 forks source link

docker pause errors #836

Open skinowski opened 6 years ago

skinowski commented 6 years ago

Sample snippet from test logs:

time="2018-03-09T18:39:28Z" level=debug msg="Starting container execution" app=zoo container=01C860Z3M9A7WHJ00000000000 cpus= format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 route=/http
        time="2018-03-09T18:39:29Z" level=debug msg="Hot function launcher stats" action="server.handleFunctionCall)-fm" app=zoo currentStats="{[0 1 0 0] [0 0 1 0 0 0]}" id=01C860Z3M9A7WHG00000000000 isNeeded=false route=/http
        time="2018-03-09T18:39:29Z" level=debug msg="docker pause" app=zoo call_id=01C860Z3M9A7WHJ00000000000 cpus= format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 route=/http stack=Freeze

        time="2018-03-09T18:39:29Z" level=info msg="starting call" action="server.handleFunctionCall)-fm" app=zoo container_id=01C860Z3M9A7WHJ00000000000 id=01C860Z3M9A7WHG00000000000 route=/http
        time="2018-03-09T18:39:29Z" level=debug msg="docker unpause" app=zoo call_id=01C860Z3M9A7WHJ00000000000 cpus= format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 route=/http stack=Unfreeze

        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 Container starting\n" app_name=zoo call_id=01C860Z3M9A7WHG00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 BeginOfLogs\n" app_name=zoo call_id=01C860Z3M9A7WHG00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 Received format http\n" app_name=zoo call_id=01C860Z3M9A7WHG00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 Received request main.AppRequest{SleepTime:0, ResponseCode:0, ResponseContentType:\"\", JasonContentType:\"\", EchoContent:\"\", IsDebug:true, IsCrash:false, ReadFile:\"\", ReadFileSize:0, CreateFile:\"\", CreateFileSize:0, AllocateMemory:0, LeakMemory:0, TrailerRepeat:0, InvalidResponse:false, PostSleepTime:0, PostOutGarbage:\"\", PostErrGarbage:\"\"}\n" app_name=zoo call_id=01C860Z3M9A7WHG00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 Received headers map[User-Agent:[Go-http-client/1.1] Content-Length:[17] Fn_call_id:[01C860Z3M9A7WHG00000000000] Fn_deadline:[2018-03-09T18:39:58.905Z] Fn_method:[GET] Fn_request_url:[http://127.0.0.1:8080/r/zoo/http]]\n" app_name=zoo call_id=01C860Z3M9A7WHG00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 Received config map[FN_PATH:/http FN_TYPE:sync ENABLE_FOOTER:yes FN_FORMAT:http PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin ENABLE_HEADER:yes FN_MEMORY:64 FN_APP_NAME:zoo HOME:/root HOSTNAME:default-7f0417e7-8c05-40c3-a2c4-ee686b6fb4db]\n" app_name=zoo call_id=01C860Z3M9A7WHG00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 DockerInfo {Hostname:default-7f0417e7-8c05-40c3-a2c4-ee686b6fb4db Id:b331041cff724fd488752f4883122676f8e9a12a880bb6baa980ca3151fc159c}\n" app_name=zoo call_id=01C860Z3M9A7WHG00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 EndOfLogs\n" app_name=zoo call_id=01C860Z3M9A7WHG00000000000 image=fnproject/fn-test-utils path=/http user_log=true

        time="2018-03-09T18:39:29Z" level=debug msg="Hot function launcher stats" action="server.handleFunctionCall)-fm" app=zoo currentStats="{[0 1 0 1] [0 0 0 1 0 0]}" id=01C860Z3M9A7WHG00000000000 isNeeded=false route=/http
        time="2018-03-09T18:39:29Z" level=debug msg="docker pause" app=zoo call_id=01C860Z3M9A7WHJ00000000000 cpus= format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 route=/http stack=Freeze

        time="2018-03-09T18:39:29Z" level=info msg="starting call" action="server.handleFunctionCall)-fm" app=zoo container_id=01C860Z3M9A7WHJ00000000000 id=01C860Z3YTA7WHM00000000000 route=/http
        time="2018-03-09T18:39:29Z" level=debug msg="docker unpause" app=zoo call_id=01C860Z3M9A7WHJ00000000000 cpus= format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 route=/http stack=Unfreeze

        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 BeginOfLogs\n" app_name=zoo call_id=01C860Z3YTA7WHM00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 Received format http\n" app_name=zoo call_id=01C860Z3YTA7WHM00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 Received request main.AppRequest{SleepTime:0, ResponseCode:0, ResponseContentType:\"\", JasonContentType:\"\", EchoContent:\"\", IsDebug:true, IsCrash:false, ReadFile:\"\", ReadFileSize:0, CreateFile:\"\", CreateFileSize:0, AllocateMemory:0, LeakMemory:0, TrailerRepeat:0, InvalidResponse:false, PostSleepTime:1000, PostOutGarbage:\"YOGURT_YOGURT_YOGURT\", PostErrGarbage:\"\"}\n" app_name=zoo call_id=01C860Z3YTA7WHM00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 Received headers map[Fn_call_id:[01C860Z3YTA7WHM00000000000] Fn_deadline:[2018-03-09T18:39:59.242Z] Fn_method:[GET] Fn_request_url:[http://127.0.0.1:8080/r/zoo/http] User-Agent:[Go-http-client/1.1] Content-Length:[82]]\n" app_name=zoo call_id=01C860Z3YTA7WHM00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 Received config map[HOME:/root HOSTNAME:default-7f0417e7-8c05-40c3-a2c4-ee686b6fb4db ENABLE_HEADER:yes FN_MEMORY:64 FN_APP_NAME:zoo FN_FORMAT:http PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin FN_PATH:/http FN_TYPE:sync ENABLE_FOOTER:yes]\n" app_name=zoo call_id=01C860Z3YTA7WHM00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 DockerInfo {Hostname:default-7f0417e7-8c05-40c3-a2c4-ee686b6fb4db Id:b331041cff724fd488752f4883122676f8e9a12a880bb6baa980ca3151fc159c}\n" app_name=zoo call_id=01C860Z3YTA7WHM00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 EndOfLogs\n" app_name=zoo call_id=01C860Z3YTA7WHM00000000000 image=fnproject/fn-test-utils path=/http user_log=true
        time="2018-03-09T18:39:29Z" level=debug msg="2018/03/09 18:39:29 PostProcess Sleeping 1000\n" app_name=zoo call_id=01C860Z3YTA7WHM00000000000 image=fnproject/fn-test-utils path=/http user_log=true

        time="2018-03-09T18:39:29Z" level=debug msg="docker pause" app=zoo call_id=01C860Z3M9A7WHJ00000000000 cpus= format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 route=/http stack=Freeze
        time="2018-03-09T18:39:29Z" level=error msg="error pausing container" call_id=01C860Z3M9A7WHJ00000000000 error="API error (409): {\"message\":\"Container b331041cff724fd488752f4883122676f8e9a12a880bb6baa980ca3151fc159c is already paused\"}\n"
        time="2018-03-09T18:39:29Z" level=error msg="hot function failure" app=zoo cpus= error="API error (409): {\"message\":\"Container b331041cff724fd488752f4883122676f8e9a12a880bb6baa980ca3151fc159c is already paused\"}\n" format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 route=/http

For this container, the following ops are performed:

        time="2018-03-09T18:39:29Z" level=debug msg="docker pause" app=zoo call_id=01C860Z3M9A7WHJ00000000000 cpus= format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 ro
        time="2018-03-09T18:39:29Z" level=debug msg="docker unpause" app=zoo call_id=01C860Z3M9A7WHJ00000000000 cpus= format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 
        time="2018-03-09T18:39:29Z" level=debug msg="docker pause" app=zoo call_id=01C860Z3M9A7WHJ00000000000 cpus= format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 ro
        time="2018-03-09T18:39:29Z" level=debug msg="docker unpause" app=zoo call_id=01C860Z3M9A7WHJ00000000000 cpus= format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 
        time="2018-03-09T18:39:29Z" level=debug msg="docker pause" app=zoo call_id=01C860Z3M9A7WHJ00000000000 cpus= format=http id=01C860Z3M9A7WHJ00000000000 idle_timeout=30 image=fnproject/fn-test-utils memory=64 ro

From fn perspective, things look right, but last operation fails with "already paused". This is either a conflict or a docker issue. Or something just squezeed in between pausing it before fn server. Notice that last request was just run successfully so the container might have not been paused at all.

skinowski commented 6 years ago

Second instance of this again. Looks like same tests; TestRouteRunnerIOPipes at first invocation of /r/zoo/http.

skinowski commented 6 years ago

From docker src code, seems like two ways this pause can already be set, events from containerd or dockerd pause itself.

rdallman commented 6 years ago

it doesn't seem like it would be horrid to ignore any 'already paused' errors and just leave it paused, since we are trying to pause. due to retries, 'docker start' we are already doing this with 'container already running' errors, that could be one place this comes from but i don't see any 'docker temporary error' logs to indicate that. seems like docker pause command could return before container is actually paused? :/

skinowski commented 6 years ago

I agree, but right now we do recover from this by discarding the hot container, so it's not visible to callers for the moment... Just tracking it here to make sure we eventually find out the root cause.